
Tutorial Introduction

Welcome!

Welcome to your first tutorial for coding in R.

Click the “continue” button below.

Reading

Before we get into R, let’s talk about the importance of reading what’s in here as you go.

Why Read?

It’s tempting to click through and skip the written portions. But you’ll get the most out of these tutorials if you actually read what’s here. Before completing the labs, you should move gorilla carefully through the tutorials to see what all is here. It will help you a ton when you get to the lab!

Did you see the gorilla in that last passage?

Did you “Pass” the Reading Test?

If you saw the word “gorilla” in that long passage of text, then you passed! You’re probably reading carefully enough.

If you didn’t see “gorilla,” you’re probably skimming the text too much. Slow down!

Navigating the Tutorials

You’ll notice there are tabs on the left-hand side. This tutorial is divided into five sections, and you can navigate at any time to each of those sections.

The last tab, “Return home,” is a quick way to get back to the tutorial home page. Alternatively, go to your browser’s address bar and delete everything after the root URL: https://stat212-learnr.stat.illinois.edu

Completed Code Chunks

Throughout this tutorial, you will see code chunks like this.

This code chunk is ready to go. Just click run code on the right and see what it produces!

Note: A recent bug has made the “run code” buttons appear disabled, but they are, in fact, clickable!

2+2

Feel free to adjust

Note that you can also edit the contents of the code chunks. Go back to the one above and try typing a different mathematical expression to see what happens!

Writing Code in Code Chunks

Sometimes, though, the code chunk will be empty or incomplete, and it will be up to you to write some code. If you’re stuck, you can click for a hint and, finally, the answer!

For this code chunk, type a mathematical expression that computes to 8.

3+_
#Here is an option: try replacing the blank with 5
3+5

Three Quick Notes

  1. If your page freezes or goes grey, try clicking refresh!

  2. Sometimes, code chunks may take a long time to run (or just give up trying to compute). Sometimes this is fixed by refreshing the page, but sometimes this is just because there is a lot of traffic on the site at once (like the night before a lab is due). In these instances, you should be able to learn everything you need by simply reading the text and carefully referencing the code–even without running it.

  3. Notice the “Start Over” button just below the tabs on the left-hand side. Click it if you would like the tutorial to clear your work in all of the tabs and start fresh.

Now let’s proceed to more challenging adventures!

What is R?

A summary

R is a coding language. Think of coding as giving instructions to a machine to perform tasks.

There are a lot of coding languages out there, but R is especially well suited for the kinds of tasks we do in statistics and data science. You can use R to…

  • Complete computational tasks much like you would use a calculator
  • Create data visualizations
  • Complete statistical tests, models, and other algorithms
  • Create dynamic applications and webpages (this tutorial is written in part with R!)
  • Use special programs (via packages) that others have coded and shared with the R community

In these tutorials, we’ll focus on some basic, but very practical, coding tasks using the R language!

How do you use R?

Web-Based Account: If you only plan to use R for this class, and don’t anticipate coding much in the future, you may wish to use the web-based platform Posit Cloud to create RStudio projects. This will also be necessary for anyone using a Chromebook or tablet, since those operating systems don’t support installing R.

Computer Installation: Folks who expect to use R more regularly will often install R and RStudio on a computer and code from there. You can learn more about that by watching this video now, or waiting until after you finish the tutorials.

Next

Now, let’s dive into coding!

R as your Calculator

Arithmetic

As you might guess by now, R is like a powerful calculator on your computer. The most obvious things it can compute are basic arithmetic operations.

Look at the examples below for simple computations:

#Addition
56+1

#Subtraction
66-60

#Multiplication
45*2

#Division
65/5

#Exponents (the caret symbol, which on most keyboards is Shift+6)
3^2

Built-in Functions

The previous code chunk contained several of R’s built-in “symbolic functions.” R also has many built-in written functions.

For example, sqrt is a built-in written function that takes the square root of whatever value is placed inside its parentheses.

sqrt(25)

sqrt(11)
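sqrt is one of many built-in written functions in base R. A few others that are handy when using R as a calculator are sketched below; the input values are arbitrary examples.

```r
# A few more built-in written functions from base R
abs(-7)              # absolute value: 7
round(sqrt(11), 2)   # square root of 11, rounded to 2 decimal places: 3.32
log(100, base = 10)  # logarithm of 100 with base 10: 2
exp(0)               # e raised to the power 0: 1
```

Notice that functions can be nested: round(sqrt(11), 2) first takes the square root, then rounds the result.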

Adding Parentheses

We can also use parentheses to control the order in which calculations happen, combining multiple steps into one expression.

(25-5)/4
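Parentheses matter because R follows the usual order of operations: exponents first, then multiplication and division, then addition and subtraction. Running the same numbers with and without parentheses shows the difference.

```r
# With parentheses: the subtraction happens first
(25 - 5) / 4   # 20 / 4 = 5

# Without parentheses: the division happens first
25 - 5 / 4     # 25 - 1.25 = 23.75

# Exponents come before multiplication
2 * 3^2        # 2 * 9 = 18
```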

Give it a try!

Write code that computes 8 plus 6, all divided by 2. Use the hint button for help, or to see the solution.

#Did you use parentheses around 8+6?
(8+6)
(8 + 6)/2

Vectors in R

What is a Vector in R?

A vector is a collection of items (for example, numbers) that are tied together into one “object.” To create a vector, we will use the built-in R function c (which is short for “combine”).

The following vector could represent the weights (in pounds) of 13 adults. The output of this function is these same 13 values, combined together into one object.

c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
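To confirm that c really combined all 13 weights into a single object, you can ask R how many entries the vector holds with the base function length.

```r
# Combine 13 weights (in pounds) into one vector and count its entries
length(c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157))
# 13
```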

An Object?

R is often described as an object-oriented programming language.

For our purposes, that means you can create, name, and save objects into your “global environment.” Saved objects will show up in the Environment pane, which RStudio places on the top right by default.

I’ll talk more about that in an upcoming video!

Saving a Vector

One way to see how the c function works is to assign the vector to a name. Then, when we run that name, R outputs the saved vector of numbers.

weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

weights
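You will also see R code that assigns with the arrow <- instead of =. For a simple assignment like this one, the two do the same thing, as the sketch below checks with identical. The name weights_arrow is just a hypothetical name chosen for the comparison.

```r
# Assign with = (the style used in this tutorial)
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

# Assign with <- (the style you will see in many other R resources)
weights_arrow <- c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

# Both names now hold exactly the same vector
identical(weights, weights_arrow)   # TRUE
```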

Operations on a vector

This is where working with objects gets fun. R allows for “vectorized operations.” That means that if you complete a mathematical operation on a vector, R applies that operation to every entry in that vector.

Take a look at the following example, where we take our weights vector and divide it by 2.205 to convert these values from pounds to kilograms (1 kilogram is about 2.205 pounds).

Try changing 2.205 to a different number to observe what happens!

weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

weights

weights_kilo = weights/2.205

weights_kilo
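Vectorized operations also work between two vectors of the same length: R pairs up the entries position by position. The sketch below uses two small made-up vectors, a and b, and also rounds the converted weights with the base function round so they are easier to read.

```r
# Two made-up vectors of the same length
a = c(1, 2, 3)
b = c(10, 20, 30)

# Entries are combined position by position: 1+10, 2+20, 3+30
a + b   # 11 22 33

# Rounding is vectorized too: every converted weight is rounded to 1 decimal place
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
weights_kilo = round(weights / 2.205, 1)
weights_kilo
```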

A Character Vector

A vector doesn’t have to be a set of numbers–it can also be a set of characters.

An example of a character vector might be storing responses to a question that produces categorical responses, like “yes” and “no.” Notice that character entries should be in quotation marks (whereas numbers should be listed without quotation marks, since they have an inherent value that R recognizes).

c("yes","yes","no","yes","no","yes","yes","yes","no","yes")
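Once categorical responses are stored in a character vector, the base function table counts how often each category appears. The name responses below is just a label chosen for this example.

```r
# Store the yes/no responses under a name
responses = c("yes","yes","no","yes","no","yes","yes","yes","no","yes")

# The type of this vector is "character"
class(responses)

# Count each category: 3 "no" and 7 "yes"
table(responses)
```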

Sequences

Sometimes (like when defining the axis of a plot), we might wish to create a sequence of equally spaced numbers. The built-in function seq lets us make a sequence from a starting value to a final value, in steps of our choice.

Notice that we fill in three of this function’s arguments:

  • from: the first number of your sequence
  • to: the end value of your sequence (the sequence never goes past it)
  • by: the step size between consecutive numbers

The following creates a sequence from 2 to 20 by 2’s.

seq(from = 2, to = 20, by = 2)
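Two details of seq may be worth seeing in action. First, by does not have to divide evenly into the range; the sequence simply stops before passing to. Second, seq has other arguments beyond these three, such as length.out, which asks for a certain number of equally spaced values instead of a step size (see ?seq for the full list).

```r
# Steps of 4 starting at 1 cannot land exactly on 10, so the sequence stops at 9
seq(from = 1, to = 10, by = 4)          # 1 5 9

# Ask for 5 equally spaced values from 0 to 1 instead of giving a step size
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```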

Leaving out the argument names

Keep in mind that in R, we don’t have to write out the argument names. If we list three input values without names, R assumes the order is from, to, by.

seq(2, 20, 2)
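The order only matters when you leave the names out. When you do write the names, you can list the arguments in any order and get the same result, as this comparison checks. The names by_position and by_name are just labels for the example.

```r
# Positional: R matches the values to from, to, by in that order
by_position = seq(2, 20, 2)

# Named: the order no longer matters
by_name = seq(by = 2, to = 20, from = 2)

identical(by_position, by_name)   # TRUE
```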

Default entries

Something else to keep in mind: we don’t have to fill in every possible argument to a function, only the necessary ones. For example, if we leave out the by argument, R counts up by 1. Try running this to see!

seq(2, 20)
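When by is left out and only from and to are given, the result matches the colon operator, a base R shorthand for counting up by 1.

```r
# Leaving out by: R counts up by 1
seq(2, 20)

# The colon operator produces the same values
2:20

# Check that the two agree entry by entry
all(seq(2, 20) == 2:20)   # TRUE
```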

Function Documentation

R keeps documentation for all of its built-in functions. Running ? in front of a function name opens that function’s documentation page (in this tutorial it opens in your web browser; in RStudio it appears in the Help pane).

Notice that the documentation will tell you 1) all possible arguments you could define inside that function, 2) the default values used for any arguments you leave out, and a few other things.

?seq
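As an optional peek, base R can also list a function’s arguments and defaults directly in the console. The sketch below uses seq.default, the method that does the work for ordinary numbers, because seq itself only shows a generic ... argument.

```r
# Print the argument list (with default values) of seq's default method
args(seq.default)

# Just the argument names; includes "from", "to", "by", and "length.out"
names(formals(seq.default))
```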

Summarizing Data

Introducing Data Frames

A data frame in R is a collection of vectors, where each vector represents one variable of data. Typically, each column of a data frame is a variable, and each row represents one unit of observation.
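Purely to see the rows-and-columns structure (you won’t need to build data frames by hand in this class), here is a tiny made-up data frame built with the base function data.frame. The names people, age, and smoker, and their values, are invented for illustration.

```r
# Each vector becomes one column (variable); each position is one row (observation)
people = data.frame(
  age    = c(34, 51, 27),
  smoker = c("no", "yes", "no")
)

people          # 3 rows, 2 columns
nrow(people)    # number of observations: 3
ncol(people)    # number of variables: 2
```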

Creating vs. Uploading

In the “Navigating RStudio” video earlier, I demo creating a data frame in code from scratch. However, you won’t ever need to create data frames that way in this class.

Instead, you’ll either be using a data frame stored in a package, or you’ll be importing a data file (like an Excel Spreadsheet) directly into your RStudio session.

Data from a Package

The prostate data frame includes health statistics from a sample of 97 people.

This data is stored in a package called faraway.

A Package?

A lot of datasets and special functions are saved in packages that other people create and share with the R community.

If you have RStudio open, perhaps you’re ready to install your first package! But feel free to come back and watch this later.

Library

After installing a package, you can use the library function to activate that package. Keep in mind:

  • Packages only need to be installed once (like installing an app on your phone!)
  • Packages need to be loaded with library every time you start a new session of R! (like opening an app on your phone after closing it)

Below, I am activating the faraway package into my session of RStudio with library, and then viewing the prostate data frame.

library(faraway)
prostate

summary function

The summary function is a quick way to produce several helpful summary statistics for all of our variables at once. It will produce the 5-number summary, as well as the mean for all numeric variables.

summary(prostate)
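summary also works on a single numeric vector. Using the weights vector from the Vectors section, the sketch below shows which numbers make up the output: the minimum, first quartile, median, mean, third quartile, and maximum.

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

# Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(weights)

# Some of the same pieces, one at a time
min(weights)      # 96
median(weights)   # 142
mean(weights)
max(weights)      # 225
```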

Call on a single variable

If you want to reference a specific variable inside of a data frame, use the $ operator.

For the following code, read it as: “Within prostate, pull up the variable age.”

prostate$age
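The $ operator has a close cousin in base R: double square brackets with the variable name in quotes, which pulls out the same column. The small data frame below (people, with made-up age and smoker values) is only for illustration.

```r
# A made-up data frame with two variables
people = data.frame(age = c(34, 51, 27), smoker = c("no", "yes", "no"))

# "Within people, pull up the variable age", written two ways
people$age
people[["age"]]

identical(people$age, people[["age"]])   # TRUE
```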

summary of one variable

Using the $ operator, we could also choose to run a numeric summary for one specific variable.

summary(prostate$age)

Other statistics

You can also use functions like mean, sd, and median directly.

sd(prostate$lweight)
mean(prostate$lweight)
median(prostate$age)
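For reference, sd computes the sample standard deviation, which divides by n - 1. The sketch below checks that against the formula written out by hand, using the weights vector from the Vectors section. It also shows the na.rm argument: if a variable has missing values (NA), these functions return NA unless you set na.rm = TRUE. The names sd_by_hand and with_missing are just labels for the example.

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
n = length(weights)

# Sample standard deviation by hand: divide by n - 1, then take the square root
sd_by_hand = sqrt(sum((weights - mean(weights))^2) / (n - 1))

# Should agree with the built-in sd (up to tiny rounding differences)
all.equal(sd_by_hand, sd(weights))   # TRUE

# Missing values: NA in, NA out, unless you ask R to drop them
with_missing = c(4, NA, 8)
mean(with_missing)                  # NA
mean(with_missing, na.rm = TRUE)    # 6
```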

Visualize it with Histograms

We’ll learn a lot more about visualizing data later, but you can use the base R function hist to visualize a numeric variable!

hist(prostate$lweight)

Number of bins

To control the number of bins in the hist function, you can add a breaks argument. For example, if I wanted about 30 bins, I can do the following (R treats a single number here as a suggestion and may adjust it slightly to get tidy breakpoints):

hist(prostate$lweight, breaks = 30)
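If you need exact bins rather than an approximate count, you can pass a vector of breakpoints to breaks instead. The sketch uses the weights vector from the Vectors section, with 10-pound bins from 90 to 230 chosen only because they cover its range; the name bins is just a label.

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

# Exact bin edges every 10 pounds, from 90 to 230 (14 bins)
hist(weights, breaks = seq(90, 230, by = 10))

# plot = FALSE returns the bin information without drawing anything
bins = hist(weights, breaks = seq(90, 230, by = 10), plot = FALSE)
bins$counts   # how many weights fall in each bin
```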

Saving or copying images

Note that you can save (or screenshot) images from RStudio into a document. If you need a little help, here is a quick demo!

Exploring the diabetes data frame

Now let’s take a look at a new dataset.

Load the faraway package again with library, and then call up the data frame named diabetes to display.

library(_______)
________
library(faraway)
diabetes

Summary

Now, run a summary of all variables in the diabetes data frame.

summary(____)
summary(diabetes)

Visualize it

Create a histogram to visualize the age variable within diabetes.

hist(diabetes$___)
hist(diabetes$age)

One more

Lastly, calculate the standard deviation of the age variable within diabetes.

sd(diabetes$____)
sd(diabetes$age)

Return Home

Great job! If you’re in STAT 212, don’t forget to move on to the other tutorial listed under the Lab 1 heading.

This tutorial was created by Kelly Findley with help from Brandon Pazmino (UIUC ’21). We hope this experience was helpful for you!

If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/

Introduction to R