Tutorial Introduction
Welcome!
Welcome to your first tutorial for coding in R.
Click the “continue” button below.
Reading
Before we get into R, let’s talk about the importance of reading what’s in here as you go.
Why Read?
It’s tempting to click through and skip the written portions. But you’ll get the most out of these tutorials if you actually read what’s here. Before completing the labs, you should move gorilla carefully through the tutorials to see what all is here. It will help you a ton when you get to the lab!
…
Did you see the gorilla in that last passage?
Did you “Pass” the Reading Test?
If you saw the word “gorilla” in that long passage of text, then you passed! You’re probably reading carefully enough.
If you didn’t see “gorilla,” you’re probably skimming the text too much. Slow down!
Completed Code Chunks
Throughout this tutorial, you will see code chunks like this.
This code chunk is ready to go. Just click run code on the right and see what it produces!
Note: A recent bug has made the “run code” buttons appear disabled, but they are, in fact, clickable!
2+2
Feel free to adjust
You can also change the contents of the code chunks. Go back to the one above and try typing a different mathematical expression to see what happens!
Writing Code in Code Chunks
Sometimes though, the code chunk will be empty or incomplete, and it will be up to you to write some code. If you’re stuck, you may click for a hint, and finally, an answer!
For this code chunk, type a mathematical expression that computes to 8.
3+5
#Here is an option: try replacing the blank with 5
3+_
Three Quick Notes
If your page freezes or goes grey, try clicking refresh!
Sometimes, code chunks may take a long time to run (or just give up trying to compute). Sometimes this is fixed by refreshing the page, but sometimes this is just because there is a lot of traffic on the site at once (like the night before a lab is due). In these instances, you should be able to learn everything you need by simply reading the text and carefully referencing the code–even without running it.
Notice the start over button just below the tabs on the left-hand side. Click this if you would like the tutorial to clean all of the tabs and start fresh again.
Now let’s proceed to more challenging adventures!
What is R?
A summary
R is a coding language. Think of coding as giving instructions to a machine to perform tasks.
There are a lot of coding languages out there, but R is especially well suited for the kinds of tasks we do in statistics and data science. You can use R to…
- Complete computational tasks much like you would use a calculator
- Create data visualizations
- Complete statistical tests, models, and other algorithms
- Create dynamic applications and webpages (this tutorial is written in part with R!)
- Use special programs (via packages) that others have coded and shared with the R community
In these tutorials, we’ll focus on some basic, but very practical, coding tasks using the R language!
How do you use R?
Computer Installation: Most people use R by installing R and RStudio on a computer and coding from there. You can learn more about that by watching this video now, or waiting until after you finish the tutorials.
Web-Based Account: You can also use R through the web-based platform RStudio Cloud (you may now see it called “Posit Cloud”). This will also be necessary for anyone using a Chromebook or tablet, since those operating systems aren’t compatible with the R software.
Next
Now, let’s dive into coding!
R as your Calculator
Arithmetic
As you might guess by now, R is kind of like a powerful calculator on your computer that can compute quite a few things, the most obvious being basic arithmetic.
Look at the examples provided for simple computations:
#Addition
56+1
#Subtraction
66-60
#Multiplication
45*2
#Division
65/5
#Exponents (the caret symbol, which on most keyboards is Shift+6)
3^2
Built-in Functions
The previous code chunk contained several of R’s built-in “symbolic functions.” R also has many built-in written functions.
For example, sqrt is a built-in written function that takes the square root of whatever value is placed inside its parentheses.
sqrt(25)
sqrt(11)
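Here is a small sketch of a few other built-in written functions from base R. The choice of abs, round, and log10 is just an illustration; each one works the same way as sqrt, taking a value inside the parentheses. Notice that functions can also be nested, like round(sqrt(11), 2).

```r
# Absolute value: the distance from zero
abs(-7)              # 7
# Take the square root of 11, then round it to 2 decimal places
round(sqrt(11), 2)   # 3.32
# Base-10 logarithm
log10(1000)          # 3
```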
Adding Parentheses
We can also use parentheses to combine multiple calculations into one expression. R computes whatever is inside the parentheses first.
(25-5)/4
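To see why the parentheses matter, compare the same numbers with and without them. R follows the usual order of operations, so without parentheses the division happens before the subtraction.

```r
# Parentheses first: 25 - 5 is 20, then 20 / 4
(25-5)/4   # 5
# No parentheses: 5/4 (which is 1.25) is computed first, then 25 - 1.25
25-5/4     # 23.75
```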
Give it a try!
Write code that computes 8 plus 6, all divided by 2. Use the hint button for help, or to see the solution.
(8 + 6)/2
#Did you use parentheses around 8+6?
(8+6)
Vectors in R
What is a Vector in R?
A vector is a collection of items (for example, numbers) that are tied together into one “object.” To create a vector, we will use the built-in R function c (which is short for “combine”).
The following vector could represent the weights (in pounds) of 13 adults. The output of this function is these 13 values, but combined together into one object.
c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
An Object?
R is often described as an object-oriented programming language.
For our purposes, that just means you can create, name, and save objects into your “global environment.” Saved objects will show up in your Global Environment pane on the top right of RStudio.
I’ll talk more about that in an upcoming video!
Saving a Vector
One way to see how the c function works is to assign the vector to a name. Then, when we run that name, R outputs the stored object of numbers.
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
weights
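You will also often see R code use `<-` instead of `=` for assignment. In a line like this one, both store the vector under the name weights. The sketch below uses `<-` and then the built-in length function to count how many values the object holds.

```r
# Assign with the arrow operator (works the same as = in this situation)
weights <- c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
# Count the entries stored in the vector
length(weights)   # 13
```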
Operations on a variable
This is where working with objects gets fun. R allows for “vectorized operations.” That means that if you complete a mathematical operation on a vector, R applies that operation to every entry in that vector.
Take a look at the following example, where we divide our weights vector by 2.205 to convert these values from pounds to kilograms.
Try changing 2.205 to a different number to observe what happens!
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
weights
weights_kilo = weights/2.205
weights_kilo
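Vectorized operations aren’t limited to division. A minimal sketch reusing the same weights vector: adding a number changes every entry at once, and wrapping the conversion in round (a built-in function) keeps one decimal place for each entry.

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
# Add 5 pounds to every adult's weight at once
weights + 5
# Convert every weight to kilograms, rounded to 1 decimal place
round(weights/2.205, 1)
```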
A Character Vector
A vector doesn’t have to be a set of numbers–it can also be a set of characters.
An example of a character vector might be storing responses to a question that produces categorical responses, like “yes” and “no.” Notice that character entries should be in quotation marks (whereas numbers should be listed without quotation marks, since they have an inherent value that R recognizes).
c("yes","yes","no","yes","no","yes","yes","yes","no","yes")
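Once a character vector is saved, the built-in table function counts how often each category appears. The name answers below is just an illustrative choice.

```r
# Save the yes/no responses under a name (answers is a hypothetical name)
answers = c("yes","yes","no","yes","no","yes","yes","yes","no","yes")
# Count each category; categories are listed alphabetically
table(answers)   # no: 3, yes: 7
```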
Sequences
Sometimes (like when defining the axis of a plot), we might wish to create a sequence of equally spaced numbers. There is a special function named “seq” that allows us to make a sequence from a starting value to a final value, by intervals of our choice.
Notice that this function has three arguments to fill:
- from: the first number of your sequence
- to: the value your sequence ends at (R won’t go past it)
- by: the interval length
The following creates a sequence from 2 to 20 by 2’s
seq(from = 2, to = 20, by = 2)
Leaving out the argument names
Keep in mind that in R, we don’t have to fill in all of the argument names. If we list three input values, R assumes the order is from, to, by.
seq(2, 20, 2)
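When you do write out the argument names, the order no longer matters, because R matches each value to its argument by name. Both lines below should produce 0, 5, 10, 15, 20.

```r
# Named arguments can be listed in any order
seq(by = 5, from = 0, to = 20)
# Unnamed values must follow the order from, to, by
seq(0, 20, 5)
```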
Default entries
Something else to keep in mind: we don’t have to fill in every possible argument to a function, only the necessary ones. For example, if we leave out the “by” argument, R will assume a default value of 1. Try running this to see!
seq(2, 20)
Function Documentation
R keeps documentation for all of its built-in functions. Running a ? in front of the function name opens that function’s documentation page (in this tutorial, that may open in your web browser; in RStudio, it appears in the Help pane).
Notice that the documentation will tell you 1) all possible arguments you could define inside that function, 2) default entries if left undefined, and a few other things.
?seq
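For example, the documentation for seq also lists a length.out argument. As a sketch, instead of setting the interval with by, you can ask for a fixed number of equally spaced values and let R work out the interval.

```r
# Five equally spaced values from 0 to 1
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```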
Try it in R!
If you already installed R and RStudio (or created an account for RStudio Cloud), perhaps you’re ready to try this all out! But feel free to watch this later if you’d like to focus on finishing the tutorials first.
Summarizing Data
Introducing Data Frames
A data frame in R is a collection of vectors, where each vector represents one variable of data. Typically, each column of a data frame is a variable, and each row represents one unit of observation.
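Purely to show the structure (you won’t need to build data frames by hand in this class), here is a sketch that combines two vectors into a data frame with the built-in data.frame function. Each row is one of the 13 adults from the weights example, and each column is a variable. The names weight_data, pounds, and kilograms are just illustrative choices.

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
# Combine two vectors of the same length into one data frame
weight_data = data.frame(pounds = weights,
                         kilograms = round(weights/2.205, 1))
weight_data
# Rows are observations (adults); columns are variables
nrow(weight_data)   # 13
ncol(weight_data)   # 2
```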
Creating vs. Uploading
In the “Navigating RStudio” video earlier, I demo creating a data frame in code from scratch. However, you won’t ever need to create data frames that way in this class.
Instead, you’ll either be using a data frame stored in a package, or you’ll be importing a data file (like an Excel Spreadsheet) directly into your RStudio session.
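As a hedged sketch of importing, base R’s read.csv reads a comma-separated file into a data frame. To keep the example self-contained, it first writes a small temporary CSV (the file and the column name pounds are made up for illustration). In practice you’d point read.csv at your own file; Excel files typically need a different route, such as RStudio’s Import Dataset button or a package like readxl.

```r
# Write a tiny CSV to a temporary file (illustration only)
csv_path = tempfile(fileext = ".csv")
write.csv(data.frame(pounds = c(187, 142, 119)), csv_path, row.names = FALSE)
# Import the file back in as a data frame
imported = read.csv(csv_path)
imported
```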
Data from a Package
The prostate data frame includes health statistics from a sample of 97 people.
This data is stored in a package called faraway.
A Package?
Many datasets and special functions are saved in packages that other people create and share with the community.
If you have RStudio open, perhaps you’re ready to install your first package! But feel free to come back and watch this later.
Library
After installing a package, you can use the library function to activate that package. Keep in mind:
- Packages only need to be installed once (like installing an app on your phone!)
- Packages need to be libraried every time you start a new session of R (like opening an app on your phone after closing it)!
Below, I am activating the faraway package into my session of RStudio with library, and then viewing the prostate data frame.
library(faraway)
prostate
summary function
The summary function is a quick way to produce several helpful summary statistics for all of our variables at once. It will produce the 5-number summary, as well as the mean, for all numeric variables.
summary(prostate)
Call on a single variable
If you want to reference a specific variable inside of a data frame, use the $ operator.
For the following code, read it as: “Within prostate, pull up the variable age.”
prostate$age
summary of one variable
Using the $ operator, we could also choose to run a numeric summary for one specific variable.
summary(prostate$age)
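If faraway isn’t available in your session yet, you can practice the $ operator on any data frame. A minimal sketch, assuming a small data frame named weight_data (a hypothetical name) built from the weights vector used earlier:

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
weight_data = data.frame(pounds = weights, kilograms = round(weights/2.205, 1))
# "Within weight_data, pull up the variable pounds"
weight_data$pounds
# Numeric summary of just that one variable
summary(weight_data$pounds)
```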
Other statistics
You can also use functions like mean, sd, and median directly.
sd(prostate$lweight)
mean(prostate$lweight)
median(prostate$age)
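The same functions work on any numeric vector, not just a variable pulled from a data frame. A quick sketch using the weights vector from earlier:

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
mean(weights)     # about 149.2
median(weights)   # 142
sd(weights)
```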
Visualize it with Histograms
We’ll learn a lot more about visualizing data later, but you can use the base R function hist to visualize a numeric variable!
hist(prostate$lweight)
Number of bins
To control the number of bins using the hist function, you can add a breaks argument. For example, if I wanted about 30 bins, I can do the following (R treats a single number here as a suggestion, so the exact number of bins may differ slightly):
hist(prostate$lweight, breaks = 30)
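If you want exact bins, you can pass breaks a vector of break points instead of a single number. A sketch using seq on the weights vector from earlier, with bins 20 pounds wide from 80 to 240 (the endpoints are chosen only so the bins cover these particular values):

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
# Break points at 80, 100, 120, ..., 240 give 8 bins of width 20
hist(weights, breaks = seq(from = 80, to = 240, by = 20))
```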
Saving or copying images
Note that you can save (or screenshot) images from RStudio into a document. If you need a little help, here is a quick demo!
Exploring the diabetes data frame
Now let’s take a look at a new dataset.
Library the faraway package again, and then call up the data frame named diabetes to display.
library(_______)
________
library(faraway)
diabetes
Summary
Now, run a summary of all variables in the diabetes data frame.
summary(____)
summary(diabetes)
Visualize it
Create a histogram to visualize the age variable within diabetes.
hist(diabetes$___)
hist(diabetes$age)
One more
Lastly, calculate the standard deviation of the age variable within diabetes.
sd(diabetes$____)
sd(diabetes$age)
Return Home
Great job! If you’re in STAT 212, don’t forget to move on to the Sampling and Simulation tutorial next before completing Lab 1.
This tutorial was created by Kelly Findley with help from Brandon Pazmino (UIUC ’21). We hope this experience was helpful for you!
If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/