Tutorial Introduction
Welcome!
Welcome to your first tutorial for coding in R.
Click the “continue” button below.
Completed Code Chunks
Throughout this tutorial, you will see code chunks. Some code chunks have a run button (like the one below) which means you can run and see the result, or even change the code and run it!
Since these interactive code chunks take a lot of computing power, I also include a lot of code chunks that are already completed and can’t be run. These are just for you to read and learn from.
This code chunk is ready to go. Just click run code on the right and see what it produces!
Note: A recent bug has made the “run code” buttons appear disabled, but they are, in fact, clickable!
2+2
Feel free to adjust
You can also change the contents of the code chunks. Go back to the one above and try typing a different mathematical expression to see what happens!
Writing Code in Code Chunks
Sometimes though, the code chunk will be empty or incomplete, and it will be up to you to write some code. If you’re stuck, you may click for a hint, and finally, an answer!
For this code chunk, type a mathematical expression that computes to 8.
3+5
#Here is an option: try replacing the blank with 5
3+_
Three Quick Notes
If your page freezes or goes grey, try clicking refresh!
Sometimes, code chunks may take a long time to run (or just give up trying to compute). Sometimes this is fixed by refreshing the page, but sometimes this is just because there is a lot of traffic on the site at once (like the night before a lab is due). In these instances, you should be able to learn everything you need by simply reading the text and carefully referencing the code–even without running it.
Notice the start over button just below the tabs on the left-hand side. Click this if you would like the tutorial to clean all of the tabs and start fresh again.
Now let’s proceed to more challenging adventures!
What is R?
A summary
R is a coding language. Think of coding as giving instructions to a machine to perform tasks.
There are a lot of coding languages out there, but R is especially well suited for the kinds of tasks we do in statistics and data science. You can use R to…
- Complete computational tasks much like you would use a calculator
- Create data visualizations
- Complete statistical tests, models, and other algorithms
- Create dynamic applications and webpages (this tutorial is written in part with R!)
- Use special programs (via packages) that others have coded and shared with the R community
In these tutorials, we’ll focus on some basic, but very practical, coding tasks using the R language!
Arithmetic
R is kind of like a powerful calculator on your computer that can compute quite a few things. The most obvious of these are basic arithmetic operations.
Look at the examples provided for simple computations:
#Addition
56+1
## [1] 57
#Subtraction
66-60
## [1] 6
#Multiplication
45*2
## [1] 90
#Division
65/5
## [1] 13
#Exponents (the caret symbol, which on most keyboards is shift+6)
3^2
## [1] 9
Built-in Functions
The previous code chunk contained several of R’s built-in “symbolic functions.” R also has many built-in written functions.
For example, sqrt is a built-in written function that takes the square root of whatever value is inserted.
sqrt(25)
## [1] 5
sqrt(11)
## [1] 3.316625
Adding Parentheses
We can also use parentheses to combine multiple calculations in one line and control which one is completed first.
(25-5)/4
## [1] 5
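To see why the parentheses matter, here is a small sketch that runs the same numbers with and without them. Without parentheses, R follows the usual order of operations and divides before it subtracts.

```r
# With parentheses: subtract first, then divide
(25 - 5) / 4
## [1] 5

# Without parentheses: R divides 5 by 4 first, then subtracts that from 25
25 - 5 / 4
## [1] 23.75
```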
Give it a try!
Write code that computes 8 plus 6, all divided by 2. Use the hint button for help, or to see the solution.
(8 + 6)/2
#Did you use parentheses around 8+6?
(8+6)
Vectors in R
What is a Vector in R?
A vector is a collection of items (for example, numbers) that are tied together into one “object.” To create a vector, we will use the built-in R function c (which is short for “combine”).
The following vector could represent the weights (in pounds) of 13 adults. The output of this function is these 13 values, but combined together into one object.
c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
## [1] 187 142 119 225 185 131 102 116 165 96 135 180 157
An Object?
R is often described as an object-oriented programming language.
For our purposes, that just means you can create, name, and save objects into your “global environment.” Saved objects will show up in your Global Environment pane on the top right.
I’ll talk more about that in an upcoming video!
Saving a Vector
One way to see how the c function works is to assign the vector to a name. Then, when we run the assigned name, R prints our object of numbers.
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
weights
## [1] 187 142 119 225 185 131 102 116 165 96 135 180 157
Operations on a variable
This is where object-oriented programming gets fun. R allows for “vectorized operations.” That means that if you complete a mathematical operation on a vector, it will apply that operation to every entry in that vector.
Take a look at the following example, where we take our weights vector and divide it by 2.205 to convert these values from pounds to kilograms.
Try changing 2.205 to a different number to observe what happens!
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)
weights
weights_kilo = weights/2.205
weights_kilo
A Character Vector
A vector doesn’t have to be a set of numbers–it can also be a set of characters.
An example of a character vector might be storing responses to a question that produces categorical responses, like “yes” and “no.” Notice that character entries should be in quotation marks (whereas numbers should be listed without quotation marks since they have an inherent value that R recognizes).
c("yes","yes","no","yes","no","yes","yes","yes","no","yes")
## [1] "yes" "yes" "no" "yes" "no" "yes" "yes" "yes" "no" "yes"
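If you are curious how R tells the two kinds of vectors apart, here is a small sketch using class, a built-in function that this tutorial does not otherwise cover. It reports what type of values a vector holds.

```r
# A numeric vector: no quotation marks
class(c(1, 2, 3))
## [1] "numeric"

# A character vector: every entry in quotation marks
class(c("yes", "no", "yes"))
## [1] "character"

# Numbers typed inside quotation marks are stored as text, not as values
class(c("1", "2", "3"))
## [1] "character"
```

That last case is worth remembering: once numbers are wrapped in quotation marks, R treats them as labels rather than something it can do arithmetic with.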
Sequences
Sometimes (like when defining the axis of a plot), we might wish to create a sequence of equally spaced numbers. There is a special function named seq that allows us to make a sequence from a starting value to a final value, by intervals of our choice.
Notice that this function has three arguments to fill.
- from: the first number of your sequence
- to: the last number of your sequence
- by: the interval length
The following creates a sequence from 2 to 20 by 2’s:
seq(from = 2, to = 20, by = 2)
## [1] 2 4 6 8 10 12 14 16 18 20
Leaving out the argument names
Keep in mind that in R, we don’t have to fill in all of the argument names. If we list three input values, R assumes the order is from, to, by.
seq(2, 20, 2)
## [1] 2 4 6 8 10 12 14 16 18 20
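The flip side is also worth a quick sketch: when you do write out the argument names, R matches each value by its name, so the order you type them in no longer matters.

```r
# Named arguments can be listed in any order
seq(by = 2, to = 20, from = 2)
## [1] 2 4 6 8 10 12 14 16 18 20

# Unnamed arguments are matched by position: from, to, by
seq(2, 20, 2)
## [1] 2 4 6 8 10 12 14 16 18 20
```

Naming your arguments takes a little more typing, but it can make your code easier to read later.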
Default entries
Something else to keep in mind: we don’t have to fill in every possible argument to a function, only the necessary ones. For example, if we leave the “by” argument empty, R will assume a default value of 1. Try running this to see!
seq(2, 20)
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Function Documentation
R keeps documentation for all of its built-in functions. If you run a ? in front of a function’s name, the documentation page for that function will open up.
Notice that the documentation will tell you 1) all possible arguments you could define inside that function, 2) default entries if left undefined, and a few other things.
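As a quick sketch, here is what that looks like for the seq function from earlier. The help function is another built-in way to reach the same page.

```r
# Open the documentation page for the seq function
?seq

# help() opens the same page as ?
help("seq")
```

Try it with sqrt or c as well to get a feel for how these pages are laid out.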
Summarizing Data
Introducing Data Frames
A data frame in R is a collection of vectors, where each vector represents one variable of data. Typically, each column of a data frame is a variable, and each row represents one unit of observation.
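To make the idea of “a collection of vectors” concrete, here is a minimal sketch that builds a tiny data frame with the built-in data.frame function. The names weight_lb, answer, and small_df, along with the values inside them, are made up purely for illustration.

```r
# Two vectors of the same length, one per variable (made-up toy values)
weight_lb = c(187, 142, 119, 225)
answer = c("yes", "no", "yes", "yes")

# data.frame() places the vectors side by side as columns
small_df = data.frame(weight_lb, answer)
small_df

# 4 rows (units of observation) and 2 columns (variables)
nrow(small_df)
## [1] 4
ncol(small_df)
## [1] 2
```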
Creating vs. Uploading
In the “Navigating RStudio” video earlier, I demo creating a data frame in code from scratch. However, you won’t ever need to create data frames that way in this class.
Instead, you’ll either be using a data frame stored in a package, or you’ll be importing a data file (like an Excel Spreadsheet) directly into your RStudio session.
Data from a Package
The prostate data frame includes health statistics from a sample of 97 people.
This data is stored in a package called faraway.
A Package?
A lot of datasets and special functions might be saved in packages that other people create and share with the community.
If you have RStudio open, perhaps you’re ready to install your first package! But feel free to come back and watch this later.
Library
After installing a package, you can use the library function to activate that package. Keep in mind:
- Packages only need to be installed once (like installing an app on your phone!)
- Packages need to be libraried every time you start a new session of R! (like opening an app on your phone after closing it)
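The two steps above might look something like this in code. This is only a sketch: install.packages is R’s built-in installer, and requireNamespace is a built-in check (not covered in this tutorial) that I’m using here so the code doesn’t fail if faraway hasn’t been installed yet.

```r
# Step 1: install once (remove the # and run this line a single time)
# install.packages("faraway")

# Step 2: library the package in every new session.
# The check below only runs library() if faraway is already installed.
if (requireNamespace("faraway", quietly = TRUE)) {
  library(faraway)
} else {
  message("faraway is not installed yet; run install.packages(\"faraway\") first.")
}
```

In your own work, you will usually just type library(faraway) on its own line once the package is installed.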
Below, I am activating the faraway package into my session of RStudio with library, and then viewing the prostate data frame.
library(faraway)
prostate
summary function
The summary function is a quick way to produce several helpful summary statistics for all of our variables at once. It will produce the 5-number summary, as well as the mean for all numeric variables.
summary(prostate)
##      lcavol           lweight           age             lbph
##  Min.   :-1.3471   Min.   :2.375   Min.   :41.00   Min.   :-1.3863
##  1st Qu.: 0.5128   1st Qu.:3.376   1st Qu.:60.00   1st Qu.:-1.3863
##  Median : 1.4469   Median :3.623   Median :65.00   Median : 0.3001
##  Mean   : 1.3500   Mean   :3.653   Mean   :63.87   Mean   : 0.1004
##  3rd Qu.: 2.1270   3rd Qu.:3.878   3rd Qu.:68.00   3rd Qu.: 1.5581
##  Max.   : 3.8210   Max.   :6.108   Max.   :79.00   Max.   : 2.3263
##       svi              lcp             gleason          pgg45
##  Min.   :0.0000   Min.   :-1.3863   Min.   :6.000   Min.   :  0.00
##  1st Qu.:0.0000   1st Qu.:-1.3863   1st Qu.:6.000   1st Qu.:  0.00
##  Median :0.0000   Median :-0.7985   Median :7.000   Median : 15.00
##  Mean   :0.2165   Mean   :-0.1794   Mean   :6.753   Mean   : 24.38
##  3rd Qu.:0.0000   3rd Qu.: 1.1786   3rd Qu.:7.000   3rd Qu.: 40.00
##  Max.   :1.0000   Max.   : 2.9042   Max.   :9.000   Max.   :100.00
##       lpsa
##  Min.   :-0.4308
##  1st Qu.: 1.7317
##  Median : 2.5915
##  Mean   : 2.4784
##  3rd Qu.: 3.0564
##  Max.   : 5.5829
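If you’d like to see where those six numbers come from, here is a small sketch that uses the weights vector from earlier in this tutorial instead of a data frame. It runs summary and then each piece separately. The quantile function is built into R but isn’t otherwise covered here.

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

# All six statistics at once
summary(weights)

# The same pieces, one at a time
min(weights)       # smallest value
## [1] 96
median(weights)    # middle value
## [1] 142
max(weights)       # largest value
## [1] 225
mean(weights)      # average
quantile(weights)  # the 5-number summary: min, 1st quartile, median, 3rd quartile, max
```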
Call on a singular variable
If you want to reference a specific variable inside of a data frame, use the $ operator.
For the following code, read it as: “Within prostate, pull up the variable age.”
prostate$age
## [1] 50 58 74 58 62 50 64 58 47 63 65 63 63 67 57 66 70 66 41 70 59 60 59 63 69
## [26] 68 65 67 67 65 65 65 71 54 63 64 73 64 68 56 60 68 62 61 66 61 79 68 43 70
## [51] 68 64 64 68 59 66 47 49 70 61 73 63 72 66 64 61 68 72 69 72 60 77 69 60 69
## [76] 68 72 78 69 63 66 57 77 65 60 64 58 62 65 76 68 61 68 44 52 68 68
summary of one variable
Using the $ operator, we could also choose to run a numeric summary for one specific variable.
summary(prostate$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   41.00   60.00   65.00   63.87   68.00   79.00
Other statistics
You can also use functions like mean, sd, and median directly.
sd(prostate$lweight)
## [1] 0.4966286
mean(prostate$lweight)
## [1] 3.652689
median(prostate$age)
## [1] 65
Visualize it with Histograms
We’ll learn a lot more about visualizing data later, but you can use the base R function hist to visualize a numeric variable!
hist(prostate$lweight)
Number of bins
To control the number of bins using the hist function, you can add a breaks argument. For example, if I want 30 bin breaks, I can do the following. Try changing the number to see what happens!
hist(prostate$lweight, breaks = 30)
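One thing you may notice when you experiment: R treats a single number given to breaks as a suggestion and picks round, convenient break points, so the number of bins you get may not match your request exactly. Here is a small sketch for checking what R actually chose, using the weights vector from earlier. The name h is just something I made up to store the result.

```r
weights = c(187, 142, 119, 225, 185, 131, 102, 116, 165, 96, 135, 180, 157)

# plot = FALSE returns the histogram's details instead of drawing it
h = hist(weights, breaks = 5, plot = FALSE)

# The break points R actually chose
h$breaks

# How many values landed in each bin
h$counts
```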
Exploring the diabetes data frame
Now let’s take a look at a new dataset.
Library the faraway package again, and then call up the data frame named diabetes to display.
library(_______)
________
library(faraway)
diabetes
Summary
Now, run a summary of all variables in the diabetes data frame.
summary(____)
summary(diabetes)
Visualize it
Create a histogram to visualize the age variable within diabetes.
hist(diabetes$___)
hist(diabetes$age)
One more
Lastly, calculate the standard deviation of the age variable within diabetes.
sd(diabetes$____)
sd(diabetes$age)
Return Home
Great job! If you’re in STAT 212, be sure to complete any other tutorials mentioned on your Canvas assignment page for this lab assignment.
This tutorial was created by Kelly Findley with help from Brandon Pazmino (UIUC ’21). We hope this experience was helpful for you!
If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/