Introduction to R

Basic Computation

Welcome!

Welcome to your first tutorial for coding in R! In this tutorial set, we'll discuss how to set up calculations, create and use basic data structures, and run several basic commands for describing data.

Keep in mind...

Throughout this tutorial, you will see code chunks like this:

2+2

Often, these code chunks will be completed and ready to go for demonstration. You should run them to see what happens.

You should also feel free to play around with them and run them with other entries! Don't worry, you won't break the tutorial by changing the contents. :)

Some code chunks will be challenges for you to fill in. In these, use the provided hints; the last hint will be the suggested solution. You can also use Submit to check that your output is correct!

Arithmetic

First, let's practice basic computations with R. Addition, subtraction, multiplication, division, and exponents use symbols that are likely already familiar to you. Square roots use the sqrt function.

Look at the examples provided for simple computations and then produce some of your own:

56+1

66-60

45*2

81/9

5^2

sqrt(144)

Adding Parentheses

We can also use parentheses to complete multiple calculations at once.

When entering computations in R, keep the order of operations (PEMDAS) in mind. R follows it just as you would on paper, so wrap any part of an expression that should be evaluated first in parentheses.

(25-5)/4

((6*3)-12)^2
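
To see why the parentheses matter, here is a small sketch comparing one of the expressions above with and without them. The variable names are made up for illustration.

```r
# Without parentheses, the division happens before the subtraction
without_parens = 25 - 5/4     # 25 - 1.25

# With parentheses, the subtraction happens first
with_parens = (25 - 5)/4      # 20/4

without_parens   # 23.75
with_parens      # 5
```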

Your Turn!

Write code that computes 8 plus 6, all divided by 2. The solution is available for reference:

# Hint: did you use parentheses around 8+6?
(8+6)

# Solution
(8 + 6)/2

Vectors in R

Introducing Vectors

A vector is a collection of items (for example, a list of numbers) that are tied together into one structure. To create a vector, we will use our first R function, c (which is short for "concatenate").

Functions in R are usually a letter or name, followed by parentheses that include inputs for that function.

The following vector could represent the heights (in inches) of 13 adults. The entries are placed inside the function as inputs, and when we run this function, it outputs the same list of numbers, but tied together as a vector.

c(65,71,63,68,67,72,64,61,67,71,72,68,64)

Characters

An example of a character vector might be storing responses to a question that produces categorical responses. Notice that character entries should be in quotation marks (whereas numbers should typically be listed without quotation marks).

c("yes","yes","no","yes","no","yes","yes","yes","no","yes")
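
If you're ever unsure which kind of vector you have, the class function will tell you. This is only a small sketch: class is not covered above, and the variable names are made up for illustration.

```r
some_heights = c(65, 71, 63)
some_responses = c("yes", "no", "yes")

class(some_heights)     # "numeric"
class(some_responses)   # "character"

# Quotation marks turn numbers into text, so R no longer treats them as numbers
class(c("65", "71"))    # "character"
```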

Sequences

In some cases (like plots), we might wish to create a sequence of equally spaced numbers. There is a special function named "seq" that allows us to make a sequence from a starting value to a final value, by intervals of our choice.

Notice that this function now has multiple arguments to fill. We will define the three listed here.

seq(from = 2, to = 20, by = 2)

Leaving out the argument names

Keep in mind that in R, we don't have to write out the argument names. If we leave them out, R matches our inputs to the arguments by position, so here it assumes the order from, to, by.

seq(2, 20, 2)

Default entries

Something else to keep in mind: we don't have to fill in every possible argument to a function, only the necessary ones. For example, if we leave out the "by" argument, R will use a default value of 1. Try running this to see!

seq(2, 20)
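
As a quick check of the last three chunks, this sketch confirms that naming the arguments and relying on their position give the same sequence, and that leaving out by gives steps of 1. The variable names, and the use of identical and length, are additions for illustration.

```r
named = seq(from = 2, to = 20, by = 2)
positional = seq(2, 20, 2)

# Both calls build the same vector
identical(named, positional)   # TRUE

# With "by" left out, the sequence steps by 1
default_step = seq(2, 20)
length(default_step)           # 19 values: 2, 3, 4, ..., 20
```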

In case you're curious, you can always check out the documentation for a function by typing ? in front of its name. This will give you info about what arguments are available and what default values are used if they are left undefined. It's a bit technical and confusing at first, but as you gain coding experience, it can be very helpful to reference for new (to you) functions.

?seq

Creating Variables

We can also save vectors to a variable name--this is helpful when we might want to summarize or use this vector in a later command.

heights = c(65,71,63,68,67,72,64,61,67,71,72,68,64)

heights

breaks = seq(0,100,5)

breaks

Operations on a variable

We can complete arithmetic operations on vectors, as well as calculate various summary statistics if working with data.

Take a look at the following example, where we take our height vector and multiply it by 2.54 to convert these values from inches to centimeters.

Try changing 2.54 to a different number to observe what happens!

height = c(65,71,63,68,67,72,64,61,67,71,72,68,64)

height_cm = height*2.54

height_cm
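
The same vector can also be passed to summary functions, which take the whole vector as input and return a single number. Here is a short sketch using length, mean, min, and max, none of which are shown above:

```r
height = c(65,71,63,68,67,72,64,61,67,71,72,68,64)

# Arithmetic is applied to every entry, so the result has the same length
height_cm = height*2.54
length(height_cm)   # 13

# Summary functions collapse the vector to one value
mean(height)        # about 67.15
min(height)         # 61
max(height)         # 72
```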

Practice!

Give it a try! Create a sequence from 3 to 24 by 3's. Name it Vector, and then divide Vector by 3. The division should produce a vector from 1 to 8 by 1's.

______ = ___(from = __, to = __, by = __)

Vector/__

# Hint
Vector = seq(from = 3, to = __, by = __)

Vector/__

# Solution
Vector = seq(from = 3, to = 24, by = 3)

Vector/3

More Practice!

Now, try creating a vector with the following data representing inches of precipitation for 12 months in Champaign.

Save this data as a vector named Temp_2019

3.85, 1.90, 5.09, 4.89, 6.08, 2.82, 3.38, 2.19, 3.36, 5.00, 1.91, 1.82

FYI: Weather data for the Champaign-Urbana area can be found here: https://stateclimatologist.web.illinois.edu/data/champaign-urbana/

3.85, 1.90, 5.09, 4.89, 6.08, 2.82, 3.38, 2.19, 3.36, 5.00, 1.91, 1.82

# Hint
Temp_2019 = c(...)

# Solution
Temp_2019 = c(3.85, 1.90, 5.09, 4.89, 6.08, 2.82, 3.38, 2.19, 3.36, 5.00, 1.91, 1.82)
Temp_2019

Data Frames (and Tibbles) in R

Introducing Data Frames

A data frame in R is a collection of vectors, where each vector represents one variable of data. Typically, each column of a data frame is a variable, and each row represents one observation (set of measurements from one individual at one point in time).

In an upcoming software video, we'll see how to use RStudio to import data into a session (since most of the time, we're working with data in a spreadsheet or some other file), but for now, we'll focus on data we create directly in R, or some named datasets that exist online in the R universe already for learning purposes.

Load the prostate Data Frame from a Package

In the following code, we will load a data frame named "prostate." This data is saved in a package named "faraway." Packages are a way for R users to bundle code or data frames and share them with others! We'll use packages many times throughout the course.

Note that if you're using a package on your personal computer, you'll need to install it before loading it with library. So if you want to replicate this next bit on your own computer, be sure to run the following first: install.packages("faraway")

Once installed, you can activate any package for use in your current session of R by running library(package_name). In this case, the package name is faraway, so we will run that here!

library(faraway)
prostate

Note that library(faraway) loads the package so that its contents are available in our session, and prostate is one (of many!) data frames in this package that we can access. By running just the name, we get a snapshot of this data frame in our output.

A Little Exploration

We can use different functions on a data frame to learn more about it. Here are a couple of basic ones.

# Number of rows (observations)
nrow(prostate)

# Number of columns (variables)
ncol(prostate)

Create a Data Frame Manually

We can also create a data frame manually by entering named vectors that we want to tie together. We will use the command "data.frame", which combines the vectors we list (separated by commas) as columns. The vectors should all have the same length.

Class = data.frame(
  heights = c(65,71,63,68,67,72,64,61,67,71,72,68,64),
  responses = c("yes","yes","no","yes","no","yes","yes","yes","no","yes","no","no","yes")
)

Class
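
The exploration functions from earlier work on any data frame, including one we build ourselves. Here is a small sketch applying nrow and ncol to Class, plus names (not covered above) to list the variables:

```r
Class = data.frame(
  heights = c(65,71,63,68,67,72,64,61,67,71,72,68,64),
  responses = c("yes","yes","no","yes","no","yes","yes","yes","no","yes","no","no","yes")
)

nrow(Class)    # 13 observations
ncol(Class)    # 2 variables
names(Class)   # "heights" "responses"
```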

New Lines to Improve Readability

Notice in the code chunk above, we hit "enter" after each comma to list each variable on a new line. With most functions in R, you can insert line breaks to improve readability without changing the operation! We could list all of that in one long line, and it would run exactly the same, but it would be very difficult to read!

As you are learning to code, please please please make line breaks where appropriate! It will make it much easier for you and for those of us who might be helping you. :)
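
To confirm that line breaks don't change the result, this sketch builds a small data frame both ways and compares them with identical. The names one_line and multi_line, and the shortened data, are made up for illustration.

```r
# Everything on one line
one_line = data.frame(heights = c(65,71,63), responses = c("yes","yes","no"))

# The same call, with a line break after each comma
multi_line = data.frame(
  heights = c(65,71,63),
  responses = c("yes","yes","no")
)

identical(one_line, multi_line)   # TRUE
```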

Data Frames with Multiple Variables

Now, can you try creating a data frame with two variables? Let's report the test scores of 5 fictional students, as well as their names.

Scores: 90, 81, 87, 98, 78

Names: "Jose", "Maddie", "Peter", "Amy", and "Kara"

Let's call this data frame "Results."

Then be sure to call up this data frame at the end.

Don't forget to put a comma at the end of the Scores line!

______ = data.frame(
  Scores = ...
  Names = ...
)

Results

# Hint
Results = data.frame(
  Scores = c(90, 81, ...),
  Names = c("Jose", ...)
)

Results

# Solution
Results = data.frame(
  Scores = c(90, 81, 87, 98, 78),
  Names = c("Jose", "Maddie", "Peter", "Amy", "Kara")
)

Results

And Tibbles Too

You should also be aware that "tibbles" are another data structure that you may encounter. Tibbles behave like data frames in most ways. For our purposes, the main difference is how they display data when called on.

In this R tutorial, you won't see a difference. In fact, this tutorial purposely displays data frames like a tibble! But if using R on your personal computer, you'll notice that data frames display in a clunkier way. They might display as many as 1,000 rows of data, while tibbles display a truncated version (the first 10 rows by default), plus some additional variable info. Tibbles just give you an efficient rundown!

The more data you work with in R, the more you'll notice the difference, and probably realize why tibbles are easier to work with than data frames.

We can actually take the same data from earlier and save it as a tibble.

# tibble() comes from the tibble package (also loaded with the tidyverse)
library(tibble)

Class = tibble(
  heights = c(65,71,63,68,67,72,64,61,67,71,72,68,64),
  responses = c("yes","yes","no","yes","no","yes","yes","yes","no","yes","no","no","yes")
)

Class
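
An existing data frame can also be converted rather than retyped. This sketch assumes the tibble package is installed and uses its as_tibble function; the names Class_df and Class_tbl are made up for illustration.

```r
Class_df = data.frame(
  heights = c(65,71,63,68,67,72,64,61,67,71,72,68,64),
  responses = c("yes","yes","no","yes","no","yes","yes","yes","no","yes","no","no","yes")
)

# Only run the conversion if the tibble package is available
if (requireNamespace("tibble", quietly = TRUE)) {
  Class_tbl = tibble::as_tibble(Class_df)
  print(Class_tbl)                  # tibble display, with column types shown
  print(is.data.frame(Class_tbl))   # TRUE: a tibble is still a data frame
}
```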

Summarizing Data

When analyzing data, we are often interested in summarizing certain variables in our data.

The summary command is a quick way to produce several helpful summary statistics for all of our variables at once. For each numeric variable, summary produces the 5-number summary and the mean.

library(faraway)

summary(prostate)

We can also produce specific summaries for specific variables using commands like mean, sd, and median. Just make sure you call on the specific variable by using the $ operator, which lets you access a single column (variable) of the data frame by name.

sd(prostate$lweight)
mean(prostate$lweight)
median(prostate$age)
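
The $ operator works the same way on any data frame. As a sketch that doesn't need the faraway package, here it is on the Class data frame from earlier, with table (not covered above) used to count a categorical variable:

```r
Class = data.frame(
  heights = c(65,71,63,68,67,72,64,61,67,71,72,68,64),
  responses = c("yes","yes","no","yes","no","yes","yes","yes","no","yes","no","no","yes")
)

# Numeric summaries of one column
mean(Class$heights)      # about 67.15
median(Class$heights)    # 67
sd(Class$heights)

# Counts for a categorical column
table(Class$responses)   # 5 "no" and 8 "yes"
```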

Exploring the diabetes data frame

Now let's take a look at a new dataset.

Load the faraway package again with library, and then call up the data frame named diabetes to display it.

library(_______)
________

# Solution
library(faraway)
diabetes

Calculate the number of observations in the dataset:

nrow(___)

# Solution
nrow(diabetes)

Summary

Now, run a summary of the diabetes data frame.

summary(____)

# Solution
summary(diabetes)

More Statistics

And lastly, calculate the standard deviation of the age variable (within diabetes).

sd(diabetes$____)

# Solution
sd(diabetes$age)
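
One thing to watch for when summarizing real data: if a variable contains missing values (shown as NA), functions like mean and sd return NA unless told to skip them. This sketch uses a made-up vector rather than the diabetes data, and the na.rm argument is not covered above.

```r
values = c(4, 8, NA, 6)

# A missing value makes the result missing too
mean(values)                 # NA

# na.rm = TRUE drops the missing values before calculating
mean(values, na.rm = TRUE)   # 6
sd(values, na.rm = TRUE)     # 2
```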

This tutorial was created by Brandon Pazmino (UIUC '21) with editing and maintenance by Kelly Findley. We hope this experience was helpful for you!