Vector Tools

Vector Review

As a reminder, a vector in R is a set of values of the same type that have been tied together as one object. The function c combines (concatenates) entries into a vector.

Vectors can be either a set of character entries…

c("yes", "no", "no", "yes", "no", "no", "no", "yes")

…or a set of numeric entries.

c(3,5,1,2,4,3,4,3)

Sequencing (This is Review!)

Sometimes you might wish to create a vector that goes in a sequence. Rather than write out 1, 2, 3, 4… or 1, 3, 5, 7, etc., you can use seq to do that!

Check out the function below to see how each argument affects how the sequence is created.

seq(from = 10, to = 80, by = 5)
# or without the argument names, since R matches arguments by position

seq(10, 80, 5)
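
To see how the arguments change the result, here is a small sketch that builds the odd-number sequence mentioned above. It also uses length.out, a standard seq argument this tutorial does not otherwise cover, which sets how many values to produce instead of the step size. The names odds and five_values are only illustrative.

```r
# Odd numbers from 1 to 7: start at 1, stop at 7, step by 2
odds = seq(from = 1, to = 7, by = 2)
odds

# Same endpoints as the earlier example, but ask for 5 evenly
# spaced values instead of giving a step size
five_values = seq(from = 10, to = 80, length.out = 5)
five_values
```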

Repeating (This is New!)

Another function is rep. This lets you repeat the same value a certain number of times. It’s a little like seq in that it saves you time creating a vector of values, but specifically when you want to repeat a value!

rep(x = 2, times = 10)
# or without the argument names, since R matches arguments by position

rep(2, 10)

By the way, you can use rep for character entries as well!
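
As a quick sketch of rep with character entries (the values and the name answers are only illustrative):

```r
# Repeat a single character value three times
rep("yes", 3)

# Repeat a whole character vector twice, keeping its order
answers = rep(c("yes", "no"), times = 2)
answers
```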

An example!

Let’s pretend we collected data from 20 participants in an experiment where 10 were in an experimental group and 10 were in a control group.

First, create an ID variable that is simply a vector from 1 to 20.

seq(__, __, __)
seq(1, 20, 1)

Another example

Now, what if we wanted to create a variable that represents which group each participant was in?

c(_____, _____)
c(rep("Experimental", 10), rep("Control", 10))
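
As an alternative sketch (not the solution above), rep also has an each argument that repeats every element of a vector a given number of times in a row, which should produce the same group variable in one call. The name Group is only illustrative, and table is used here just to check the counts.

```r
# Alternative sketch: repeat each label 10 times in a row
Group = rep(c("Experimental", "Control"), each = 10)

# Check the result: 20 entries in total, 10 per group
length(Group)
table(Group)
```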

Sampling

The sample function

The sample function in R lets you randomly choose certain elements from a vector.

You need to specify three arguments:

  • x: Identify the vector you wish to sample from
  • size: Identify how many times to sample from that vector
  • replace: Choose whether you will sample with replacement or not

V = seq(1, 20, 1)
sample(x = V, size = 10, replace = TRUE)

Notice that each time you run it, you will (likely) get a different sample! If replace is set to TRUE, the same value may appear in your sample more than once. If it is set to FALSE, each element of the vector can be chosen at most once, so size cannot be larger than the length of the vector.
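
If you want the same "random" sample every time you run the code, one common approach is to call set.seed first. The sketch below also samples without replacement to show that no value is repeated. The seed value 123 and the names first_draw and second_draw are arbitrary choices for illustration.

```r
V = seq(1, 20, 1)

# Fixing the seed makes the random draw repeatable
set.seed(123)
first_draw = sample(x = V, size = 10, replace = FALSE)

# Resetting the same seed reproduces the same draw
set.seed(123)
second_draw = sample(x = V, size = 10, replace = FALSE)

identical(first_draw, second_draw)  # TRUE: same seed, same sample
anyDuplicated(first_draw)           # 0 means no value was drawn twice
```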

Feel free to adjust the code above to change the parameters!

Sampling from a Discrete Population

Let’s say the discrete random variable X follows some probability distribution. If I want to take a random sample from that discrete distribution, I can use sample again!

The example above assumed each value was equally likely, but if we don’t want to assume that, we can add an additional argument:

  • prob: Supply a vector of probabilities, one for each entry in your vector, in the same order.

Check out the example below!

sample(x = c(1,2,3,4), size = 10, replace = TRUE, prob = c(0.1, 0.4, 0.2, 0.3))

Note that we can also define these vectors beforehand (here with a different set of probabilities).

x = c(1, 2, 3, 4)
p = c(0.1, 0.2, 0.3, 0.4)

sample(x = x, size = 10, replace = TRUE, prob = p)
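
One way to check that prob is doing what you expect is to take a much larger sample and compare the observed proportions with p. This is only a sketch: the sample size of 10000 and the names big_sample and observed are illustrative choices, and the proportions will vary a little from run to run.

```r
x = c(1, 2, 3, 4)
p = c(0.1, 0.2, 0.3, 0.4)

# A large sample, so the observed proportions settle near p
big_sample = sample(x = x, size = 10000, replace = TRUE, prob = p)

# Count how often each value appears, then convert counts to proportions
observed = prop.table(table(big_sample))
observed
```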

Practice

Can you sample from a random variable X that takes the values 3,4,5,6,7, with the probabilities 0.1, 0.15, 0.2, 0.25, 0.3? Take a sample of size 20 and sample with replacement.

You can either define those vectors before your sample function, or define them inside the arguments. Feel free to try both ways!

sample(x = c(____), size = ___, replace = ___, prob = ___)
sample(x = c(3, 4, 5, 6, 7), size = 20, replace = TRUE, prob = c(0.1, 0.15, 0.2, 0.25, 0.3))

Descriptive measures and plots

Descriptive measures

In the first tutorial, you might have seen measures like mean, median, sd, var, etc.

V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)
mean(V)
median(V)
sd(V)
var(V)
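
As a small sketch of how these measures relate: sd is the square root of var, and var is the sample variance, which divides by n - 1. The names n and manual_var are only illustrative.

```r
V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)

# sd() is the square root of var()
all.equal(sd(V), sqrt(var(V)))

# var() by hand: squared deviations from the mean, divided by n - 1
n = length(V)
manual_var = sum((V - mean(V))^2) / (n - 1)
all.equal(manual_var, var(V))
```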

Histograms

You can also plot a vector! For this class, we will just use the base R plotting functions.

Histograms can be made with hist. A histogram shows the distribution of a single numeric variable.

V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)

hist(V)

Feel free to run ?hist or just do a quick web search to see ways to customize a histogram (change number of bins, add a title, change color, etc.)!
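
Here is a minimal sketch of a few of those customizations using standard hist arguments. The title, axis label, color, and number of bins are just example choices.

```r
V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)

hist(V,
     breaks = 5,               # suggested number of bins (R may adjust it)
     main = "Histogram of V",  # plot title
     xlab = "Value",           # x-axis label
     col = "lightblue")        # bar fill color
```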

Plots

The plot function can be used to draw a scatterplot of two numeric variables. Let’s say we have two vectors defined separately:

X = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)
Y = c(1,5,3,4,5,3,6,7,8,5,6,4,8,6,7,5)

plot(X, Y)

And if your vectors are inside a data frame, you could do it one of two ways!

  • Use the $ operator to access the vectors through your data frame (easier!)
  • Name the data frame with the data = argument. Then give a formula: your y-axis variable, followed by ~, and then your x-axis variable

Data = data.frame(
  X = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30),
  Y = c(1,5,3,4,5,3,6,7,8,5,6,4,8,6,7,5)
)

plot(Data$X, Data$Y)

#or

plot(data = Data, Y ~ X)

Done!

That should be it for now!

Sampling and Plotting