Vector Tools
Vector Review
As a reminder, a vector in R is a set of values that have been tied together as one object. The function c is used to concatenate entries into an object.
Vectors can be either a set of character entries…
c("yes", "no", "no", "yes", "no", "no", "no", "yes")
…or a set of numeric entries.
c(3,5,1,2,4,3,4,3)
Sequencing (This is Review!)
Sometimes you might wish to create a vector that goes in a sequence. Rather than write out 1, 2, 3, 4… or 1, 3, 5, 7, etc., you can use seq to do that!
Check out the function below to see how each argument affects how the sequence is created.
seq(from = 10, to = 80, by = 5)
#or without listing the argument names, R knows this order
seq(10, 80, 5)
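As a small sketch, the 1, 3, 5, 7 sequence mentioned above can be built the same way. The names odds and ids are made up for illustration, and the colon shorthand is an extra that this tutorial has not introduced.

```r
# Odd numbers from 1 to 7: start at 1, stop at 7, step by 2
odds = seq(from = 1, to = 7, by = 2)
odds  # 1 3 5 7

# When the step is 1, the colon operator is a common shorthand
ids = 1:5
ids   # 1 2 3 4 5
```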
Repeating (This is New!)
Another function is rep. This lets you repeat the same value a certain number of times. It’s a little like seq in that it saves you time creating a vector of values, but specifically when you want to repeat a value!
rep(x = 2, times = 10)
#or without listing the argument names, R knows this order
rep(2, 10)
By the way, you can use rep for character entries as well!
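Here is a minimal sketch of rep with character entries. The variable names are hypothetical, and the each argument goes one step beyond what is described above: it repeats every element in turn rather than the whole vector.

```r
# Repeat a single character value three times
answers = rep("yes", times = 3)
answers   # "yes" "yes" "yes"

# Repeat a whole vector twice
cycled = rep(c("a", "b"), times = 2)
cycled    # "a" "b" "a" "b"

# each = 2 repeats every element before moving on to the next
blocked = rep(c("a", "b"), each = 2)
blocked   # "a" "a" "b" "b"
```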
An example!
Let’s pretend we collected data from 20 participants in an experiment where 10 were in an experimental group and 10 were in a control group.
First, create an ID variable that is simply a vector from 1 to 20.
seq(__, __, __)
seq(1, 20, 1)
Another example
Now, what if we wanted to create a variable that represented which group each participant was in?
c(_____, _____)
c(rep("Experimental", 10), rep("Control", 10))
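One possible next step, sketched here as an assumption rather than part of the exercise, is to tie the two vectors together in a data frame so each ID sits next to its group. The names ID, Group, and Participants are made up for illustration.

```r
# The two vectors from the examples above
ID = seq(1, 20, 1)
Group = c(rep("Experimental", 10), rep("Control", 10))

# Both vectors have length 20, so they line up row by row
Participants = data.frame(ID, Group)

# Peek at the first rows and count participants per group
head(Participants)
table(Participants$Group)
```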
Sampling
The sample function
The sample function in R lets you randomly choose certain elements from a vector.
You need to specify three arguments:
- x: Identify the vector you wish to sample from
- size: Identify how many times to sample from that vector
- replace: Choose whether you will sample with replacement or not
V = seq(1, 20, 1)
sample(x = V, size = 10, replace = TRUE)
Notice that each time you run it, you will (likely) get a different sample! If replace is set to TRUE, the same element may be drawn more than once. If it is set to FALSE, each element can be drawn at most once, so size cannot be larger than the length of the vector.
Feel free to adjust the code above to change the parameters!
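As a sketch of the without-replacement case, the code below draws 10 of the 20 values with replace = FALSE. It also uses set.seed, which is not covered above: fixing the seed makes the "random" draw repeatable. The seed value 123 and the names first and second are arbitrary choices for illustration.

```r
V = seq(1, 20, 1)

# Fix the random seed so the same sample comes back on every run
set.seed(123)
first = sample(x = V, size = 10, replace = FALSE)

# Resetting the same seed reproduces the same draw
set.seed(123)
second = sample(x = V, size = 10, replace = FALSE)

identical(first, second)  # TRUE
anyDuplicated(first)      # 0 means no element was drawn twice
```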
Sampling from a Discrete Population
Let’s say the discrete random variable X follows some probability distribution. If I want to take a random sample from that discrete distribution, I can use sample again!
The example above assumed each value was equally likely, but if we don’t want to assume that, we can add an additional argument: prob. This takes a vector of probabilities, one for each entry in your vector, in the same order.
Check out the example below!
sample(x = c(1, 2, 3, 4), size = 10, replace = TRUE, prob = c(0.1, 0.4, 0.2, 0.3))
Also note we can define these vectors beforehand as well.
x = c(1, 2, 3, 4)
p = c(0.1, 0.2, 0.3, 0.4)
sample(x = x, size = 10, replace = TRUE, prob = p)
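One way to check that prob is doing what you expect is to take a much larger sample and compare the observed proportions to p. This is only a sketch: the sample size of 10000, the seed, and the names big and props are arbitrary choices for illustration.

```r
x = c(1, 2, 3, 4)
p = c(0.1, 0.2, 0.3, 0.4)

set.seed(1)  # arbitrary seed so the result is repeatable
big = sample(x = x, size = 10000, replace = TRUE, prob = p)

# Share of the sample taken up by each value; should be close to p
props = table(big) / length(big)
props
```

With a sample of only 10, the proportions can be far from p just by chance, so a larger size gives a clearer picture.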
Practice
Can you sample from a random variable X that takes the values 3,4,5,6,7, with the probabilities 0.1, 0.15, 0.2, 0.25, 0.3? Take a sample of size 20 and sample with replacement.
You can either define those vectors before your sample function, or define them inside the arguments. Feel free to try both ways!
sample(x = c(____), size = ___, replace = ___, prob = ___)
sample(x = c(3, 4, 5, 6, 7), size = 20, replace = TRUE, prob = c(0.1, 0.15, 0.2, 0.25, 0.3))
Descriptive measures and plots
Descriptive measures
In the first tutorial, you might have seen measures like mean, median, sd, var, etc.
V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)
mean(V)
median(V)
sd(V)
var(V)
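To see how these measures relate, here is a short sketch that rebuilds the variance by hand and checks it against var and sd. It relies on the fact that R's var divides by n - 1 (the sample variance). The names n and manual_var are made up for illustration.

```r
V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)
n = length(V)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var = sum((V - mean(V))^2) / (n - 1)

all.equal(manual_var, var(V))    # TRUE
# The standard deviation is the square root of the variance
all.equal(sqrt(var(V)), sd(V))   # TRUE
```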
Histograms
You can also plot a vector! For this class, we will just use the base R plot functions.
Histograms can be made with hist. These showcase a single numeric variable.
V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)
hist(V)
Feel free to run ?hist or just do a quick web search to see ways to customize a histogram (change number of bins, add a title, change color, etc.)!
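As a starting point, here is a sketch of a few common customizations. The title, axis label, color, and number of breaks are arbitrary choices for illustration. Note that R treats a single number given to breaks as a suggestion, so the number of bins drawn may differ slightly.

```r
V = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)

# breaks suggests the number of bins, main sets the title,
# xlab labels the x axis, and col fills the bars
h = hist(V, breaks = 5, main = "Distribution of V", xlab = "Value", col = "lightblue")

# hist also returns the bin counts, which add up to the number of values
sum(h$counts) == length(V)
```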
Plots
The plot function is used to make a scatterplot of two numeric variables. Let’s say we have two vectors defined separately:
X = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30)
Y = c(1,5,3,4,5,3,6,7,8,5,6,4,8,6,7,5)
plot(X, Y)
And if your vectors are inside a data frame, you could do it one of two ways!
- Use the $ operator to access the vectors through your data frame (easier!)
- Call on the data frame with the data = argument. Then list your y-axis variable, followed by ~, and then your x-axis variable
Data = data.frame(
X = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30),
Y = c(1,5,3,4,5,3,6,7,8,5,6,4,8,6,7,5)
)
plot(Data$X, Data$Y)
#or
plot(data = Data, Y ~ X)
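Scatterplots can be customized in much the same way as histograms. The sketch below adds a title, axis labels, and filled points to the formula version of the call. The label text and the point style (pch = 19) are arbitrary choices for illustration.

```r
Data = data.frame(
  X = c(32, 34, 65, 63, 41, 36, 76, 33, 35, 41, 47, 57, 28, 35, 41, 30),
  Y = c(1, 5, 3, 4, 5, 3, 6, 7, 8, 5, 6, 4, 8, 6, 7, 5)
)

# Y ~ X puts Y on the vertical axis and X on the horizontal axis
# main sets the title, xlab and ylab label the axes, pch = 19 draws solid dots
plot(Y ~ X, data = Data, main = "Y against X", xlab = "X", ylab = "Y", pch = 19)
```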
Done!
That should be it for now!