Introduction
Tutorial Goals
In this tutorial, we will use the ifelse function (literally the words “if else”) to create binary variables from numeric inputs! This will include…
- Creating a binary variable from a numeric variable by running it through ifelse
- Finding the proportion of outcomes that fall above or below a cutoff by wrapping our ifelse in a mean function
- Placing our ifelse-created proportions in a summary table
ifelse Statement Introduction
How it works
The ifelse statement has three inputs:
- test: a logical criterion that each value will either meet or not meet
- yes: the value to output if the criterion is met
- no: the value to output if the criterion is not met
You might think about them as the name of the function…
- If this logical statement is true…
- Output this first value
- Else, output this second value
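Here is a minimal sketch of those three inputs with their argument names spelled out. The vector met is made up for illustration and is not part of this tutorial's data. Notice that a missing test value stays missing in the output rather than falling into the "no" case.

```r
# Hypothetical logical vector: TRUE = criterion met, FALSE = not met, NA = unknown
met = c(TRUE, FALSE, NA)

# test: the logical check; yes: output when TRUE; no: output when FALSE
result = ifelse(test = met, yes = "met", no = "not met")
result
```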
Simple example
Consider the following vector: ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)
We can run an ifelse statement to check which ages are at least 18 and therefore eligible to vote.
ifelse(ages >= 18, "eligible to vote", "not eligible")
How did that work?
- We set a logical statement to ask whether each value in ages was greater than or equal to 18.
- If met, the output would be “eligible to vote”
- If not met, the output would be “not eligible”
Notice that the 5th, 7th and 9th values were at least 18!
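One way to double-check those positions is sketched below using the same ages vector. The name voting is just an illustrative choice for the saved output.

```r
ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)

# Save the ifelse output so we can inspect it
voting = ifelse(ages >= 18, "eligible to vote", "not eligible")

# Positions of the values that met the criterion
which(ages >= 18)   # 5 7 9

# How many values landed in each category
table(voting)
```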
ifelse with data frames
An example with heart rate
Consider the following data representing 10 students. In this dataset, we have variables for:
- Their names
- Their heart rates at time of data collection
- Their class level
- Their number of siblings
Class
Dichotomize Siblings
Let’s run an ifelse where our logical criterion is whether a student has siblings or not. We’ll input the following arguments:
- A logical criterion of whether the siblings variable inside Class reports a value greater than 0
- An output if true
- An output if false
ifelse(Class$siblings > 0, "yes", "no")
How did it work?
It took the vector Class$siblings and went through each value in order. It then output yes when that criterion was met and no when it wasn’t.
If you check back to the original data frame, you should see that the 2nd and 10th person have 0 siblings, which should match up with the output you see here!
Don’t forget $
Remember that if our variable is stored in a data frame, we need to call it up through the data frame with the $ operator!
Adding onto our data
Let’s save our output as a new variable named sibling_binary and place it in our original data frame.
We can do that by linking sibling_binary with a $ to our data frame Class at the beginning of our line of code, which tells R where to save it.
Class$sibling_binary = ifelse(Class$siblings > 0, "yes", "no")
Class
Look carefully!
Notice that this new variable sibling_binary now appears on the far right end. We could now use that variable within a plot where we call on Class as our source data.
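If you'd like to try this outside the tutorial, here is a small self-contained sketch. The data frame toy_class and its values are invented stand-ins for illustration, not the real Class data.

```r
# Hypothetical stand-in for the Class data (names and values are invented)
toy_class = data.frame(
  name = c("Ana", "Ben", "Cy", "Dee"),
  siblings = c(2, 0, 1, 3)
)

# Save the ifelse output as a new column in the same data frame
toy_class$sibling_binary = ifelse(toy_class$siblings > 0, "yes", "no")

# The new column appears as the last (rightmost) column
toy_class

# Count how many students fall in each category
table(toy_class$sibling_binary)
```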
Proportions with ifelse
Mean wrappers
Perhaps we’d like to ask what proportion of responses are in a particular numeric range. We can do that by dichotomizing our responses to 0’s and 1’s, and taking the mean!
An example
Consider the penguins data. Perhaps we’d like to find what proportion of penguins have a flipper length of at least 200 mm.
Let’s first dichotomize the data to output a 1 if flipper length is at least 200 and a 0 if it is not. If you wish to see the resulting vector, scroll to the rightmost column in the data!
penguins$flipper_binary = ifelse(penguins$flipper_length_mm >= 200, 1, 0)
penguins
Taking the mean
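Now, to find the proportion of penguins who meet this criterion, we can take the mean of this vector of 1’s and 0’s! It’s kind of a sneaky way of getting the proportion: the sum of the 1’s counts the penguins that qualify, and dividing by the number of values turns that count into a proportion.
We also add the argument na.rm = TRUE to remove the NA values, since some entries are empty.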
mean(penguins$flipper_binary, na.rm = TRUE)
Mean wrapper
We technically didn’t need to save this as a new vector first. We could save a step and simply wrap our ifelse statement in a mean function.
mean(ifelse(penguins$flipper_length_mm >= 200, 1, 0), na.rm = TRUE)
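To see why the mean works as a proportion, here is a small worked sketch. The vector flipper holds made-up values (including one missing entry) and is not taken from the penguins data.

```r
# Hypothetical flipper lengths in mm, with one missing value
flipper = c(181, 210, NA, 195, 230)

# Dichotomize: 1 if at least 200, 0 otherwise (the NA stays NA)
flipper_binary = ifelse(flipper >= 200, 1, 0)

# Proportion "by hand": count of 1's divided by count of non-missing values
by_hand = sum(flipper_binary, na.rm = TRUE) / sum(!is.na(flipper_binary))

# Same proportion using the mean, skipping the NA
by_mean = mean(flipper_binary, na.rm = TRUE)

by_hand   # 0.5
by_mean   # 0.5

# Without na.rm = TRUE, a single NA makes the whole mean NA
mean(flipper_binary)
```

Two of the four non-missing values are at least 200, so both approaches give 0.5.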
Summary Table Examples
Summary Table Review
One situation where we might wish to use this summary value is in a summary table!
Let’s first just remind ourselves how to do that. Let’s make a summary table reporting the mean and median flipper length for each of the three species of penguins in this dataset.
penguins |>
  group_by(species) |>
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            Median = median(flipper_length_mm, na.rm = TRUE))
Adding proportions in
Now, if I wanted to add a third column indicating what proportion of penguins had flipper lengths of at least 200 mm, I could add in the mean-wrapped ifelse statement as a third summary feature.
penguins |>
  group_by(species) |>
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            Median = median(flipper_length_mm, na.rm = TRUE),
            Proportion = mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE))
Rounding
And lastly, don’t forget we can round some of those summary values if they get too long.
Let’s round the means to 2 decimals and the proportions to 3 decimals for this example. We can do that by wrapping a round function around each of those two calculations and adding the number of decimals to round to as the final argument.
penguins |>
  group_by(species) |>
  summarise(Mean = round(mean(flipper_length_mm, na.rm = TRUE), 2),
            Median = median(flipper_length_mm, na.rm = TRUE),
            Proportion = round(mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE), 3))
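If you want to run this kind of table outside the tutorial, the sketch below may help. It assumes the dplyr package is installed and that your R version supports the |> pipe (4.1 or later). The data frame toy_penguins and its values are invented for illustration; they are not the real penguins data.

```r
library(dplyr)

# Hypothetical mini dataset (values invented), with one missing flipper length
toy_penguins = data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  flipper_length_mm = c(190, 205, NA, 210, 220, 198)
)

# Same summary table structure as above, with rounding applied
summary_tbl = toy_penguins |>
  group_by(species) |>
  summarise(Mean = round(mean(flipper_length_mm, na.rm = TRUE), 2),
            Median = median(flipper_length_mm, na.rm = TRUE),
            Proportion = round(mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE), 3))

summary_tbl
```

In this toy data, one of the two measured Adelie values and two of the three Gentoo values are at least 200, so the Proportion column should read 0.5 and 0.667.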
Return Home
This tutorial was created by Kelly Findley. I hope this was helpful for you!
If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/