In this tutorial, we will use ifelse functions (literally the words "if else") to create binary variables from numeric inputs! This will include using the ifelse function on its own, wrapping ifelse in a mean function, and placing ifelse-created proportions in a summary table.
The ifelse statement has three inputs:
test: a logical criterion that each value will either meet or not meet
yes: the value to output if the criterion is met
no: the value to output if the criterion is not met
You might think about them as the name of the function...
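As a small sketch, here is an ifelse call with each of the three inputs written out by name (test, yes, no). The temperature values are made up purely for illustration.

```r
# Hypothetical temperatures, invented for this illustration
temps = c(31, 18, 25, 12)

# test: is each temperature above 20?
# yes:  value returned where the test is TRUE
# no:   value returned where the test is FALSE
labels = ifelse(test = temps > 20, yes = "warm", no = "cool")
print(labels)
```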
Consider the following vector:
ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)
We can run an ifelse statement to check which ages are at least 18 and therefore eligible to vote.
ifelse(ages >= 18, "eligible to vote", "not eligible")
The output reports, in order, whether each value in ages was greater than or equal to 18. Notice that the 5th, 7th and 9th values were at least 18!
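Under the hood, the test input is just a logical vector with one TRUE or FALSE per value. As a quick check, we can print that vector on its own and use which() to list the positions where it is TRUE; these should match the 5th, 7th and 9th values noted above.

```r
ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)

# The test input on its own: one TRUE/FALSE per age
is_adult = ages >= 18
print(is_adult)

# Positions where the test is TRUE
adult_positions = which(is_adult)
print(adult_positions)  # 5 7 9
```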
Consider the following data representing 10 students. In this dataset, we have variables for:
Class
Let's run an ifelse where our logical criterion is whether a student has siblings or not. We'll input the following arguments:
test: whether the siblings variable inside Class reports a value greater than 0
yes: "yes"
no: "no"
ifelse(Class$siblings > 0, "yes", "no")
It took the vector Class$siblings and went through each value in order. It then output "yes" when the criterion was met and "no" when it wasn't.
If you check back to the original data frame, you should see that the 2nd and 10th students have 0 siblings, which should match up with the output you see here!
Remember that if our variable is embedded in a data frame, we need to call it up through the data frame with the $ operator!
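Since the Class data itself isn't reproduced here, the sketch below builds a small hypothetical data frame (named toy, with made-up sibling counts) to show the same pattern of reaching into a data frame with $ inside ifelse.

```r
# Hypothetical stand-in for Class; these sibling counts are invented
toy = data.frame(
  name = c("Ana", "Ben", "Cai", "Dee"),
  siblings = c(2, 0, 1, 0)
)

# Reach into the data frame with $ to get the siblings column
has_siblings = ifelse(toy$siblings > 0, "yes", "no")
print(has_siblings)  # "yes" "no" "yes" "no"
```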
Let's save our output as a new variable named sibling_binary inside our original data frame.
We can do that by linking sibling_binary to our data frame Class with a $ on the left side of the assignment, which tells R where to save the result.
Class$sibling_binary = ifelse(Class$siblings > 0, "yes", "no")
Class
Notice that this new variable sibling_binary now appears as the rightmost column. We could now use that variable within a plot where we call on Class as our source data.
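Continuing with a hypothetical toy data frame (not the tutorial's Class data), this sketch saves the ifelse output as a new column with $, confirms it was appended as the last column, and counts the yes/no values with table().

```r
# Hypothetical data frame with invented sibling counts
toy = data.frame(
  name = c("Ana", "Ben", "Cai", "Dee"),
  siblings = c(2, 0, 1, 0)
)

# Assigning to toy$sibling_binary creates a new column at the right end
toy$sibling_binary = ifelse(toy$siblings > 0, "yes", "no")
print(names(toy))  # "name" "siblings" "sibling_binary"

# Count how many students fall into each category
counts = table(toy$sibling_binary)
print(counts)
```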
Perhaps we'd like to ask what proportion of responses are in a particular numeric range. We can do that by dichotomizing our responses to 0's and 1's, and taking the mean!
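To see why the mean gives a proportion: summing a vector of 1s and 0s counts the 1s, and dividing by the number of values gives the share that are 1. A quick check with the ages vector from earlier, where 3 of the 9 ages are at least 18:

```r
ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)

# 1 if at least 18, 0 otherwise
adult01 = ifelse(ages >= 18, 1, 0)

# mean = (number of 1s) / (number of values) = 3 / 9
prop_adult = mean(adult01)
print(prop_adult)
print(sum(adult01) / length(adult01))
```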
Consider the penguins data (from the palmerpenguins package). Perhaps we'd like to find what proportion of penguins have a flipper length of at least 200 mm.
Let's first dichotomize the data to output a 1 if flipper length is at least 200 mm and a 0 if it is not. If you wish to see the resulting vector, scroll to the rightmost column in the data!
library(palmerpenguins)  # provides the penguins data frame
penguins$flipper_binary = ifelse(penguins$flipper_length_mm >= 200, 1, 0)
penguins
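Now, to find the proportion of penguins who meet this criterion, we can take the mean of this vector of 1's and 0's! It's kind of a sneaky way of getting the proportion: adding up the 1's counts the penguins that meet the criterion, and dividing by the number of penguins turns that count into a proportion.
We also include the argument na.rm = TRUE to remove NA values, since some penguins have no recorded flipper length.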
mean(penguins$flipper_binary, na.rm = TRUE)
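ifelse passes missing values through: where the test is NA, the result is NA rather than 0 or 1. That is why na.rm = TRUE matters here. A sketch with a short made-up vector of flipper lengths:

```r
# Invented flipper lengths (mm) with one missing value
flippers = c(210, NA, 190, 205)

flags = ifelse(flippers >= 200, 1, 0)
print(flags)  # 1 NA 0 1

print(mean(flags))                # NA: a single missing value makes the mean NA
print(mean(flags, na.rm = TRUE))  # 2 of the 3 non-missing values are >= 200
```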
We technically didn't need to save this as a new vector first. We could save a step and simply wrap our ifelse statement in a mean function.
mean(ifelse(penguins$flipper_length_mm >= 200, 1, 0), na.rm = TRUE)
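A related shortcut: R treats TRUE as 1 and FALSE as 0 in arithmetic, so taking the mean of the logical test directly gives the same proportion without ifelse. A sketch using made-up values:

```r
# Invented flipper lengths (mm) with one missing value
flippers = c(210, NA, 190, 205)

with_ifelse = mean(ifelse(flippers >= 200, 1, 0), na.rm = TRUE)
logical_only = mean(flippers >= 200, na.rm = TRUE)  # TRUE/FALSE averaged as 1/0

print(c(with_ifelse, logical_only))  # both equal 2/3
```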
One situation where we might wish to use this summary value is in a summary table!
Let's first remind ourselves how to build one. Let's make a summary table reporting the mean and median flipper length for each of the three species of penguins in this dataset.
library(dplyr)  # provides group_by() and summarise()
penguins |>
  group_by(species) |>
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            Median = median(flipper_length_mm, na.rm = TRUE))
Now, if I wanted to add another column indicating what proportion of penguins had flipper lengths of at least 200 mm, I could add in the mean-wrapped ifelse statement as a third summary feature.
penguins |>
  group_by(species) |>
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            Median = median(flipper_length_mm, na.rm = TRUE),
            Proportion = mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE))
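For readers working without dplyr, the same per-group proportion can be computed in base R with tapply(), which splits a vector by a grouping variable and applies a function to each piece. This is a swapped-in base-R alternative, not the tutorial's approach, shown on a small hypothetical data frame with invented species labels and values:

```r
# Hypothetical data: invented species labels and flipper lengths (mm)
toy = data.frame(
  species = c("A", "A", "B", "B", "B"),
  flipper = c(190, 210, 205, 215, NA)
)

flags = ifelse(toy$flipper >= 200, 1, 0)

# Proportion of 1s within each species; na.rm = TRUE is passed on to mean()
props = tapply(flags, toy$species, mean, na.rm = TRUE)
print(props)  # A: 0.5, B: 1
```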
And lastly, don't forget we can round some of those summary values if they get too long.
Let's round the means to 2 decimals and the proportions to 3 decimals for this example. We can do that by wrapping a round function around those two calculations and adding the number of decimals to round to as its second argument.
penguins |>
  group_by(species) |>
  summarise(Mean = round(mean(flipper_length_mm, na.rm = TRUE), 2),
            Median = median(flipper_length_mm, na.rm = TRUE),
            Proportion = round(mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE), 3))
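round(x, digits) keeps the given number of decimal places. A quick sketch with a proportion like the ones above:

```r
p = 2 / 3
print(p)            # 0.6666667 (R prints 7 significant digits by default)
print(round(p, 3))  # 0.667
print(round(p, 2))  # 0.67
```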
This tutorial was created by Kelly Findley. I hope this was helpful for you!
If you'd like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/