
Introduction

Tutorial Goals

In this tutorial, we will use the ifelse function (literally the words “if else”) to create binary variables from numeric inputs! This will include…

  • Creating a binary variable from a numeric variable by running it through an ifelse operator
  • Finding the proportion of outcomes that fall above or below a cutoff by wrapping our ifelse with a mean function
  • Placing our ifelse-created proportions in a summary table

ifelse Statement Introduction

How it works

The ifelse statement has three inputs:

  • test: a logical criterion that each value will either meet or not meet
  • yes: the value to output if the criterion is met
  • no: the value to output if the criterion is not met

You might think about them in terms of the function’s name…

  • If this logical statement is true…
  • Output this first value
  • Else, output this second value
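
As a minimal sketch of those three inputs, the call below names each argument explicitly. The vector x and the labels "big" and "small" are made up purely for illustration.

```r
# Hypothetical input vector, used only for illustration
x = c(5, 12, 8)

# test: is each value greater than 10?
# yes:  the label returned where the test is TRUE
# no:   the label returned where the test is FALSE
size_labels = ifelse(test = x > 10, yes = "big", no = "small")

size_labels
```

Because ifelse checks every value of x in turn, the output has one label per input value.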

Simple example

Consider the following vector: ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)

We can run an ifelse statement to check which ages are at least 18 and therefore eligible to vote.

ifelse(ages >= 18, "eligible to vote", "not eligible")

How did that work?

  • We set a logical statement to ask whether each value in ages was greater than or equal to 18.
  • If met, the output would be “eligible to vote”
  • If not met, the output would be “not eligible”

Notice that the 5th, 7th and 9th values were at least 18!
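
If you'd rather not count positions by eye, here is a small sketch that asks R for them directly. It reuses the ages vector from above; the names eligibility and eligible_positions are just illustrative choices.

```r
ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)

# One label per age, in the same order as the input
eligibility = ifelse(ages >= 18, "eligible to vote", "not eligible")
eligibility

# which() reports the positions where the logical test is TRUE
eligible_positions = which(ages >= 18)
eligible_positions
```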

ifelse with data frames

An example with heart rate

Consider the following data representing 10 students. In this dataset, we have variables for:

  • Their names
  • Their heart rates at time of data collection
  • Their class level
  • Their number of siblings

Class

Dichotomize Siblings

Let’s run an ifelse where our logical criterion is whether a student has siblings or not. We’ll input the following arguments:

  • A logical criterion: whether the siblings variable inside Class reports a value greater than 0
  • An output if true
  • An output if false

ifelse(Class$siblings > 0, "yes", "no")

How did it work?

It took the vector Class$siblings and went through each value in order. It then output "yes" when the criterion was met and "no" when it wasn’t.

If you check back to the original data frame, you should see that the 2nd and 10th person have 0 siblings, which should match up with the output you see here!
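
If you want to try this outside the tutorial, here is a sketch using a stand-in data frame. Everything in class_demo is hypothetical: the values are invented, and only the siblings column name comes from this tutorial. The values are chosen so that the 2nd and 10th students have 0 siblings.

```r
# Hypothetical stand-in for the Class data (all values are made up)
class_demo = data.frame(
  name = paste("Student", 1:10),
  heart_rate = c(72, 65, 80, 77, 69, 90, 84, 71, 66, 75),
  class_level = rep(c("Freshman", "Sophomore"), times = 5),
  siblings = c(1, 0, 2, 1, 3, 1, 2, 4, 1, 0)
)

# Same pattern as above: "yes" if the student has any siblings, "no" otherwise
has_siblings = ifelse(class_demo$siblings > 0, "yes", "no")
has_siblings

# Positions of the students with no siblings
which(has_siblings == "no")
```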

Don’t forget $

If our variable is embedded in a data frame, don’t forget to call it up through the data frame with the $ operator!
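
Here is a small sketch of what goes wrong without the $. The data frame sib_demo is hypothetical, and the sketch assumes there is no standalone object named siblings in your workspace. We use tryCatch so the error is caught and reported rather than stopping the code.

```r
# Hypothetical data frame with a single column
sib_demo = data.frame(siblings = c(2, 0, 1))

# Without sib_demo$, R looks for a standalone object called siblings and fails
attempt = tryCatch(
  ifelse(siblings > 0, "yes", "no"),
  error = function(e) "error: R could not find siblings"
)
attempt

# With the $ operator, R finds the column inside the data frame
with_dollar = ifelse(sib_demo$siblings > 0, "yes", "no")
with_dollar
```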

Adding onto our data

Let’s save our output as a new variable named sibling_binary and place it in our original data frame.

We can do that by linking sibling_binary with a $ to our data frame Class at the beginning of our line of code. That tells R where to save the new variable.

Class$sibling_binary = ifelse(Class$siblings > 0, "yes", "no")

Class

Look carefully!

Notice that this new variable sibling_binary now appears on the far right end. We could now use that variable within a plot where we call on Class as our source data.
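
As a quick check that a new column like this landed where we expect, the sketch below uses a hypothetical one-column data frame (class_demo, with made-up sibling counts), adds the binary variable, and then counts each label with table().

```r
# Hypothetical data: only the siblings column, with made-up values
class_demo = data.frame(siblings = c(1, 0, 2, 1, 3, 1, 2, 4, 1, 0))

# Save the ifelse output as a new column of the data frame
class_demo$sibling_binary = ifelse(class_demo$siblings > 0, "yes", "no")

# The new column is listed last
names(class_demo)

# Count how many students fall under each label
sibling_counts = table(class_demo$sibling_binary)
sibling_counts
```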

Proportions with ifelse

Mean wrappers

Perhaps we’d like to ask what proportion of responses are in a particular numeric range. We can do that by dichotomizing our responses to 0’s and 1’s, and taking the mean!
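
Why does that work? The mean adds up the values and divides by how many there are. With only 0’s and 1’s, the sum is just the count of 1’s, so the mean is the share of values that met the criterion. Here is a small sketch using the ages vector from earlier; the name adult_binary is just illustrative.

```r
ages = c(13, 9, 8, 14, 21, 17, 19, 16, 22)

# 1 if the age is at least 18, 0 otherwise
adult_binary = ifelse(ages >= 18, 1, 0)

sum(adult_binary)     # how many 1's there are
length(adult_binary)  # how many values there are in total
mean(adult_binary)    # the sum divided by the length: the proportion of 1's
```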

An example

Consider the penguins data. Perhaps we’d like to find what proportion of penguins have a flipper length of at least 200 mm.

Let’s first dichotomize the data to output a 1 if flipper length is at least 200 and a 0 if it is not. If you wish to see the resulting vector, scroll to the rightmost column in the data!

penguins$flipper_binary = ifelse(penguins$flipper_length_mm >= 200, 1, 0)

penguins

Taking the mean

Now, to find the proportion of penguins that meet this criterion, we can take the mean of this vector of 1’s and 0’s! It’s kind of a sneaky way of getting the proportion.

We also include the argument na.rm = TRUE to remove the NA values, since some entries are empty.

mean(penguins$flipper_binary, na.rm = TRUE)
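
To see why the NA values matter, here is a sketch with a short made-up vector (flipper_demo is hypothetical, not the real penguins data). When the test can't be evaluated because the value is missing, ifelse returns NA for that entry, and a single NA makes the plain mean come out as NA too.

```r
# Hypothetical flipper lengths, with one missing entry
flipper_demo = c(181, NA, 210, 195)

# The missing entry stays NA instead of becoming 0 or 1
binary_demo = ifelse(flipper_demo >= 200, 1, 0)
binary_demo

# Without na.rm, the NA carries through to the result
mean(binary_demo)

# With na.rm = TRUE, the NA is dropped first: 1 of the 3 remaining values is a 1
mean(binary_demo, na.rm = TRUE)
```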

Mean wrapper

We technically didn’t need to save this as a new vector first. We could save a step and simply wrap our ifelse statement in a mean function.

mean(ifelse(penguins$flipper_length_mm >= 200, 1, 0), na.rm = TRUE)
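
As a side note, R treats TRUE as 1 and FALSE as 0 when it takes a mean, so taking the mean of the logical test itself gives the same proportion. The sketch below compares the two approaches on a hypothetical vector (flipper_demo is made up). This tutorial sticks with the ifelse version, which keeps the 0/1 step visible.

```r
# Hypothetical flipper lengths, with one missing entry
flipper_demo = c(181, NA, 210, 195)

# The mean-wrapped ifelse used in this tutorial
wrapped = mean(ifelse(flipper_demo >= 200, 1, 0), na.rm = TRUE)

# The mean of the logical test itself
direct = mean(flipper_demo >= 200, na.rm = TRUE)

wrapped
direct
```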

Summary Table Examples

Summary Table Review

One situation where we might wish to use this summary value is in a summary table!

Let’s first just remind ourselves how to do that. Let’s make a summary table reporting the mean and median flipper length for each of the three species of penguins in this dataset.

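library(dplyr)  # provides group_by() and summarise()
# Assumption: penguins comes from the palmerpenguins package
library(palmerpenguins)

penguins |>
  group_by(species) |>
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            Median = median(flipper_length_mm, na.rm = TRUE))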

Adding proportions in

Now, if I wanted to add a third column indicating what proportion of penguins had flipper lengths of at least 200 mm, I could add in the mean-wrapped ifelse statement as a third summary feature.

penguins |>
  group_by(species) |>
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            Median = median(flipper_length_mm, na.rm = TRUE),
            Proportion = mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE))
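
To convince ourselves that the proportion is computed separately within each group, here is a sketch on a tiny made-up data frame where we can check the answer by hand. The data frame toy_penguins and its species labels "A" and "B" are hypothetical, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)  # provides group_by() and summarise()

# Hypothetical stand-in data: two made-up species with three penguins each
toy_penguins = data.frame(
  species = c("A", "A", "A", "B", "B", "B"),
  flipper_length_mm = c(190, 205, NA, 210, 220, 199)
)

toy_summary = toy_penguins |>
  group_by(species) |>
  summarise(Mean = mean(flipper_length_mm, na.rm = TRUE),
            Proportion = mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE))

toy_summary
# Species A: 1 of its 2 recorded flippers is at least 200 mm, so Proportion is 0.5
# Species B: 2 of its 3 flippers are at least 200 mm, so Proportion is 2/3
```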

Rounding

And lastly, don’t forget we can round some of those summary values if they get too long.

Let’s round the means to 2 decimals and the proportions to 3 decimals for this example. We can do that by wrapping a round function around each of those two functions and adding the number of decimals to round to at the end.

penguins |>
  group_by(species) |>
  summarise(Mean = round(mean(flipper_length_mm, na.rm = TRUE), 2),
            Median = median(flipper_length_mm, na.rm = TRUE),
            Proportion = round(mean(ifelse(flipper_length_mm >= 200, 1, 0), na.rm = TRUE), 3))
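
Here is the same rounding idea on its own, as a small sketch with three made-up flipper lengths. The second input to round is the number of decimals to keep.

```r
# Hypothetical flipper lengths: 2 of the 3 are at least 200 mm
proportion_demo = mean(ifelse(c(210, 195, 220) >= 200, 1, 0))

proportion_demo            # unrounded: a long repeating decimal (2/3)
round(proportion_demo, 3)  # rounded to 3 decimals: 0.667
```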

Return Home

This tutorial was created by Kelly Findley. I hope this was helpful for you!

If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/

If Else Statements