Tutorial Introduction
Tutorial Goals
This tutorial will show a lot of advanced features for ggplot. The primary purpose of this tutorial is to make you aware of possibilities! If some of the code seems complicated, that’s ok.
In this tutorial, we will:
- Learn how to combine multiple geometries in one plot
- Learn basic features to format multiple geometries neatly (like changing width, or nudging a geom)
- Talk about adding a ggplot directly into a pipe (to add some efficiency to your coding!)
What is a Geometry again?
In ggplot2, we think of each way of representing data visually as a “geometry”, or “geom” as we call it in code.
Some examples of geoms we have seen include points, boxplots, violinplots, density curves, and histograms.
Why Multiple Geoms?
Different geoms can share different insights about the data.
- Points show all of our data points
- Boxplots just represent the 5-number summary
- Densities and violinplots represent the shape of the data
Combining these types of geoms can often lead to more powerful visuals that communicate much more information at once.
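To make the 5-number summary concrete, here is a minimal sketch using base R's fivenum() on a small made-up vector (the numbers are purely illustrative and are not taken from the penguins data). These five values are essentially all that a boxplot summarizes, which is why pairing it with points or a density shape adds information.

```r
# Illustrative values only (not from any dataset in this tutorial)
bill_lengths <- c(36, 38, 39, 41, 45, 46, 48, 50, 52)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum
summary_values <- fivenum(bill_lengths)
print(summary_values)
```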
Nesting and Nudging
To format multiple geoms in the same plot, two common techniques are nesting and nudging:
- With nesting, the goal is to put the biggest/widest geom down first, and then include narrower geoms on top (often with some transparency added)
- With nudging, we lay multiple geoms side by side by nudging one to the left/right or up/down.
Nesting Geoms
Revisiting the Penguins
The penguins data lists various anatomical measurements from over 300 penguins across 3 species.
library(ggplot2)
library(palmerpenguins)
penguins
Boxplots inside violinplots
Violinplots are nice representations to express the distribution of a numeric variable.
ggplot(data = penguins, aes(x = species, y = bill_length_mm, fill = species)) +
  geom_violin()
A reminder about multiple lines
Just a reminder, when you have multiple arguments inside a function, you can put each new argument on a new line to improve readability. The following code does the same thing, but I just put the aes arguments in new lines. This will be helpful for some of the longer code examples coming!
ggplot(data = penguins, aes(x = species,
                            y = bill_length_mm,
                            fill = species)) +
  geom_violin()
Adding Boxplots
However, violinplots on their own don’t clearly show summary information, like where the median is. Let’s try nesting a boxplot inside the violinplots.
ggplot(data = penguins, aes(x = species,
                            y = bill_length_mm,
                            fill = species)) +
  geom_violin() +
  geom_boxplot()
Playing with Width and Transparency
A couple of things here: the shapes look awkward, especially with the matching colors. We can make a few changes to make it look much nicer!
- Make the boxplots narrower so that they fit inside
- Adjust the transparency of the violins so they are a different shade (even though it’s the same color argument)
- Remove the legend (something we saw in a previous tutorial), since the color legend is redundant with the x axis labels
ggplot(data = penguins, aes(x = species,
                            y = bill_length_mm,
                            fill = species)) +
  geom_violin(alpha = 0.3) +
  geom_boxplot(width = 0.3) +
  theme(legend.position = "none")
The choice of how narrow to make the width of the boxplots does depend on the distribution. In many cases where the distribution is narrower/more skewed, you might need a much narrower boxplot to nest inside! Just remember to play around with width.
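One way to play around with width is to wrap the plot in a small helper function and call it with a few candidate values. This is only a sketch: nested_plot is a hypothetical helper name introduced here for illustration, and the widths 0.1 and 0.5 are arbitrary starting points.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: builds the nested violin + boxplot for a given boxplot width
nested_plot <- function(box_width) {
  ggplot(data = penguins, aes(x = species,
                              y = bill_length_mm,
                              fill = species)) +
    geom_violin(alpha = 0.3) +
    geom_boxplot(width = box_width) +
    theme(legend.position = "none")
}

# Build two versions to compare side by side
narrow_version <- nested_plot(0.1)
wide_version <- nested_plot(0.5)

narrow_version
wide_version
```

Printing each version in turn makes it easy to see which width sits comfortably inside the violins.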
More Customization
Let’s do a few more things we’ve learned before to make it look more professional:
- Add labels
- Adjust the tick mark frequency on the y-axis
- Add the theme_bw() plot theme for a cleaner background
- Add some text formatting for the axes and title
ggplot(data = penguins, aes(x = species,
                            y = bill_length_mm,
                            fill = species)) +
  geom_violin(alpha = 0.3) +
  geom_boxplot(width = 0.3) +
  labs(title = "Bill Length by Species",
       x = "Species",
       y = "Bill Length (mm)") +
  scale_y_continuous(breaks = seq(30,60,5)) +
  theme_bw() +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(color = "black", size = 12),
        axis.text = element_text(color = "black", size = 11))
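The breaks argument in the code above simply takes a vector of numbers, and seq() is a quick way to build one. As a small sketch of what seq(30,60,5) hands to the scale:

```r
# seq(from, to, by): start at 30, stop at 60, step by 5
y_breaks <- seq(30, 60, 5)
print(y_breaks)

# The same call with the argument names spelled out
y_breaks_named <- seq(from = 30, to = 60, by = 5)
```

Changing the last number changes the tick mark frequency, so seq(30, 60, 10) would label every 10 mm instead.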
A Twist!
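And for some contrast (maybe you like it, maybe you don’t!), let’s try:
- Coloring the boxplot border by writing color = "white" in the geom_boxplot line (the code below does the same for the violins and switches to theme_dark() so the white stands out)
- Thickening the border with the size argument (left at its default in the code below).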
ggplot(data = penguins, aes(x = species,
                            y = bill_length_mm,
                            fill = species)) +
  geom_violin(alpha = 0.3,
              color = "white") +
  geom_boxplot(width = 0.3,
               color = "white") +
  labs(title = "Bill Length by Species",
       x = "Species",
       y = "Bill Length (mm)") +
  scale_y_continuous(breaks = seq(30,60,5)) +
  theme_dark() +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(color = "black", size = 12),
        axis.text = element_text(color = "black", size = 11))
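A note on thickening the border: starting with ggplot2 3.4.0, line thickness is set with linewidth, and using size for lines produces a deprecation message. Here is a minimal sketch of the thicker white border, assuming ggplot2 3.4.0 or later; the value 0.8 is just a reasonable starting point, and thick_border_plot is a name made up for this example.

```r
library(ggplot2)
library(palmerpenguins)

# linewidth thickens the boxplot border (assumes ggplot2 >= 3.4.0;
# on older versions, size = 0.8 plays the same role)
thick_border_plot <- ggplot(data = penguins, aes(x = species,
                                                 y = bill_length_mm,
                                                 fill = species)) +
  geom_violin(alpha = 0.3, color = "white") +
  geom_boxplot(width = 0.3, color = "white", linewidth = 0.8) +
  theme_dark() +
  theme(legend.position = "none")

thick_border_plot
```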
Nudging Geometries
Comparing Nesting and Nudging
Let’s look at a new example involving the anaesthetic data from the faraway package.
library(faraway)
anaesthetic
First plot
Let’s start by plotting the data as a jitter plot, with a boxplot underneath. Notice that for this one:
- The color argument in the aes line colors both the points and the boxplot outline.
ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
  geom_boxplot() +
  geom_jitter(width = 0.05) +
  theme_classic()
This plot functionally works, and we could certainly clean it up to make it look nicer. However, the lines naturally interfere with the points. Nudging the boxplot over might look cleaner.
Nudging
We can do that with the position argument inside one of our geometries.
position = position_nudge(x = ..., y = ...) is a template you can use, where what you assign to x and y is how much the position is nudged in the x or y direction. For example:
- position = position_nudge(x = 0.3) will nudge the geometry 0.3 units to the right
- position = position_nudge(x = -0.2, y = 0.5) will nudge the geometry 0.2 units left and 0.5 units up.
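What counts as a “unit” on a categorical axis? ggplot2 places the categories at positions 1, 2, 3, and so on, so one unit is the gap between neighboring categories. The sketch below checks this on a tiny made-up data frame (toy and its columns are invented for illustration), using layer_data() to look at the positions ggplot2 will actually draw.

```r
library(ggplot2)

# Made-up data for illustration (not from the tutorial's datasets)
toy <- data.frame(group = c("a", "b", "c"), value = c(10, 20, 30))

nudged <- ggplot(toy, aes(x = group, y = value)) +
  geom_point(position = position_nudge(x = 0.3))

# layer_data() returns the values after the nudge has been applied
nudged_x <- as.numeric(layer_data(nudged)$x)
print(nudged_x)
```

Each point should land 0.3 to the right of its category's usual position, which is why small values like 0.2 or 0.25 are usually enough to move a boxplot clear of the points.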
Example with nudge
In the plot below, let’s make the following changes:
- Nudge the boxplot 0.25 units to the right
- Narrow the width of the boxplots to 0.15
ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
  geom_boxplot(width = 0.15, position = position_nudge(x = 0.25)) +
  geom_jitter(width = 0.05) +
  theme_classic()
To fill or not to fill
Rather than color the boxplot outlines, we could go back to filling the boxplots instead. It’s totally up to you and what looks more pleasing to your eye!
Note that you will have to manually add a black outline color to your boxplot line if you do that.
Now that some of these functions have a lot of arguments, I’m going to start putting each argument on a new line for readability.
ggplot(data = anaesthetic, aes(x = tgrp,
                               y = breath,
                               color = tgrp,
                               fill = tgrp)) +
  geom_boxplot(color = "black",
               width = 0.15,
               position = position_nudge(x = 0.25)) +
  geom_jitter(width = 0.05) +
  theme_classic()
Cleaning up the Outliers
In this particular representation, notice that the boxplots report outlier values. However, this is redundant with the points shown on the left and may look weird.
We can add an argument to ask the boxplots not to show outliers: outlier.shape = NA
ggplot(data = anaesthetic, aes(x = tgrp,
                               y = breath,
                               color = tgrp,
                               fill = tgrp)) +
  geom_boxplot(color = "black",
               width = 0.15,
               position = position_nudge(x = 0.25),
               outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  theme_classic() +
  theme(legend.position = "none")
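For context on which points the boxplot treats as outliers: by default, geom_boxplot draws any value more than 1.5 times the IQR beyond the box as a separate point. Setting outlier.shape = NA only hides those points in the boxplot layer; the data are untouched, and the jittered points still show every observation. Base R's boxplot.stats() applies a very similar 1.5 × IQR rule, so it gives a quick sketch of the idea on made-up numbers (not the anaesthetic data):

```r
# Illustrative values only (not from the anaesthetic data)
breath_times <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

# $out holds the values lying more than 1.5 * IQR beyond the box
flagged <- boxplot.stats(breath_times)$out
print(flagged)
```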
Nudging with Ridgeplots (Optional)
Boxplots with Ridge plots
Ridge plots can also look nice when combined with boxplots. Let’s plot the same data, but let’s use the geom_density_ridges option from the ggridges package.
- Keep the black border color for the boxplots, the width, and the no-outlier option
- Let’s also use theme_classic for a nice clean blank background
library(ggridges)
ggplot(data = anaesthetic, aes(x = breath,
                               y = tgrp,
                               color = tgrp,
                               fill = tgrp)) +
  geom_density_ridges() +
  geom_boxplot(color = "black",
               width = 0.15,
               outlier.shape = NA) +
  theme_classic() +
  theme(legend.position = "none")
Adjusting the densities
Ridge plots can be difficult to work with–they need a lot of customization to really look nice.
First, let’s add some things to geom_density_ridges:
- Add some transparency with alpha so that the boxplots “pop” in comparison
- Add a black border color to the ridges
- Add a scale argument and set it below 1 so that each ridge doesn’t overlap the one above it.
I’m also going to…
- Add a limit argument to the x scale so that it starts at 0 and ends at 20. (seems weird to have the scale start negative)
- Increase the size of the boxplot border–I think it just looks better that way
ggplot(data = anaesthetic, aes(x = breath,
                               y = tgrp,
                               color = tgrp,
                               fill = tgrp)) +
  geom_density_ridges(alpha = 0.5,
                      color = "black",
                      scale = 0.8) +
  geom_boxplot(color = "black",
               width = 0.15,
               outlier.shape = NA,
               size = 0.8) +
  theme_classic() +
  theme(legend.position = "none") +
  scale_x_continuous(limits = c(0,20))
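One thing worth knowing about limits: by default, a continuous scale treats values outside its limits as missing, so they are dropped before the densities and boxplots are computed. If you only want to zoom the view without dropping anything, coord_cartesian() is an alternative. The sketch below illustrates the difference on made-up values (toy is invented for this example); scales::censor() is the default out-of-bounds handler that continuous scales use.

```r
library(ggplot2)

# Made-up values for illustration; 25 falls outside the 0 to 20 limits
toy <- data.frame(breath = c(5, 10, 25), tgrp = c("a", "a", "a"))

# With scale limits, out-of-range values are turned into NA (and later dropped)
censored <- scales::censor(toy$breath, range = c(0, 20))
print(censored)

# With coord_cartesian(), every value stays in the data; only the view is zoomed
zoomed <- ggplot(toy, aes(x = breath, y = tgrp)) +
  geom_boxplot() +
  coord_cartesian(xlim = c(0, 20))
```

If all of your values already sit inside the limits, the two approaches give the same picture.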
When to Nudge and when to Nest?
Much of this choice is up to you and what looks most aesthetically pleasing! I find when there aren’t that many data points, it’s nice to represent them all. When there are a lot, the data points can begin to look overwhelming to plot.
One more example
Let’s try a jitter plot with nudged boxplots using the penguins data again.
For a challenge, we’ll change the orientation from the example we saw earlier: put bill_length_mm on the x axis and species on the y axis.
I’m going to go crazy on the features this time.
ggplot(data = penguins, aes(x = bill_length_mm,
                            y = species,
                            fill = species,
                            color = species)) +
  geom_jitter(height = 0.05) +
  geom_boxplot(color = "black",
               width = 0.15,
               size = 0.8,
               position = position_nudge(y = 0.2),
               outlier.shape = NA) +
  theme_bw() +
  labs(title = "Bill Length across Species", x = "Bill Length (mm)", y = "Species") +
  scale_fill_manual(values = c("orchid2", "seagreen", "gold2")) +
  scale_color_manual(values = c("orchid2", "seagreen", "gold2")) +
  scale_x_continuous(breaks = seq(30,60,5)) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(color = "black", size = 12),
        axis.text = element_text(color = "black", size = 11))
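In the code above, the three colors are matched to species in the order of the factor levels (Adelie, Chinstrap, Gentoo). If you would rather make that pairing explicit, one option is a named vector, and the aesthetics argument lets a single manual scale cover both fill and color. This is a sketch of that alternative, not a replacement for the code above; species_colors and named_color_plot are names made up for this example.

```r
library(ggplot2)
library(palmerpenguins)

# Names tie each color to a species, instead of relying on level order
species_colors <- c(Adelie = "orchid2", Chinstrap = "seagreen", Gentoo = "gold2")

named_color_plot <- ggplot(data = penguins, aes(x = bill_length_mm,
                                                y = species,
                                                fill = species,
                                                color = species)) +
  geom_jitter(height = 0.05) +
  # aesthetics = ... applies this one manual scale to both fill and color
  scale_fill_manual(values = species_colors,
                    aesthetics = c("fill", "colour")) +
  theme_bw() +
  theme(legend.position = "none")

named_color_plot
```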
Plot Codes inside Pipes
Saving a step
Up until this point, we have created and saved subsets to adjust what data gets plotted in our plots. This works, but requires saving extra elements to your global environment. Doing this several times can really clutter your global environment!
Another option is to take advantage of pipes to filter down and go directly into a plot.
Filtering out Penguins
Before, you might remember that some of our penguins weren’t identified by sex, leaving them with “NA” as their entry on that variable. Therefore, plotting by sex can be messy without first removing those NA values.
ggplot(data = penguins, aes(x = bill_length_mm, y = sex, fill = sex)) +
  geom_boxplot()
A pipe instead
Let’s build a pipe to filter out this category, and then add a plot afterwards. Notice that:
- We created a pipe and used filter to keep all categories of sex except “NA”
- We then piped directly into a ggplot
- The ggplot is exactly what we’d normally do, except that we don’t add the data argument, since it’s already identified at the beginning of our pipe
- After the ggplot line, we go back to connecting layers with + rather than %>%
penguins %>%
  filter(sex != "NA") %>%
  ggplot(aes(x = bill_length_mm, y = sex, fill = sex)) +
  geom_boxplot()
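Why does the filter above work? Comparing a missing value to anything returns NA rather than TRUE, and filter() only keeps rows where the condition is TRUE, so the penguins with missing sex drop out. A more explicit way to say the same thing is is.na(). Here is a minimal sketch, assuming the dplyr package is loaded to provide %>% and filter(); sex_plot is a name made up for this example.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

sex_plot <- penguins %>%
  filter(!is.na(sex)) %>%   # keep only rows where sex is recorded
  ggplot(aes(x = bill_length_mm, y = sex, fill = sex)) +
  geom_boxplot()

sex_plot
```

Either version avoids saving a separate filtered data set to your global environment.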
Clustered boxplots
Let’s put together everything we’ve seen so far. Let’s compare the bill length of penguins across species and across sex (with “NA” removed).
- We will again start with a pipe to remove the penguins that have NA listed for sex.
- Then, let’s build clustered boxplots where species is represented on the y axis, bill_length_mm on the x axis, and sex as the fill color.
penguins %>%
  filter(sex != "NA") %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
  geom_boxplot()
Add some features
Now, let’s add a few more features
- Adjust the title and labels
- Format the title and labels
- Add a nice background
- Add some colors.
penguins %>%
  filter(sex != "NA") %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
  geom_boxplot() +
  theme_bw() +
  scale_fill_manual(values = c("lightpink", "steelblue1")) +
  labs(title = "Bill Length across Species",
       x = "Bill Length (mm)",
       y = "Species",
       fill = "Sex") +
  scale_x_continuous(breaks = seq(30,60,5)) +
  theme(axis.text = element_text(color = "black", size = 10),
        axis.title = element_text(size = 11, face = "bold"),
        legend.title = element_text(size = 11, face = "bold"),
        plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
Acknowledgment
This tutorial was built by Dr. Kelly Findley from UIUC Statistics. I hope this experience was helpful for you!
If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/