This tutorial will show a lot of advanced features for ggplot. The primary purpose of this tutorial is to make you aware of possibilities! If some of the code seems complicated, that's ok.
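In this tutorial, we will combine multiple geoms in one plot by nesting and nudging them, and pipe filtered data directly into a plot.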
In ggplot2, we think of different plot representations as a "geometry", or "geom" as we call it in code. This is how we choose to represent data visually.
Some examples of geoms we have seen include points, boxplots, violinplots, density curves, and histograms.
Different geoms can share different insights about the data.
Combining different types of geoms can often lead to more powerful visuals that communicate much more information at once.
To format multiple geoms in the same plot, two common techniques are nesting and nudging.
The penguins data lists various anatomical measurements from over 300 penguins across 3 species.
library(palmerpenguins)
penguins
Violinplots are nice representations to express the distribution of a numeric variable.
ggplot(data = penguins, aes(x = species, y = bill_length_mm, fill = species)) +
geom_violin()
Just a reminder, when you have multiple arguments inside a function, you can put each new argument on a new line to improve readability. The following code does the same thing, but I just put the aes arguments in new lines. This will be helpful for some of the longer code examples coming!
ggplot(data = penguins, aes(x = species,
y = bill_length_mm,
fill = species)) +
geom_violin()
However, violinplots don't clearly show summary information, like where the mean or median is. Let's try nesting a boxplot inside the violinplots.
ggplot(data = penguins, aes(x = species,
y = bill_length_mm,
fill = species)) +
geom_violin() +
geom_boxplot()
A couple of things here: the shapes look awkward, especially with the matching colors. We can play with two things to make it look much nicer: make the violins more transparent with alpha, and make the boxplots narrower with width.
I'm also going to remove the legend (something we saw in a previous tutorial) since the color legend is redundant with the x axis labels.
ggplot(data = penguins, aes(x = species,
y = bill_length_mm,
fill = species)) +
geom_violin(alpha = 0.3) +
geom_boxplot(width = 0.3) +
theme(legend.position = "none")
The choice of how narrow to make the width of the boxplots does depend on the distribution. In many cases where the distribution is narrower/more skewed, you might need a much narrower boxplot to nest inside! Just remember to play around with width.
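One way to play around with width, sketched here on the assumption that ggplot2 and palmerpenguins are installed, is to save the violin layer once and add boxplots of different widths to it. The object names base_plot, narrow, and wide are only illustrative, and the two widths are arbitrary starting points.

```r
library(ggplot2)
library(palmerpenguins)

# Save the shared part of the plot once (base_plot is an illustrative name)
base_plot <- ggplot(data = penguins, aes(x = species,
                                         y = bill_length_mm,
                                         fill = species)) +
  geom_violin(alpha = 0.3) +
  theme(legend.position = "none")

# Add boxplots of different widths to the same base and compare them
narrow <- base_plot + geom_boxplot(width = 0.1)
wide   <- base_plot + geom_boxplot(width = 0.5)

narrow
wide
```

Once you settle on a width, you can go back to writing the plot as a single chain so nothing extra stays in your environment.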
Let's do a few more things we've learned before to make it look more professional, including the theme_bw() plot theme for a cleaner background.
ggplot(data = penguins, aes(x = species,
y = bill_length_mm,
fill = species)) +
geom_violin(alpha = 0.3) +
geom_boxplot(width = 0.3) +
labs(title = "Bill Length by Species",
x = "Species",
y = "Bill Length (mm)") +
scale_y_continuous(breaks = seq(30,60,5)) +
theme_bw() +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(color = "black", size = 12),
axis.text = element_text(color = "black", size = 11))
And for some contrast (maybe you like it, maybe you don't!), let's try theme_dark() together with color = "white" in the geom_violin and geom_boxplot lines.
ggplot(data = penguins, aes(x = species,
y = bill_length_mm,
fill = species)) +
geom_violin(alpha = 0.3,
color = "white") +
geom_boxplot(width = 0.3,
color = "white") +
labs(title = "Bill Length by Species",
x = "Species",
y = "Bill Length (mm)") +
scale_y_continuous(breaks = seq(30,60,5)) +
theme_dark() +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(color = "black", size = 12),
axis.text = element_text(color = "black", size = 11))
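The nested boxplots mark the median of each species, but not the mean. If you also want the mean on the plot, one possible approach is stat_summary() from ggplot2, sketched below. The diamond shape, the point size, and the object name p_mean are arbitrary choices for illustration, not part of the plots above.

```r
library(ggplot2)
library(palmerpenguins)

p_mean <- ggplot(data = penguins, aes(x = species,
                                      y = bill_length_mm,
                                      fill = species)) +
  geom_violin(alpha = 0.3) +
  geom_boxplot(width = 0.3) +
  # stat_summary() applies mean() to y within each species and draws
  # the result as a point (shape 18 is a solid diamond)
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3) +
  theme_bw() +
  theme(legend.position = "none")

p_mean
```

ggplot2 drops the rows with a missing bill length (with a warning) before computing each mean.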
Let's look at a new example involving the anaesthetic data from the faraway package.
library(faraway)
anaesthetic
Let's start by plotting the data as a jitter plot, and plotting a boxplot underneath. Notice that for this one, mapping color in the aes line colors the points and the boxplot outline.
ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
geom_boxplot() +
geom_jitter(width = 0.05) +
theme_classic()
This plot functionally works, and we could certainly clean it up to make it look nicer. However, the lines naturally interfere with the points. Nudging the boxplot over might look cleaner.
We can do that with the position argument inside one of our geometries.
position = position_nudge(x = ..., y = ...) is a template you can use, where what you assign to x and y is how much the position is nudged in the x or y direction. For example:
position = position_nudge(x = 0.3) will nudge the geometry 0.3 units to the right.
position = position_nudge(x = -0.2, y = 0.5) will nudge the geometry 0.2 units left and 0.5 units up.
In the plot below, let's narrow the boxplots (width = 0.15) and nudge them 0.25 units to the right.
ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
geom_boxplot(width = 0.15, position = position_nudge(x = 0.25)) +
geom_jitter(width = 0.05) +
theme_classic()
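If you want to check what a nudge actually does, one option is to compare the positions ggplot2 computes with and without it. The sketch below uses layer_data() from ggplot2 and the penguins data (so it does not depend on faraway). The names p_plain and p_nudged are just illustrative.

```r
library(ggplot2)
library(palmerpenguins)

p_plain <- ggplot(data = penguins, aes(x = species, y = bill_length_mm)) +
  geom_boxplot(width = 0.15)

p_nudged <- ggplot(data = penguins, aes(x = species, y = bill_length_mm)) +
  geom_boxplot(width = 0.15, position = position_nudge(x = 0.25))

# layer_data() returns the values ggplot2 computed for a layer.
# Categories on a discrete axis sit at whole numbers, so the nudged
# boxes should each sit 0.25 units to the right of the plain ones.
x_plain  <- as.numeric(suppressWarnings(layer_data(p_plain))$x)
x_nudged <- as.numeric(suppressWarnings(layer_data(p_nudged))$x)

x_plain
x_nudged
```

This is also a reminder of the units: on a categorical axis, a nudge of 1 would move a box all the way onto the next category.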
Rather than color the boxplot outlines, we could go back to filling the boxplots instead. It's totally up to you and what looks more pleasing to your eye!
Note that you will have to manually set a black outline color in your geom_boxplot line if you do that.
Now that some of these functions have a lot of lines, I'm going to start putting each argument on a new line for readability.
ggplot(data = anaesthetic, aes(x = tgrp,
y = breath,
color = tgrp,
fill = tgrp)) +
geom_boxplot(color = "black",
width = 0.15,
position = position_nudge(x = 0.25)) +
geom_jitter(width = 0.05) +
theme_classic()
In this particular representation, notice that the boxplots report outlier values. However, this is redundant with the points plotted to the left and may look weird.
We can add an argument to ask the boxplots not to show outliers: outlier.shape = NA
ggplot(data = anaesthetic, aes(x = tgrp,
y = breath,
color = tgrp,
fill = tgrp)) +
geom_boxplot(color = "black",
width = 0.15,
position = position_nudge(x = 0.25),
outlier.shape = NA) +
geom_jitter(width = 0.1) +
theme_classic() +
theme(legend.position = "none")
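Keep in mind that outlier.shape = NA only hides the outlier points. It does not change how the boxes and whiskers are calculated. A quick way to convince yourself, sketched here with the penguins data and illustrative object names, is to compare the computed boxplot values with and without the argument.

```r
library(ggplot2)
library(palmerpenguins)

base <- ggplot(data = penguins, aes(x = species, y = bill_length_mm))

p_shown  <- base + geom_boxplot()
p_hidden <- base + geom_boxplot(outlier.shape = NA)

d_shown  <- suppressWarnings(layer_data(p_shown))
d_hidden <- suppressWarnings(layer_data(p_hidden))

# The outliers column lists the values flagged as outliers for each box;
# ymin and ymax are the whisker ends. Both comparisons should be TRUE.
identical(d_shown$outliers, d_hidden$outliers)
identical(d_shown[c("ymin", "ymax")], d_hidden[c("ymin", "ymax")])
```

So the jittered points still show every observation, and the whiskers still stop where they did before.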
Ridge plots can also look nice when combined with boxplots. Let's plot the same data, but let's use the geom_density_ridges option from the ggridges package.
We'll use theme_classic for a nice clean blank background.
library(ggridges)
ggplot(data = anaesthetic, aes(x = breath,
y = tgrp,
color = tgrp,
fill = tgrp)) +
geom_density_ridges() +
geom_boxplot(color = "black",
width = 0.15,
outlier.shape = NA) +
theme_classic() +
theme(legend.position = "none")
Ridge plots can be difficult to work with--they need a lot of customization to really look nice.
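First, let's add some things to geom_density_ridges:
Lower alpha so that the boxplots "pop" in comparison.
Add the scale argument and set it below 1 so that the ridges don't overlap the one above.
I'm also going to outline the ridges in black, thicken the boxplot lines with size = 0.8, and set the x axis limits to run from 0 to 20.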
ggplot(data = anaesthetic, aes(x = breath,
y = tgrp,
color = tgrp,
fill = tgrp)) +
geom_density_ridges(alpha = 0.5,
color = "black",
scale = 0.8) +
geom_boxplot(color = "black",
width = 0.15,
outlier.shape = NA,
size = 0.8) +
theme_classic() +
theme(legend.position = "none") +
scale_x_continuous(limits = c(0,20))
Much of this choice is up to you and what looks most aesthetically pleasing! I find when there aren't that many data points, it's nice to represent them all. When there are a lot, the data points can begin to look overwhelming to plot.
Let's try a jitter plot with nudged boxplots using the penguins data again.
For a challenge, we'll change the orientation from the example we saw earlier: put bill_length_mm on the x axis and species on the y axis.
I'm going to go crazy on the features this time.
ggplot(data = penguins, aes(x = bill_length_mm,
y = species,
fill = species,
color = species)) +
geom_jitter(height = 0.05) +
geom_boxplot(color = "black",
width = 0.15,
size = 0.8,
position = position_nudge(y = 0.2),
outlier.shape = NA) +
theme_bw() +
labs(title = "Bill Length across Species", x = "Bill Length (mm)", y = "Species") +
scale_fill_manual(values = c("orchid2", "seagreen", "gold2")) +
scale_color_manual(values = c("orchid2", "seagreen", "gold2")) +
scale_x_continuous(breaks = seq(30,60,5)) +
theme(legend.position = "none",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.title = element_text(color = "black", size = 12),
axis.text = element_text(color = "black", size = 11))
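Since the same three colors are typed twice above, one optional tidy-up is to store them once in a named vector and pass it to both scales. This is only a sketch: species_colors and p_palette are illustrative names, and the names in the vector assume the species levels are Adelie, Chinstrap, and Gentoo, as they are in palmerpenguins.

```r
library(ggplot2)
library(palmerpenguins)

# Naming each color ties it to a species rather than to its position
species_colors <- c(Adelie = "orchid2",
                    Chinstrap = "seagreen",
                    Gentoo = "gold2")

p_palette <- ggplot(data = penguins, aes(x = bill_length_mm,
                                         y = species,
                                         fill = species,
                                         color = species)) +
  geom_jitter(height = 0.05) +
  geom_boxplot(color = "black",
               width = 0.15,
               position = position_nudge(y = 0.2),
               outlier.shape = NA) +
  # The same vector feeds both the fill and the color scale
  scale_fill_manual(values = species_colors) +
  scale_color_manual(values = species_colors) +
  theme_bw() +
  theme(legend.position = "none")

p_palette
```

If you later change a color, you only have to change it in one place.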
Up until this point, we have created and saved subsets to adjust what data gets plotted in our plots. This works, but requires saving extra elements to your global environment. Doing this several times can really clutter your global environment!
Another option is to take advantage of pipes to filter down and go directly into a plot.
Before, you might remember that some of our penguins weren't identified by sex, leaving them with "NA" as their entry on that variable. Therefore, plotting by sex can be messy without first removing those NA values.
ggplot(data = penguins, aes(x = bill_length_mm, y = sex, fill = sex)) +
geom_boxplot()
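Let's build a pipe to filter out this category, but then add a plot afterwards. Notice that the filter keeps every value of sex except "NA", and that ggplot no longer needs a data argument because the pipe supplies the filtered data.
penguins %>%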
filter(sex != "NA") %>%
ggplot(aes(x = bill_length_mm, y = sex, fill = sex)) +
geom_boxplot()
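A side note on the filter: in the penguins data, the missing entries for sex are true missing values rather than the text "NA". The comparison sex != "NA" still works because comparing a missing value gives a missing result, and filter() drops rows where the condition is missing. If you prefer to state the intent directly, an alternative sketch uses is.na() from base R (this assumes dplyr is installed for filter() and %>%).

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# Count the penguins whose sex was not recorded
sum(is.na(penguins$sex))

# Keep only the rows where sex is recorded, then plot
penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = bill_length_mm, y = sex, fill = sex)) +
  geom_boxplot()
```

Both versions of the filter keep the same rows, so use whichever reads more clearly to you.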
Let's put together everything we've seen so far. Let's compare the bill length of penguins across species and across sex (with "NA" removed).
species is represented on the y axis, bill_length_mm on the x axis, and sex as the fill color.
penguins %>%
filter(sex != "NA") %>%
ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
geom_boxplot()
Now, let's add a few more features.
penguins %>%
filter(sex != "NA") %>%
ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
geom_boxplot() +
theme_bw() +
scale_fill_manual(values = c("lightpink", "steelblue1")) +
labs(title = "Bill Length across Species",
x = "Bill Length (mm)",
y = "Species",
fill = "Sex") +
scale_x_continuous(breaks = seq(30,60,5)) +
theme(axis.text = element_text(color = "black", size = 10),
axis.title = element_text(size = 11, face = "bold"),
legend.title = element_text(size = 11, face = "bold"),
plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
This tutorial was built by Dr. Kelly Findley from UIUC Statistics. I hope this experience was helpful for you!
If you'd like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/