Tutorial Introduction

Tutorial Goals

This tutorial will show a lot of advanced features for ggplot. The primary purpose of this tutorial is to make you aware of possibilities! If some of the code seems complicated, that’s ok.

In this tutorial, we will:

  • Learn how to combine multiple geometries in one plot
  • Learn basic features to format multiple geometries neatly (like changing width, or nudging a geom)
  • Talk about adding a ggplot directly into a pipe (to add some efficiency to your coding!)

What is a Geometry again?

In ggplot2, each way of representing data visually is called a “geometry”, or “geom” as we call it in code.

Some examples of geoms we have seen include points, boxplots, violinplots, density curves, and histograms.

Why Multiple Geoms?

Different geoms can share different insights about the data.

  • Points show all of our data points
  • Boxplots just represent the 5-number summary
  • Densities and violinplots represent the shape of the data

Combining different types of geoms can often lead to more powerful visuals that communicate much more information at once.

Nesting and Nudging

To format multiple geoms in the same plot, two common techniques are nesting and nudging.

  • With nesting, the goal is to put the biggest/widest geom down first, and then include narrower geoms on top (often with some transparency added)
  • With nudging, we lay multiple geoms side by side by nudging one to the left/right or up/down.

Nesting Geoms

Revisiting the Penguins

The penguins data lists various anatomical measurements from over 300 penguins across 3 species.

library(ggplot2)         # plotting functions used throughout this tutorial
library(palmerpenguins)  # provides the penguins data
penguins
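
Before plotting, a quick look at the structure of the data can help. The sketch below is an illustration rather than part of the original tutorial code, and it assumes only that the palmerpenguins package is installed. It checks the size of the data, counts penguins per species, and counts missing values in two variables used later in this tutorial.

```r
library(palmerpenguins)

# Number of rows (penguins) and columns (variables)
dim(penguins)

# How many penguins of each species?
species_counts <- table(penguins$species)
species_counts

# Missing values matter later, when we plot by sex
missing_sex  <- sum(is.na(penguins$sex))
missing_bill <- sum(is.na(penguins$bill_length_mm))
c(sex = missing_sex, bill_length_mm = missing_bill)
```

Because a couple of penguins have no recorded bill length, a warning about removed rows in the plots that follow is expected and harmless.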

Boxplots inside violinplots

Violinplots are nice representations to express the distribution of a numeric variable.

ggplot(data = penguins, aes(x = species, y = bill_length_mm, fill = species)) +
  geom_violin()

A reminder about multiple lines

Just a reminder, when you have multiple arguments inside a function, you can put each new argument on a new line to improve readability. The following code does the same thing, but I just put the aes arguments in new lines. This will be helpful for some of the longer code examples coming!

ggplot(data = penguins, aes(x = species, 
                            y = bill_length_mm, 
                            fill = species)) +
  geom_violin()

Adding Boxplots

However, violinplots don’t clearly show summary information, like where the median and quartiles are. Let’s try nesting a boxplot inside the violinplots.

ggplot(data = penguins, aes(x = species, 
                            y = bill_length_mm, 
                            fill = species)) +
  geom_violin() +
  geom_boxplot()

Playing with Width and Transparency

A couple of things here–the shapes look awkward, especially with the matching colors. We can adjust a few things to make it look much nicer!

  • Make the boxplots narrower so that they fit inside
  • Adjust the transparency of the violins so that they are a lighter shade (even though it’s the same color argument).
  • Remove the legend (something we saw in a previous tutorial), since the color legend is redundant with the x axis labels.

ggplot(data = penguins, aes(x = species, 
                            y = bill_length_mm, 
                            fill = species)) +
  geom_violin(alpha = 0.3) +
  geom_boxplot(width = 0.3) +
  theme(legend.position = "none")

The choice of how narrow to make the width of the boxplots does depend on the distribution. In many cases where the distribution is narrower/more skewed, you might need a much narrower boxplot to nest inside! Just remember to play around with width.
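
One way to play around with width is to wrap the plot in a small function and call it with a few values. This is only a sketch: the function name nested_violin and its arguments box_width and violin_alpha are made up for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: draw boxplots nested inside violins,
# with the boxplot width and violin transparency as arguments
nested_violin <- function(box_width = 0.3, violin_alpha = 0.3) {
  ggplot(data = penguins, aes(x = species,
                              y = bill_length_mm,
                              fill = species)) +
    geom_violin(alpha = violin_alpha) +
    geom_boxplot(width = box_width) +
    theme(legend.position = "none")
}

# Build a few versions and keep the one that nests best
p_narrow <- nested_violin(box_width = 0.1)
p_medium <- nested_violin(box_width = 0.3)
p_wide   <- nested_violin(box_width = 0.6)

p_narrow  # print any of these to view it
```

Saving each version to its own name makes it easy to flip between them in the plot pane before settling on a width.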

More Customization

Let’s do a few more things we’ve learned before to make it look more professional:

  • Add labels
  • Adjust the tick mark frequency on the y-axis
  • Add the theme_bw() plot theme for a cleaner background
  • Add some text formatting for the axes and title

ggplot(data = penguins, aes(x = species,
                            y = bill_length_mm, 
                            fill = species)) +
  geom_violin(alpha = 0.3) +
  geom_boxplot(width = 0.3) +
  labs(title = "Bill Length by Species", 
       x = "Species", 
       y = "Bill Length (mm)") +
  scale_y_continuous(breaks = seq(30,60,5)) +
  theme_bw() +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(color = "black", size = 12),
        axis.text = element_text(color = "black", size = 11))
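
The breaks argument in scale_y_continuous takes a plain numeric vector, and seq(from, to, by) is just a shortcut for building one. As a quick illustration (the name y_breaks is made up here):

```r
# seq(from, to, by) builds an evenly spaced numeric vector
y_breaks <- seq(30, 60, 5)
y_breaks
# 30 35 40 45 50 55 60

# Writing the vector out by hand would give the same tick marks
all.equal(y_breaks, c(30, 35, 40, 45, 50, 55, 60))
```

Changing the last number changes the spacing, so seq(30, 60, 10) would place a tick mark every 10 mm instead.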

A Twist!

And for some contrast (maybe you like it, maybe you don’t!), let’s try:

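  • Coloring the violin and boxplot borders white by writing color = "white" in the geom_violin and geom_boxplot lines
  • Switching to theme_dark() so that the white borders stand out
  • (You could also thicken the borders with the size argument; the code below leaves them at the default.)

ggplot(data = penguins, aes(x = species,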
                            y = bill_length_mm, 
                            fill = species)) +
  geom_violin(alpha = 0.3,
              color = "white") +
  geom_boxplot(width = 0.3, 
               color = "white") +
  labs(title = "Bill Length by Species", 
       x = "Species", 
       y = "Bill Length (mm)") +
  scale_y_continuous(breaks = seq(30,60,5)) +
  theme_dark() +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(color = "black", size = 12),
        axis.text = element_text(color = "black", size = 11))

Nudging Geometries

Comparing Nesting and Nudging

Let’s look at a new example involving the anaesthetic data from the faraway package.

library(faraway)
anaesthetic

First plot

Let’s start by plotting the data as a jitter plot, and plotting a boxplot underneath. Notice that for this one:

  • The color argument in the aes line colors the points and the boxplot outline.

ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
  geom_boxplot() +
  geom_jitter(width = 0.05) +
  theme_classic()

This plot functionally works, and we could certainly clean it up to make it look nicer. However, the lines naturally interfere with the points. Nudging the boxplot over might look cleaner.

Nudging

We can do that with the position argument inside one of our geometries.

position = position_nudge(x = ..., y = ...) is a template you can use, where the values you assign to x and y set how far the geom is nudged in the x or y direction. For example:

  • position = position_nudge(x = 0.3) will nudge the geometry 0.3 units to the right
  • position = position_nudge(x = -0.2, y = 0.5) will nudge the geometry 0.2 units left and 0.5 units up.
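
To see what “units” means on a categorical axis: ggplot2 places the categories at the positions 1, 2, 3, and so on, so a nudge of 0.3 is three tenths of the gap between neighboring categories. The sketch below uses a small made-up data frame (toy is not from the tutorial) and layer_data() to read back where the points end up.

```r
library(ggplot2)

# Made-up data: two categories with two values each
toy <- data.frame(group = c("a", "a", "b", "b"),
                  value = c(1, 2, 3, 4))

p_plain <- ggplot(toy, aes(x = group, y = value)) +
  geom_point()

p_nudged <- ggplot(toy, aes(x = group, y = value)) +
  geom_point(position = position_nudge(x = 0.3))

# layer_data() returns the positions ggplot2 will actually draw
x_plain  <- as.numeric(layer_data(p_plain)$x)
x_nudged <- as.numeric(layer_data(p_nudged)$x)

x_plain             # categories sit at whole numbers
x_nudged - x_plain  # how far each point moved to the right
```

This is why nudges well below 0.5 tend to work best: anything larger starts to push a geom toward the neighboring category.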

Example with nudge

In the plot below, let’s make the following changes:

  • Nudge the boxplot 0.25 units to the right.
  • Narrow the width of the boxplots to 0.15

ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
  geom_boxplot(width = 0.15, position = position_nudge(x = 0.25)) +
  geom_jitter(width = 0.05) +
  theme_classic()

To fill or not to fill

Rather than color the boxplot outlines, we could go back to filling the boxplots instead. It’s totally up to you and what looks more pleasing to your eye!

Note that you will have to manually add a black outline color to your boxplot line if you do that.

Since some of these functions now have a lot of arguments, I’m going to put each argument on a new line for readability.

ggplot(data = anaesthetic, aes(x = tgrp, 
                               y = breath,
                               color = tgrp, 
                               fill = tgrp)) +
  geom_boxplot(color = "black", 
               width = 0.15, 
               position = position_nudge(x = 0.25)) +
  geom_jitter(width = 0.05) +
  theme_classic()

Cleaning up the Outliers

In this particular representation, notice that the boxplots report outlier values. However, this is redundant with the points plotted to the left and may look weird.

We can add an argument to ask the boxplots not to show outliers: outlier.shape = NA

ggplot(data = anaesthetic, aes(x = tgrp, 
                               y = breath, 
                               color = tgrp, 
                               fill = tgrp)) +
  geom_boxplot(color = "black", 
               width = 0.15, 
               position = position_nudge(x = 0.25),
               outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  theme_classic() +
  theme(legend.position = "none")
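
It may help to know that outlier.shape = NA only hides the outlier points. The boxplot itself is computed the same way, so the whiskers still stop at the most extreme value that is not an outlier. A minimal sketch with made-up numbers (toy is hypothetical data, not the anaesthetic data):

```r
library(ggplot2)

# Made-up data with one obvious outlier (100)
toy <- data.frame(group = "a",
                  value = c(1, 2, 3, 4, 5, 100))

p_hidden <- ggplot(toy, aes(x = group, y = value)) +
  geom_boxplot(outlier.shape = NA)

# layer_data() shows the summary the boxplot is drawn from
box_stats <- layer_data(p_hidden)

box_stats$ymax           # top of the upper whisker
box_stats$outliers[[1]]  # still identified, just not drawn
```

So hiding the outliers is safe in a plot like the one above, where the jittered points already show every observation.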

Nudging with Ridgeplots (Optional)

Boxplots with Ridge plots

Ridge plots can also look nice when combined with boxplots. Let’s plot the same data, but let’s use the geom_density_ridges option from the ggridges package.

  • Keep the black border color, the width, and the hidden outliers for the boxplots
  • Let’s also use theme_classic for a nice clean blank background

library(ggridges)

ggplot(data = anaesthetic, aes(x = breath, 
                               y = tgrp, 
                               color = tgrp, 
                               fill = tgrp)) +
  geom_density_ridges() +
  geom_boxplot(color = "black", 
               width = 0.15, 
               outlier.shape = NA) +
  theme_classic() +
  theme(legend.position = "none")

Adjusting the densities

Ridge plots can be difficult to work with–they need a lot of customization to really look nice.

First, let’s add some things to geom_density_ridges

  • Add some transparency with alpha so that the boxplots “pop” in comparison.
  • Add a black border color to the ridges
  • Add a scale argument and set it below 1 so that each ridge doesn’t overlap the one above it.

I’m also going to…

  • Add a limits argument to the x scale so that it starts at 0 and ends at 20 (it seems weird to have the scale start negative)
  • Increase the size of the boxplot border–I think it just looks better that way

ggplot(data = anaesthetic, aes(x = breath,
                               y = tgrp, 
                               color = tgrp, 
                               fill = tgrp)) +
  geom_density_ridges(alpha = 0.5, 
                      color = "black",
                      scale = 0.8) +
  geom_boxplot(color = "black", 
               width = 0.15, 
               outlier.shape = NA,
               size = 0.8) +
  theme_classic() +
  theme(legend.position = "none") +
  scale_x_continuous(limits = c(0,20))

When to Nudge and when to Nest?

Much of this choice is up to you and what looks most aesthetically pleasing! I find when there aren’t that many data points, it’s nice to represent them all. When there are a lot, the data points can begin to look overwhelming to plot.

One more example

Let’s try a jitter plot with nudged boxplots using the penguins data again.

For a challenge, we’ll change the orientation from the example we saw earlier–put bill_length_mm on the x axis and species on the y axis.

I’m going to go crazy on the features this time.

ggplot(data = penguins, aes(x = bill_length_mm, 
                            y = species, 
                            fill = species, 
                            color = species)) +
  geom_jitter(height = 0.05) +
  geom_boxplot(color = "black",
               width = 0.15, 
               size = 0.8,
               position = position_nudge(y = 0.2),
               outlier.shape = NA) +
  theme_bw() +
  labs(title = "Bill Length across Species", x = "Bill Length (mm)", y = "Species") +
  scale_fill_manual(values = c("orchid2", "seagreen", "gold2")) +
  scale_color_manual(values = c("orchid2", "seagreen", "gold2")) +
  scale_x_continuous(breaks = seq(30,60,5)) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold"),
        axis.title = element_text(color = "black", size = 12),
        axis.text = element_text(color = "black", size = 11))
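
Since the same three colors appear in both scale_fill_manual and scale_color_manual, one option is to store them once in a named vector and reuse it. This is a sketch of that idea, trimmed down from the plot above. The name species_colors is made up here, and the names inside the vector assume the three species labels in penguins (Adelie, Chinstrap, Gentoo).

```r
library(ggplot2)
library(palmerpenguins)

# Named vector: each species is tied to its color explicitly
species_colors <- c(Adelie    = "orchid2",
                    Chinstrap = "seagreen",
                    Gentoo    = "gold2")

p_colors <- ggplot(data = penguins, aes(x = bill_length_mm,
                                        y = species,
                                        fill = species,
                                        color = species)) +
  geom_jitter(height = 0.05) +
  geom_boxplot(color = "black",
               width = 0.15,
               position = position_nudge(y = 0.2),
               outlier.shape = NA) +
  scale_fill_manual(values = species_colors) +
  scale_color_manual(values = species_colors) +
  theme_bw() +
  theme(legend.position = "none")

p_colors
```

With a named vector, changing a color means editing one line, and the colors stay attached to the right species even if the order of the categories changes.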

Plot Codes inside Pipes

Saving a step

Up until this point, we have created and saved subsets to adjust what data gets plotted in our plots. This works, but requires saving extra elements to your global environment. Doing this several times can really clutter your global environment!

Another option is to take advantage of pipes to filter down and go directly into a plot.

Filtering out Penguins

Before, you might remember that some of our penguins weren’t identified by sex, leaving them with “NA” as their entry on that variable. Therefore, plotting by sex can be messy without first removing those NA values.

ggplot(data = penguins, aes(x = bill_length_mm, y = sex, fill = sex)) +
  geom_boxplot()

A pipe instead

Let’s build a pipe to filter out the missing values, but then add a plot afterwards. Notice that:

  • We created a pipe and used filter with !is.na(sex) to keep only the penguins whose sex is recorded
  • We then piped directly into a ggplot
  • The ggplot is exactly what we’d normally do, except that we don’t add the data argument–it’s already identified at the beginning of our pipe.

library(dplyr)  # provides %>% and filter()

penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = bill_length_mm, y = sex, fill = sex)) +
  geom_boxplot()
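
A note on how the missing values are stored: in penguins, a missing sex is a true NA (a missing value), not the text "NA". A comparison such as sex != "NA" returns NA for those rows, and filter() drops any row where the condition is NA, so that comparison removes them only as a side effect. Testing with is.na() states the intent directly. The sketch below compares the two approaches (the object names are made up for illustration).

```r
library(dplyr)
library(palmerpenguins)

# How many penguins have no recorded sex?
n_missing <- sum(is.na(penguins$sex))

# Comparing NA to anything gives NA, and filter() drops those rows
by_comparison <- penguins %>%
  filter(sex != "NA")

# is.na() asks the question directly
by_is_na <- penguins %>%
  filter(!is.na(sex))

n_missing
nrow(by_comparison)
nrow(by_is_na)
```

Both filters keep the same rows here, but the is.na() version would still behave correctly if a category were ever literally named "NA".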

Clustered boxplots

Let’s put together everything we’ve seen so far. Let’s compare the bill length of penguins across species and across sex (with the NA values removed).

  • We will again start with a pipe to remove the penguins that have NA listed for sex.
  • Then, let’s build clustered boxplots where species is represented on the y axis, bill_length_mm on the x axis, and sex as the fill color.

penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
  geom_boxplot()
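
Before reading clustered boxplots, it can be useful to check how many penguins fall into each species and sex combination, since each box summarizes only that group. A short sketch using count() from dplyr (the object name group_sizes is made up for illustration):

```r
library(dplyr)
library(palmerpenguins)

# Same filter as the plot, then count the penguins behind each box
group_sizes <- penguins %>%
  filter(!is.na(sex)) %>%
  count(species, sex)

group_sizes
```

The same pipe feeds either a summary table or a plot, which is part of what makes piping directly into ggplot convenient.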

Add some features

Now, let’s add a few more features

  • Adjust the title and labels
  • Format the title and labels
  • Add a nice background
  • Add some colors.

penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
  geom_boxplot() +
  theme_bw() +
  scale_fill_manual(values = c("lightpink", "steelblue1")) +
  labs(title = "Bill Length across Species",
       x = "Bill Length (mm)", 
       y = "Species", 
       fill = "Sex") +
  scale_x_continuous(breaks = seq(30,60,5)) +
  theme(axis.text = element_text(color = "black", size = 10),
        axis.title = element_text(size = 11, face = "bold"),
        legend.title = element_text(size = 11, face = "bold"),
        plot.title = element_text(size = 14, face = "bold", hjust = 0.5))

Acknowledgment

This tutorial was built by Dr. Kelly Findley from UIUC Statistics. I hope this experience was helpful for you!

If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/

Overlaying Geometries