Overlaying Geometries

Tutorial Introduction

Tutorial Goals

This tutorial will show a lot of advanced features for ggplot. The primary purpose of this tutorial is to make you aware of possibilities! If some of the code seems complicated, that's ok.

In this tutorial, you should:

  • Learn how to combine multiple geometries in one plot
  • Learn basic features (like changing width, or nudging) to help display multiple geoms cleanly
  • Be aware that many more customization features exist, and remember that you can web search anything with ggplot!

What is a Geometry again?

In ggplot2, we think of different plot representations as "geometries." We shorten this to "geom" in the actual code. This is how we choose to represent data visually.

Some examples of geoms we have seen include points, boxplots, violinplots, density curves, and histograms (though there are many more!).

Why Multiple Geoms?

Different geoms can share different insights about the data. Some show the detail of all data points, or the distribution in general. Some show only summaries of the data. Combining these two types of geoms can often lead to more powerful visuals that communicate much more information at once. For this reason, boxplots (or summarizing marks in general) are commonly combined with distributional geoms.

One nice feature about visualizing in R is that it is easy to plot multiple geometries in the same plot. We have full customization power about what we mix and match, and how we lay it out.

While "point and click" software like Excel has basic graphing capabilities, you are limited by what options are presented. When coding, you can literally code anything you want if you have a vision of what you want and the know-how (or google searching skills) to do it!

Nesting and Nudging

In this tutorial, we will learn how to nest or nudge geometries.

With nesting, the goal is to put the biggest/widest geom down first, and then include narrower (and often, more transparent) geoms on top.

This is in contrast to nudging--where we lay multiple geoms side by side by nudging one to the left/right or up/down.

Nesting Geoms

Revisiting the Penguins

Let's use the palmerpenguins data for this demonstration. As a reminder, the penguins dataset from this package lists various measurements from over 300 penguins across 3 species.

library(ggplot2)          # plotting functions used throughout
library(palmerpenguins)   # provides the penguins data
penguins

Boxplots inside violinplots

Let's look at bill length across species again. Violinplots are nice representations to express the distribution of a numeric variable. However, they don't clearly show summary information, like where the mean or median is. Let's try nesting a boxplot inside the violinplots.

I'll also add some nice, generic features that we've seen before.

ggplot(data = penguins, aes(x = species, y = bill_length_mm, fill = species)) +
  geom_violin() +
  geom_boxplot() +
  theme_bw()

Playing with Width and Transparency

A couple things here--the shapes look awkward, especially with the matching colors. We can play with two things here to make it look much nicer!

  • Make the boxplots narrower so that they fit inside
  • Adjust the transparency of the violins to be a different shade (even if it's the same color argument).

I'm also going to remove the legend (something we saw in a previous tutorial) since the color legend is redundant with the x axis labels.

ggplot(data = penguins, aes(x = species, y = bill_length_mm, fill = species)) +
  geom_violin(alpha = 0.4) +
  geom_boxplot(width = 0.3) +
  theme_bw() +
  theme(legend.position = "none")

These immediately look much more aesthetically pleasing!

The choice of how skinny to make the boxplots does depend on the distribution. In many cases where the distribution is narrower/more skewed, you might need a much narrower boxplot to nest inside!

More Customization

Let's do a few more things we've learned before: add labels and use more frequent breaks on the y axis.

Let's also experiment with some new choices--we can make the boxplot lines a bit thicker to make it easier to compare. We could also choose to color the boxplot outlines, rather than the fill.

Then if you want to fill in all boxplots with a block color, override fill for the boxplots only.

ggplot(data = penguins, aes(x = species, y = bill_length_mm, fill = species, color = species)) +
  geom_violin(alpha = 0.4) +
  geom_boxplot(width = 0.3, fill = "white", size = 0.8) +
  theme_bw() +
  scale_y_continuous(breaks = seq(30,60,3)) +
  labs(title = "Bill Length by Species", x = "Species", y = "Bill Length (mm)") +
  theme(legend.position = "none")
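
The fill override works because a value set directly inside a geom (outside of aes()) takes precedence over the mapping inherited from ggplot(), and it does so for that layer only. Here is a minimal sketch of that idea. It assumes the built-in iris data as a stand-in so that it runs without palmerpenguins, and the name p_override is made up for illustration.

```r
library(ggplot2)

# Global mapping: fill and color both follow Species
p_override <- ggplot(iris, aes(x = Species, y = Sepal.Length,
                               fill = Species, color = Species)) +
  geom_violin(alpha = 0.4) +                    # inherits the mapped fill
  geom_boxplot(width = 0.2, fill = "white") +   # constant fill for this layer only
  theme_bw() +
  theme(legend.position = "none")

p_override
```

The violins keep their species colors, while every boxplot is filled white but still has a species-colored outline.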

Try on your own

Can you make violinplots with nested boxplots to compare flipper length (flipper_length_mm) across the three islands (island) of our penguins data?

Challenge yourself to make them look nice using any features you saw in the previous examples. Feel free to choose custom colors as well.

Hint: You will want to make much narrower boxplots. One possible solution is provided via the hints, but you can certainly do things differently!

ggplot(data = penguins, aes(x = _________, y = ______, fill = ______)) +
  _______

ggplot(data = penguins, aes(x = island, y = flipper_length_mm, fill = island)) +
  geom_violin(_________) +
  geom_boxplot(__________)

ggplot(data = penguins, aes(x = island, y = flipper_length_mm, fill = island)) +
  geom_violin(alpha = 0.3) +
  geom_boxplot(width = ____, size = ____) +
  ___________

ggplot(data = penguins, aes(x = island, y = flipper_length_mm, fill = island, color = island)) +
  geom_violin(alpha = 0.3) +
  geom_boxplot(width = 0.06, size = 0.8, fill = "white") +
  theme_bw() +
  labs(title = "Flipper Length across Islands", x = "Island", y = "Flipper Length (mm)") +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("orchid", "seagreen", "gold3")) +
  scale_color_manual(values = c("orchid", "seagreen", "gold3"))

Nudging Geometries

An Alternative

Another way to combine multiple geoms is by nudging one alongside another. This might be helpful in cases where the data is more skewed and it's difficult to embed representations.

Comparing Nesting and Nudging

Let's look at a new example involving the anaesthetic data from the faraway package.

library(faraway)
anaesthetic

Let's start by plotting the data as a strip chart, and plotting a boxplot underneath. I'm going to keep the boxplot fill white so that the points can be seen.

ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
  geom_boxplot() +
  geom_jitter(width = 0.1) +
  theme_classic()

This plot functionally works, and we could certainly clean it up to make it look nicer. However, the lines naturally interfere with the points. Nudging the boxplot over might look cleaner.

Nudging

We can do that with position = position_nudge(x = ..., y = ...) where we identify what distance in the x or y direction we wish to move this. Let's adjust it slightly to the right (x = 0.25).

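I will additionally thicken the lines with size and narrow the boxes with width. (In ggplot2 3.4.0 and later, linewidth is the preferred argument for line thickness; size still works here but may give a deprecation warning.)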

ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp)) +
  geom_boxplot(width = 0.15, size = 1, position = position_nudge(x = 0.25)) +
  geom_jitter(width = 0.1) +
  theme_classic()

Or if you prefer, we could fill the boxplots with color. Be sure to color the boxplot borders black if choosing this option, since otherwise, the color = tgrp would make the boxplot quartiles blend in with the fill color.

ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp, fill = tgrp)) +
  geom_boxplot(color = "black", width = 0.15, size = 1, position = position_nudge(x = 0.25)) +
  geom_jitter(width = 0.1) +
  theme_classic() +
  theme(legend.position = "none")
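
To see what position_nudge() does, it can help to look at the positions ggplot2 computes. The sketch below is only an illustration. It assumes the built-in iris data (chosen so that it runs without faraway), uses a made-up name p_nudge, and nudges the boxplots to the left instead of the right by giving a negative x.

```r
library(ggplot2)

p_nudge <- ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
  # A negative x moves the boxplots to the left of the points
  geom_boxplot(width = 0.15, position = position_nudge(x = -0.25)) +
  geom_jitter(width = 0.1) +
  theme_classic()

p_nudge

# Discrete groups sit at x = 1, 2, 3, so we expect the nudged boxes
# to be centered 0.25 to the left of those positions
layer_data(p_nudge, 1)$x
```

The same idea works vertically: when the categories are on the y axis, use position_nudge(y = ...) instead.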

Cleaning up the Outliers

In this particular representation, notice that the boxplots report outlier values. However, this is redundant with the points represented on the left. We can add an argument to ask the boxplots not to show outliers: outlier.shape = NA

Since we have so many arguments in geom_boxplot, I will go ahead and push each argument to a new line. Note that we don't need to add a + after these lines since they are still inside a function. Only when I start a new function do I need a +.

ggplot(data = anaesthetic, aes(x = tgrp, y = breath, color = tgrp, fill = tgrp)) +
  geom_boxplot(color = "black", 
               width = 0.15, 
               size = 1, 
               position = position_nudge(x = 0.25),
               outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  theme_classic() +
  theme(legend.position = "none")
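
One point worth keeping in mind: outlier.shape = NA only hides the outlier points. The quartiles and whiskers are computed from the same data either way. Here is a quick sketch to check this, again assuming the built-in iris data as a stand-in; the names p_base, p_show, and p_hide are made up for illustration.

```r
library(ggplot2)

p_base <- ggplot(iris, aes(x = Species, y = Sepal.Width))

p_show <- p_base + geom_boxplot()                     # outliers drawn
p_hide <- p_base + geom_boxplot(outlier.shape = NA)   # outliers hidden

# Compare the computed box and whisker values of the two plots
stats <- c("ymin", "lower", "middle", "upper", "ymax")
all.equal(layer_data(p_show)[stats], layer_data(p_hide)[stats])
```

If the comparison comes back TRUE, hiding the outliers changed only what is drawn, not the summary itself.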

Boxplots with Ridge plots

Ridge plots can also look nice when combined with boxplots. Let's plot the same data, but let's use the geom_density_ridges option from the ggridges package.

library(ggridges)

ggplot(data = anaesthetic, aes(x = breath, y = tgrp, color = tgrp, fill = tgrp)) +
  geom_density_ridges() +
  geom_boxplot(color = "black", 
               width = 0.15, 
               size = 1,
               outlier.shape = NA) +
  theme_classic() +
  theme(legend.position = "none")

Making it look nicer

Ridge plots can be difficult to work with. They need a lot of customization to really look nice. Here is a summary of what I changed to improve this visualization:

  • Add some transparency with alpha--this allows the boxplots to pop out more in contrast to the ridge plots.
  • Add a black border color to the ridges
  • Lower the scale so that they don't overlap above. A scale of 1 would have each ridge plot meet the one above--so going a little lower than 1 is usually ideal.
  • Adjust how frequently the x axis is labeled (the breaks)
  • Limit the x axis to start at 0, since breath can't be below 0 and the ridge plot will be misleading otherwise.
  • The last function just pushes down the entire plot to hug the x axis more. Otherwise, there is a lot of whitespace at the bottom.

library(ggridges)

ggplot(data = anaesthetic, aes(x = breath, y = tgrp, color = tgrp, fill = tgrp)) +
  geom_density_ridges(alpha = 0.5, 
                      color = "black",
                      scale = 0.8) +
  geom_boxplot(color = "black", 
               width = 0.15, 
               size = 1,
               outlier.shape = NA) +
  theme_classic() +
  theme(legend.position = "none") +
  scale_x_continuous(breaks = seq(0,32,2),
                     limits = c(0, 24)) +
  scale_y_discrete(expand = expansion(add = c(0.2, 0.7)))
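
The expand argument can look opaque. As a rough sketch: expansion(add = c(lower, upper)) pads a discrete axis by about that many units below the first category and above the last one, where neighboring categories sit one unit apart. The example below uses plain horizontal boxplots on the built-in iris data (an assumption, so that it runs without ggridges or faraway) to show the padding on its own; p_expand is a made-up name.

```r
library(ggplot2)

p_expand <- ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
  geom_boxplot(width = 0.15) +
  theme_classic() +
  theme(legend.position = "none") +
  # About 0.2 units of space below the first species, 0.7 units above the last
  scale_y_discrete(expand = expansion(add = c(0.2, 0.7)))

p_expand
```

In a ridge plot, the larger upper value is useful because the top ridge rises above its own baseline and needs the extra room. Try swapping the two numbers to see the effect.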

When to Nudge and when to Nest?

Much of this choice is up to you and what looks most aesthetically pleasing! I find when there aren't that many data points, it's nice to represent them all. When there are a lot, the data points can begin to look overwhelming to plot.

The penguins data is probably not an overwhelming number of data points, but as we get past that amount, it may look cleaner to use other geoms (like violin plots or density curves) to represent the distribution.

Give it a try!

Try a strip chart with nudged boxplots for the penguins data we saw earlier.

For a challenge, try changing the orientation from the example we saw earlier--put bill_length_mm on the x axis and species on the y axis.

Nudge the boxplots to be either above or below the points--experiment with which you like best!

Add features that you think make it look nice--custom colors if you wish! The solution again presents one possible representation, though you may certainly make different choices!

ggplot(data = penguins, aes(x = ____, y = ____, fill = ____, color = ____)) +
  _______

ggplot(data = penguins, aes(x = bill_length_mm, y = species, fill = species, color = species)) +
  geom_jitter(height = 0.1) +
  geom_boxplot(______) +
  _______

ggplot(data = penguins, aes(x = bill_length_mm, y = species, fill = species, color = species)) +
  geom_jitter(height = 0.1) +
  geom_boxplot(color = "black",
               width = ____,
               size = ___,
               position = ______,
               outlier.shape = ___) +
  _______

ggplot(data = penguins, aes(x = bill_length_mm, y = species, fill = species, color = species)) +
  geom_jitter(height = 0.1) +
  geom_boxplot(color = "black",
               width = 0.15,
               size = 1,
               position = position_nudge(y = 0.2),
               outlier.shape = NA) +
  theme_bw() +
  labs(title = "Bill Length across Species", x = "Bill Length (mm)", y = "Species") +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("orchid2", "seagreen", "gold2")) +
  scale_color_manual(values = c("orchid2", "seagreen", "gold2")) +
  scale_x_continuous(breaks = seq(30,60,2))

Plot Codes inside Pipes

Saving a step

This last segment has a slightly different purpose than the previous sections. Up until this point, we have created and saved subsets to adjust what data gets plotted in our plots. This works, but requires saving extra elements to your global environment. Doing this several times can really clutter your global environment.

Another option is to take advantage of pipes to filter down and go directly into a plot. Nothing needs to be saved to a name first--the plot just follows the filtering argument!

Filtering out Penguins

Before, you might remember that some of our penguins weren't identified by sex, leaving them with NA (a missing value) as their entry for that variable. Therefore, plotting by sex can be messy without first removing those NA values.

Let's build a pipe to filter out this category, but then add a plot afterwards.

library(dplyr)   # provides %>% and filter()

penguins %>%
  filter(!is.na(sex)) %>%   # drop penguins whose sex is missing
  ggplot(aes(x = bill_length_mm, y = sex, fill = sex)) +
  geom_boxplot() +
  theme_bw()

Note that after adding a pipe, we write our plotting code exactly as before EXCEPT that we don't give ggplot() a data argument. That's because the data comes in from the beginning of the pipe.
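
A side note on the filtering step: R stores missing values as a special NA value, not as the text "NA". filter() keeps only the rows where its condition is TRUE, and any comparison with NA gives NA, so rows with a missing sex are dropped whether we compare against "NA" or ask is.na() directly. Below is a minimal sketch of that behavior on a made-up data frame (toy is hypothetical, not part of the penguins data).

```r
library(dplyr)

# Hypothetical toy data: one row has a missing sex
toy <- data.frame(id  = 1:4,
                  sex = c("female", NA, "male", "female"))

kept_compare <- toy %>% filter(sex != "NA")   # NA != "NA" is NA, so that row is dropped
kept_is_na   <- toy %>% filter(!is.na(sex))   # says directly what we mean

# Both approaches keep the same three rows
identical(kept_compare, kept_is_na)
```

The is.na() version is usually easier to read, and it still works if a column ever contains the literal text "NA".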

Clustered boxplots revisited

Let's put together everything we've seen so far. Let's compare the bill length of penguins across species and across sex.

First, let's build a pipe to remove the penguins that have NA listed for sex.

Then, let's build clustered boxplots where species is represented on the y axis, bill_length_mm on the x axis, and sex as the fill color.

penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
  geom_boxplot() +
  theme_bw() +
  scale_fill_manual(values = c("lightpink", "steelblue1"))

Clustered boxplots and ridgeplots

Let's continue with this idea, but let's also add ridgeplots overhead. We can do this by adding density ridges, along with some basic options we've seen before.

We want to be sure to narrow the boxplots.

Something new we want to add is to remove the legend for just the

penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
  geom_density_ridges(alpha = 0.4,
                      scale = 0.8,
                      color = "black") +
  geom_boxplot(width = 0.15) +
  theme_classic()

Making it look nice!

In the following plot, I'm going to demonstrate many other features. You do not have to know how to do all of these things (though many of these should be familiar by now!). The point here is to show you the possibilities of what you can do!

Remember that web searching is your friend--type ggplot, followed by your question or some key words, and you can likely figure out how to do anything.

penguins %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = sex)) +
  geom_density_ridges(alpha = 0.4,
                      scale = 0.9,
                      color = "black") +
  geom_boxplot(width = 0.15,
               outlier.shape = NA,
               size = 0.7) +
  theme_classic() +
  scale_fill_manual(values = c("lightpink", "steelblue1")) +
  labs(title = "Bill Length across Species",
       caption = "Data based on 333 penguins from the Palmer Archipelago",
       x = "Bill Length (mm)", 
       y = "Species", 
       fill = "Sex") +
  scale_x_continuous(breaks = seq(30,60,2),
                     limits = c(30,60)) +
  scale_y_discrete(expand = expansion(add = c(0.2, 0.7))) +
  #this last part makes the titles and labels bigger and/or bolder. Adjust to see how it affects things!
  theme(axis.text = element_text(color = "black", size = 10),
        axis.title = element_text(size = 12, face = "bold"),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 16, face = "bold"))

Acknowledgment

This tutorial was built by Dr. Kelly Findley from UIUC Statistics. I hope this experience was helpful for you!