Customizing with ggplot2

Colors in R

Color as a Representation

Color can do a lot for data visuals. Not only can it brighten up a dreary plot, it can also stand in as one more layer of representation in your plot. In this section, we'll explore several ways to add and customize color options in R.

What Colors Are Available?

First, keep in mind that R offers a huge range of colors: 657 built-in color names, plus any color you can write as a hex code.

For a complete listing of the named colors, I recommend the "Colors in R" document, which can be found with an easy web search. It is also linked here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

If you're looking for a color palette, this can also be found with a web search. Here is a cheat sheet that lists individual colors on one page and color palettes on the last page: https://www.nceas.ucsb.edu/sites/default/files/2020-04/colorPaletteCheatsheet.pdf

One thing you might notice is that each color has a descriptive name and also a numeric ID (its position in the list returned by colors()). Either can be used to pick a color for a plot: use the name directly, or look the name up by its ID.
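If you'd rather explore the colors from the console, here is a small sketch using only base R functions. The choice of "plum" as the search term is just for illustration.

```r
# All built-in color names (base R, no packages needed)
all_colors <- colors()
length(all_colors)        # how many named colors are available
head(all_colors)          # peek at the first few names

# Find every named shade that contains "plum"
plum_shades <- grep("plum", all_colors, value = TRUE)
plum_shades

# Convert a color name to its hex code, which plots also accept
plum3_hex <- rgb(t(col2rgb("plum3")), maxColorValue = 255)
plum3_hex
```

Any of the names printed here can be dropped into the plotting code later in this section.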

Default Colors

Let's start with a plot. The following code creates boxplots to compare the petal lengths of three species of iris plants.

Color is used here simply to fill each boxplot with a unique shade. ggplot2 defaults to a generic color palette, but we can change it manually!

ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar")

Manual change to fill color

By adding one more function, scale_fill_manual, I can ask R to use a specific set of fill colors. The colors are assigned to the species in the order the species appear on the plot.

I just picked some colors from the link provided earlier, but we could fill in any color names we wish!

ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_fill_manual(values = c("khaki2","plum3","springgreen4"))
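If you want to be sure which color lands on which species, one option is to pass a named vector to values. This is a sketch of that idea; the object names species_colors and p_named are just placeholders I made up.

```r
library(ggplot2)

# Named vector: each species is tied to a specific color,
# no matter what order the colors are listed in
species_colors <- c(virginica  = "springgreen4",
                    setosa     = "khaki2",
                    versicolor = "plum3")

p_named <- ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  scale_fill_manual(values = species_colors)

p_named
```

Even though virginica is listed first in the vector, it still gets springgreen4 because the names do the matching.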

Manual change to border color

We could do the same thing with border colors (or point colors) using scale_color_manual.

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_color_manual(values = c("khaki2","plum3","springgreen4"))

Note that for boxplots, this sets the border color. I'm not sure how well it works by itself, but I like outline-colored boxplots when combined with other geometries!
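As one possible illustration of combining outline-colored boxplots with another geometry, the sketch below overlays jittered points on the boxes. The jitter width, transparency, and the choice to hide the boxplot's own outlier dots are my assumptions, not requirements.

```r
library(ggplot2)

p_combo <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_boxplot(outlier.shape = NA) +        # hide outlier dots so they aren't drawn twice
  geom_jitter(width = 0.1, alpha = 0.6) +   # overlay the individual observations
  scale_color_manual(values = c("khaki2", "plum3", "springgreen4"))

p_combo
```

Because color is set in aes(), the box outlines and the points share the same species colors.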

Color Palettes

Choosing a Color Palette

In this case, 3 colors is not a lot to list, but it might be easier to use a color palette. RColorBrewer is a popular source of palettes, and the link listed earlier shows the popular palettes from this package.

Note that you don't need to load RColorBrewer yourself: ggplot2's scale_fill_brewer and scale_color_brewer functions give you access to its palettes as soon as ggplot2 is loaded.

Just swap the function to scale_fill_brewer and set a palette rather than values. I'm going to choose the "YlGn" palette, just because it feels very plant-appropriate to me, but you should look at the link provided earlier and try one that you like!

ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_fill_brewer(palette = "YlGn")
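If you want to browse the Brewer palettes without leaving R, the sketch below calls the RColorBrewer package directly. It assumes RColorBrewer is installed, which is typically the case once ggplot2 is installed.

```r
library(RColorBrewer)

# Hex codes for a 3-color version of the "YlGn" palette
ylgn_3 <- brewer.pal(n = 3, name = "YlGn")
ylgn_3

# Table of every palette: maximum number of colors, type, and a colorblind flag
head(brewer.pal.info)

# Draw swatches of all palettes in the plotting window
display.brewer.all()
```

The row names of brewer.pal.info are the palette names you can pass to the palette argument.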

Palettes for Borders/Points

And again...if we wish to change border or point colors, we would apply similar commands, but with "color" replacing "fill" in each case.

We can see quickly the difference when we apply this to the iris boxplot.

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_color_brewer(palette = "Set2")

A Point Example for "color"

Let's try points with a strip chart.

Notice we now use the geom_jitter geometry. I'll stick with this color palette, but feel free to try a different one from the link listed earlier!

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2")

Color blind palette

Something to be aware of is the availability of colorblind-friendly palettes. Since some colorblind viewers may struggle to differentiate certain colors, these palettes are designed so that their colors stay as distinguishable as possible for those viewers.

One in particular can be accessed through the package ggthemes.

library(ggthemes)

ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_fill_colorblind()
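If you'd rather stay with the Brewer palettes, a possible alternative is to filter for the ones RColorBrewer itself flags as colorblind friendly. This is a sketch that assumes RColorBrewer is installed; the object names are placeholders.

```r
library(ggplot2)
library(RColorBrewer)

# Names of all Brewer palettes flagged as colorblind friendly
cb_palettes <- rownames(brewer.pal.info)[brewer.pal.info$colorblind]
cb_palettes

# Keep only the qualitative ones, which suit categories like Species
qual_cb <- rownames(brewer.pal.info)[brewer.pal.info$colorblind &
                                       brewer.pal.info$category == "qual"]
qual_cb

# Use the first of those palettes for the fill
p_cb <- ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  scale_fill_brewer(palette = qual_cb[1])

p_cb
```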

Scale Distiller

Lastly, colors and color palettes can also be enlisted when working with numeric variables. They just take on a different feeling since they will now be a spectrum rather than discrete color choices.

Use scale_color_distiller or scale_fill_distiller to set a color scheme in these cases. These are further options linked to RColorBrewer.

The following plot links the color to the Petal Width (a variable not already represented), though we could also link it to Petal Length (currently on the y axis) if we just want that represented in a second way as well.

Experiment with different variables or different palettes!

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Petal.Width)) +
  geom_jitter(width = 0.1) +
  scale_color_distiller(palette = "Reds")

To no surprise, we notice that irises with larger petal lengths also tend to have larger petal widths.
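One detail worth knowing: the distiller scales reverse the palette by default (direction = -1), so with "Reds" the larger values get the lighter shades. The sketch below shows how the direction argument flips that. The object names are placeholders.

```r
library(ggplot2)

base_plot <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Petal.Width)) +
  geom_jitter(width = 0.1)

# Default behavior: direction = -1
p_default <- base_plot + scale_color_distiller(palette = "Reds")

# direction = 1 flips the gradient so wider petals get the darker reds
p_flipped <- base_plot + scale_color_distiller(palette = "Reds", direction = 1)

p_flipped
```

Which direction reads better is a judgment call; many viewers expect darker to mean "more".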

Distiller with Fill color

I find ridge plots to be really cool examples to try distiller. You can use it to represent the x axis variable and allow for a more visually appealing look.

One weird thing about doing this with ridge plots: instead of writing the variable name in fill, you write after_stat(x), which refers to the x values computed for the density curves. (Older code uses stat(x) for the same purpose, but recent versions of ggplot2 flag that as deprecated.) It's a strange technical setup for ridge plots, so it's just something you would need to remember (or look up!) when you need it. :)

Here is a ridge plot like ones we've seen before, this time with the iris data and in color.

library(ggridges)

ggplot(data = iris, aes(x = Sepal.Length, y = Species, fill = after_stat(x))) +
  geom_density_ridges_gradient() +
  scale_fill_distiller(palette = "Spectral", name = "Sepal Length")

Pre-made Plot Themes

What is a Plot Theme?

Changing the style of your plot can also just make it look more professional. Plot themes most obviously affect the background, in addition to other small formatting differences.

Built-in Plot Themes

There is a collection of built-in plot themes for ggplot2 that you can apply with one additional line of code.

For example, the plot theme below, theme_classic(), creates a blank canvas rather than the grey default grid. This can be nice if you want to make your representation stand out more, and there is no need for grid lines.

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_classic()

Where to find more?

For more themes you can apply, check out: https://www.datanovia.com/en/blog/ggplot-themes-gallery/

Two other favorites of mine are theme_bw() and theme_dark(). Here is theme_bw():

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw()
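A convenient way to compare themes is to save the plot to an object and then add a different theme each time. This is just a sketch; the object names are placeholders, and theme_minimal() is one more built-in theme I picked for comparison.

```r
library(ggplot2)

# Save the plot without a theme so it can be reused
base_plot <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2")

p_dark    <- base_plot + theme_dark()      # dark grey panel background
p_minimal <- base_plot + theme_minimal()   # no panel border or background fill

p_dark
p_minimal
```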

Custom Theme options

Using the theme function

In this class, we will do very little additional customization, but there are lots of possibilities with the theme function. For example:

  • Changing the position of the legend (or removing the legend)
  • Changing font size and color
  • Customizing the background color and gridlines

For a complete list, check out the documentation! https://ggplot2.tidyverse.org/reference/theme.html

Let's look at a couple examples...

Changing labels

If you want to customize anything about the title, use the plot.title argument.

By default, the title will be left-aligned, with a particular font and look. Here is an example of a few changes.

Feel free to adjust some of those values to see what changes!

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length") +
  theme(plot.title = element_text(size = 12, hjust = 0.5, face = "bold"))

You can make changes with the axis labels using the axis.title.x and axis.title.y arguments.
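For instance, a sketch of styling the two axis titles separately might look like this. The specific sizes, font face, and color are arbitrary choices for illustration.

```r
library(ggplot2)

p_axes <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length") +
  theme(axis.title.x = element_text(size = 11, face = "italic"),   # x axis label
        axis.title.y = element_text(size = 11, color = "grey30"))  # y axis label

p_axes
```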

Legend changes

Changing the legend title may not work the way you'd expect. The legend is named after the aesthetic it represents, so you set its title using the same argument name you used in the aes line. For example, in this last plot the legend is color-based, so I would add a color = argument in labs().

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length",
       color = "Look, I changed it!")

Also, if you wanted to hide the legend, that's an easy option in theme: set legend.position = "none". Or if you just wanted to move it somewhere else, you could instead input "left", "right", "top", or "bottom". Feel free to try one of those here!

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length") +
  theme(legend.position = "none")
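For comparison, here is a sketch of the same plot with the legend moved below the panel instead of hidden. Only the legend.position value changes.

```r
library(ggplot2)

p_bottom <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length") +
  theme(legend.position = "bottom")   # legend sits under the plot

p_bottom
```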

Time to Practice!

Let's use the mammalsleep dataset from the faraway package.

Let's create a scatterplot that compares the body weight (x axis) to the brain weight of mammals in this dataset. We will also color the data by predation status (a discrete variable with 5 categories).

Your challenge is to...

  • Use a color palette, or manual colors
  • Provide a title and appropriate axes labels
  • Use a plot theme (I suggest a pre-made option)
  • Adjust the title to be centered

There is not an objectively correct result, but the hints and solution reveal a possible outcome for reference!

Note: keep the first lines of the code below. The as.factor step lets us treat predation as 5 distinct categories, rather than having R assume it is a numeric (continuous) scale.
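To see what as.factor does, here is a tiny sketch using a made-up vector of scores (hypothetical values, not taken from the mammalsleep data).

```r
# Hypothetical scores standing in for a numerically coded column
scores <- c(1, 3, 5, 3, 1)

is.numeric(scores)          # TRUE: R treats these as quantities on a continuous scale

score_groups <- as.factor(scores)
is.factor(score_groups)     # TRUE: now they are category labels
levels(score_groups)        # the distinct categories
```

When a factor is mapped to color, ggplot2 gives each level its own discrete color instead of a gradient.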

library(faraway)
mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = body, y = brain, color = predation)) +
  ...

Hint 1:

mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = body, y = brain, color = predation)) +
  geom_point() +
  labs(title = _________, x = ________, y = ____________) +
  ...

Hint 2:

mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = body, y = brain, color = predation)) +
  geom_point() +
  labs(title = "Body weight vs. Brain weight of Mammals", x = "Body Weight", y = "Brain Weight") +
  theme_bw() +
  theme(plot.title = element_text(___________)) +
  scale_color_brewer(____________)

Solution:

mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = body, y = brain, color = predation)) +
  geom_point() +
  labs(title = "Body weight vs. Brain weight of Mammals", x = "Body Weight", y = "Brain Weight") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_brewer(palette = "BuPu")

One More thing on themes...

When you use a pre-made theme like theme_classic(), add it before any customizations you make with the theme function. Otherwise, the pre-made theme will overwrite some of the selections you made in theme().
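Here is a small sketch of that ordering issue, using a centered title as the customization. The object names p_good and p_reset are placeholders.

```r
library(ggplot2)

base_plot <- ggplot(data = iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot() +
  labs(title = "Iris Species by Petal Length")

# Pre-made theme first, then the tweak: the title stays centered
p_good <- base_plot +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

# Tweak first, then the pre-made theme: theme_bw() replaces the title setting
p_reset <- base_plot +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme_bw()

p_good
p_reset
```

Printing both plots side by side should make the difference in title alignment easy to spot.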

Scale Functions

Tick Mark Frequency

Scale functions can let you update the frequency of tick marks on your plots. This can be helpful when the default frequency is not frequent enough (or possibly too frequent) to be helpful.

Let's start with the y axis of the plot we've already seen. The original plot only labels in increments of 2, so let's try increments of 1. Use the scale_y_continuous() function to make adjustments. The specific argument we'll use is breaks, which lets us set a vector of values to display.

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  scale_y_continuous(breaks = seq(0,8,1))
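If seq() is new to you, this sketch shows the vectors it builds and how a smaller step gives denser tick marks. The 0.5 step is just another value to try.

```r
library(ggplot2)

# seq(from, to, by) builds an evenly spaced vector
whole_steps <- seq(0, 8, 1)          # 0, 1, 2, ..., 8
half_steps  <- seq(0, 8, by = 0.5)   # 0, 0.5, 1, ..., 8
whole_steps
half_steps

# Same plot as above, now with a tick mark every 0.5
p_half <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  scale_y_continuous(breaks = half_steps)

p_half
```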

Limits

Note that even though we specified breaks from 0 to 8, the axis only shows the breaks that fall within the range of the data. If you want to actually expand (or contract) the range that displays, you can use the limits argument. It simply takes a vector of 2 values: the minimum and the maximum of the range.

ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  scale_y_continuous(breaks = seq(0,8,1),
                     limits = c(0,8))

Not sure this makes anything better here...but just want to demonstrate it's something you can do if helpful!
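One caution if you ever contract the range: with limits, observations outside the range are dropped before anything is drawn or computed. A commonly used alternative is coord_cartesian(), which only zooms the view. The sketch below contrasts the two on the boxplots; coord_cartesian() and the 3 to 8 range are my additions for illustration.

```r
library(ggplot2)

base_box <- ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot()

# Scale limits: petal lengths outside 3 to 8 are removed (with a warning),
# so setosa, whose petals are all under 2, loses its box entirely
p_scale <- base_box + scale_y_continuous(limits = c(3, 8))

# coord_cartesian: all observations are kept and the view is simply zoomed
p_zoom <- base_box + coord_cartesian(ylim = c(3, 8))

p_scale
p_zoom
```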

Scaling x

You can do the same things with the x axis (should it be numeric). Let's flip this plot around (and change it to boxplots, because why not). Let's make the same changes.

ggplot(data = iris, aes(y = Species, x = Petal.Length, fill = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_fill_brewer(palette = "Set2") +
  theme_bw() +
  scale_x_continuous(breaks = seq(0,8,1))

When it isn't continuous...

Most of the time, the axis you adjust this way will hold a numeric (continuous) variable. If the axis instead holds a factor variable with distinct categories, you'll need scale_y_discrete or scale_x_discrete. They work much like the continuous functions, except that arguments such as limits and labels refer to category names rather than numbers.
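As a sketch of the discrete version, the plot below reorders the species on the x axis with limits and capitalizes their names with labels. The particular order and label text are arbitrary choices.

```r
library(ggplot2)

p_discrete <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, alpha = 0.8) +
  scale_color_brewer(palette = "Set2") +
  theme_bw() +
  scale_x_discrete(
    limits = c("virginica", "versicolor", "setosa"),   # left-to-right order
    labels = c(setosa = "Setosa",                      # text shown at each tick
               versicolor = "Versicolor",
               virginica = "Virginica")
  )

p_discrete
```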

Change the axis!

Let's revisit the mammals data, this time plotting total sleep (x axis) against lifespan (y axis). Change the x axis to have tick marks by 2's, and the y axis to have tick marks by 10's.

library(faraway)
mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = sleep, y = lifespan, color = predation)) +
  geom_point() +
  labs(title = "Sleep vs. Lifespan of Mammals", x = "Total Sleep (hours per day)", y = "Lifespan (years)") +
  theme_bw() +
  scale_color_brewer(palette = "Set1")

Hint 1:

mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = sleep, y = lifespan, color = predation)) +
  geom_point() +
  labs(title = "Sleep vs. Lifespan of Mammals", x = "Total Sleep (hours per day)", y = "Lifespan (years)") +
  theme_bw() +
  scale_color_brewer(palette = "Set1") +
  scale_x_continuous(breaks = _________) +
  scale_y_continuous(__________________)

Hint 2:

mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = sleep, y = lifespan, color = predation)) +
  geom_point() +
  labs(title = "Sleep vs. Lifespan of Mammals", x = "Total Sleep (hours per day)", y = "Lifespan (years)") +
  theme_bw() +
  scale_color_brewer(palette = "Set1") +
  scale_x_continuous(breaks = seq(______)) +
  scale_y_continuous(breaks = seq(______))

Solution:

mammalsleep$predation = as.factor(mammalsleep$predation)

ggplot(data = mammalsleep, aes(x = sleep, y = lifespan, color = predation)) +
  geom_point() +
  labs(title = "Sleep vs. Lifespan of Mammals", x = "Total Sleep (hours per day)", y = "Lifespan (years)") +
  theme_bw() +
  scale_color_brewer(palette = "Set1") +
  scale_x_continuous(breaks = seq(0,20,2)) +
  scale_y_continuous(breaks = seq(0,100,10))

Acknowledgment

This tutorial was created by Dr. Kelly Findley of the UIUC Statistics department. I hope this experience was helpful for you!