Introduction
Goals
In this tutorial, we will cover several additional functions you can use to customize and enhance your visualizations with ggplot2. These include:
- Choosing custom colors
- Choosing color palettes (a defined group of colors that go well together!)
- Adding a pre-built plot theme to change the background paneling or axis style
- Using theme to customize specific features, like the placement and font of the title, axis labels, and legends
- Using scale functions to adjust axis tick marks
- Re-ordering categories to showcase in a custom order
A great resource
If you’re interested in looking at more cool things you can do with R beyond what you’re learning here, check out the R Graph Gallery! It’s easy to find with a web search, and it’s also linked here: https://r-graph-gallery.com/index.html
Colors in R
Color as a Representation
Color can do a lot for data visuals. Not only can it brighten up a dreary plot, it can also stand in as one more layer of representation in your plot. In this section, we’ll explore several ways to add and customize color options in R.
What Colors are Available??
For a complete listing of colors, I recommend the “Colors in R” document, which can be found with an easy web search. It’s also linked here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
One thing you might want to notice is that colors have descriptive names, and they also have unique IDs. Either can be used to identify and use a color in a plot.
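You can see both identifiers from inside R with base R's colors(), which returns the vector of built-in color names. The sketch below assumes the IDs in the linked chart are positions in that vector; check the chart to confirm. It also converts a name to a hex code, which is another format ggplot2 accepts anywhere a color name works.

```r
# Base R (grDevices) ships a vector of named colors
all_colors <- colors()
length(all_colors)   # how many built-in color names there are
head(all_colors)     # the first few names

# Look up a name's position (assumed here to be the "ID" in the color chart)
id <- which(all_colors == "plum3")
all_colors[id]       # indexing by that position gives back "plum3"

# Convert a name to a hex code; ggplot2 also accepts hex codes as colors
plum3_hex <- rgb(t(col2rgb("plum3")), maxColorValue = 255)
plum3_hex
```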
Default Colors
The following code creates boxplots to compare the petal lengths of three species of iris plants.
Color is used here simply to fill each boxplot with a unique shade. ggplot2 defaults to a generic color palette (evenly spaced hues), but we could change these!
library(ggplot2)  # also attached by library(tidyverse)
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar")
Manual change to fill/color
If you wish to manually select specific colors to include in your plot, you will likely want one of these two functions:
- scale_fill_manual: applies manually chosen colors to a fill mapping
- scale_color_manual: applies manually chosen colors to a color (border, point) mapping
An Example with fill
Since our mapping is to fill, let’s use scale_fill_manual.
Notice that we only need to write one argument:
- values: a vector of color names, applied to the categories in order
Since we have 3 boxes to fill, we need 3 colors.
I just picked some colors from the link provided earlier, but we could fill in any color names we wish.
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_fill_manual(values = c("khaki2","plum3","springgreen4"))
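Here is one optional refinement. If the values vector has names, scale_fill_manual matches each color to the category with that name instead of relying on position. The colors then stay attached to the right species even if the category order changes later, as it will in the re-ordering section. The vector name species_colors is just an illustrative choice.

```r
library(ggplot2)

# Naming the vector ties each color to a specific category,
# so the result no longer depends on the order of the levels
species_colors <- c(setosa = "khaki2",
                    versicolor = "plum3",
                    virginica = "springgreen4")

p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_fill_manual(values = species_colors)
p
```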
Manual change to border color
We could do the same thing with border colors (or point colors) using scale_color_manual.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_color_manual(values = c("khaki2","plum3","springgreen4"))
Note that for boxplots, this will work as a border color. Maybe you like the border coloring better than the fill coloring!
Color Palettes
Choosing a Color Palette
If you don’t feel like choosing colors manually, you might like to use a pre-made palette.
I recommend the palettes in the RColorBrewer package (which is installed along with the tidyverse). You can find several examples here:
New function names
If using a palette, here are four functions you might need:
- scale_fill_brewer: applies a palette to a fill mapping of a categorical variable
- scale_color_brewer: applies a palette to a color mapping (borders, points) of a categorical variable
- scale_fill_distiller: applies a palette to a fill mapping of a numeric variable
- scale_color_distiller: applies a palette to a color mapping (borders, points) of a numeric variable
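You can also browse the palettes from inside R. The RColorBrewer package provides a table of palette names and types, plus a function that draws them all. This sketch uses RColorBrewer's own helpers (brewer.pal.info, brewer.pal, display.brewer.all). It assumes RColorBrewer is installed, which it normally is alongside the tidyverse.

```r
library(RColorBrewer)

# One row per palette: maximum number of colors, palette type,
# and whether it is colorblind-friendly
head(brewer.pal.info)

# Types: "qual" palettes have no natural order (good for unordered categories);
# "seq" and "div" palettes are ordered, which is what distiller expects
brewer.pal.info[c("YlGn", "Set2", "Reds"), ]

# The hex codes behind a palette, e.g. three colors from "YlGn"
brewer.pal(3, "YlGn")

# Draw every palette in the Plots pane
display.brewer.all()
```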
Example with scale_fill_brewer
In this example, Species is a categorical variable, so I need to use a brewer function. Furthermore, I’m mapping it as a fill color. We need scale_fill_brewer!
I’m going to choose the YlGn palette, but feel free to try another palette from that link.
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_fill_brewer(palette = "YlGn")
Palettes for Borders/Points
Still using Species, but now let’s change the mapping to be a color. For a boxplot, that will change the box border colors.
So let’s update the function to scale_color_brewer now.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_color_brewer(palette = "Set2")
A Point Example for “color”
Let’s try points with a jitter plot.
Notice that with points, we map the variable to color rather than fill, since the default point shape has no fill.
We’re still working with a categorical variable in Species.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2")
Scale Distiller
Use scale_color_distiller or scale_fill_distiller to set a color scheme for a numeric variable, shown as a continuous spectrum of color.
The following plot applies color to a third numeric variable not already represented: Petal.Width. So now, each point will be colored on a scale that represents that plant’s petal width.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Petal.Width)) +
geom_jitter(width = 0.05) +
scale_color_distiller(palette = "Reds")
To no surprise, we notice that irises with larger petal lengths also tend to have larger petal widths.
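The distiller functions reverse the palette order by default (direction = -1). In the plot above, that means smaller petal widths get the darker reds. If you'd rather have larger values look darker, set direction = 1:

```r
library(ggplot2)

# direction = 1 keeps RColorBrewer's light-to-dark order,
# so larger Petal.Width values are drawn in darker reds
p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Petal.Width)) +
  geom_jitter(width = 0.05) +
  scale_color_distiller(palette = "Reds", direction = 1)
p
```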
Pre-made Plot Themes
What is a Plot Theme?
A plot theme controls the look of the non-data parts of a plot. Themes most obviously change the background, along with other small formatting details such as gridlines and axis lines. Changing the theme can also just make your plot look more professional.
Built-in Plot Themes: theme_classic
There is a collection of built-in plot themes for ggplot2 that you can apply with one additional line of code.
For example, the plot theme below, theme_classic(), creates a plain white canvas with axis lines rather than the default grey grid. It’s a nice “gridless” option.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_classic() +
labs(title = "Petal Length of each Iris Species")
theme_minimal
I also like theme_minimal if you want a basic grid but no axis lines.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_minimal() +
labs(title = "Petal Length of each Iris Species")
Where to find more?
For more themes you can apply, check out: https://www.datanovia.com/en/blog/ggplot-themes-gallery/
Two other favorites of mine are theme_bw() and theme_dark().
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_bw() +
labs(title = "Petal Length of each Iris Species")
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_dark() +
labs(title = "Petal Length of each Iris Species")
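Each pre-built theme also takes a base_size argument (11 by default). It sets the base font size from which the theme's other text sizes are scaled. Here is a sketch that enlarges all the text at once:

```r
library(ggplot2)

# base_size = 14 raises the base font size, so titles, axis text,
# and legend text all grow together
p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  scale_color_brewer(palette = "Set2") +
  theme_bw(base_size = 14) +
  labs(title = "Petal Length of each Iris Species")
p
```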
Customizing Labels
Using the theme function
The theme function allows for a lot of specific customizations. For this class, we’ll focus on changes to:
- plot titles
- axis titles
- axis labels
- legend titles
Changing labels
If you want to customize the title, use the plot.title argument, setting it equal to element_text().
By default, the title will be left-aligned, with a particular font and look. Let’s customize three things:
- size adjusts the font size
- hjust adjusts the horizontal alignment: 0 is left-aligned, 1 is right-aligned, and 0.5 is centered
- face sets the font style, such as "bold" or "italic"
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(plot.title = element_text(size = 14, hjust = 0.5, face = "bold"))
Axis titles
Likewise, we can make similar changes with the axis titles too!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(plot.title = element_text(size = 14, hjust = 0.5, face = "bold"),
axis.title = element_text(size = 12, face = "bold"))
Axis Labels
…and we can change the axis text (the tick labels) with axis.text. Personally, I like to color these black, since the default grey makes them hard to read, and to make the font a bit bigger.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(plot.title = element_text(size = 14, hjust = 0.5, face = "bold"),
axis.title = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 11, color = "black"))
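The axis.title and axis.text settings apply to both axes at once. To target just one axis, theme also has axis.title.x, axis.title.y, axis.text.x, and axis.text.y. Each of these inherits anything you don't override from the general setting. Here is a sketch that tilts only the x-axis labels (the 45-degree angle is an arbitrary choice):

```r
library(ggplot2)

# axis.text styles both axes; axis.text.x then overrides only the x axis
# (it still inherits color = "black" from axis.text)
p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length") +
  theme(axis.text = element_text(size = 11, color = "black"),
        axis.text.x = element_text(angle = 45, hjust = 1, size = 12))
p
```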
Legend title
If you wish to change the legend title, you can actually do that in labs! Note, though, that you identify the legend by the aesthetic it is mapped to.
- If the legend is based on color, change the title with color =
- If the legend is based on fill color, change the title with fill =
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length",
color = "Look, I changed it!")
Hide the legend
If you add color to differentiate groups that are already differentiated by the x-axis, it might be sensible to just hide the legend. You can do that with an argument in theme: legend.position = "none" will hide the legend.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(legend.position = "none")
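You can get the same result another way: switch off the legend for the layer itself with show.legend = FALSE inside the geom. When only one layer uses color, no legend is drawn.

```r
library(ggplot2)

# show.legend = FALSE leaves this layer out of the legend;
# since it is the only layer mapping color, no legend appears
p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05, show.legend = FALSE) +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length")
p
```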
Moving the legend
Or if you just want to place it somewhere else, you can enter “left”, “right”, “top”, or “bottom” to relocate it!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(legend.position = "bottom")
See documentation for more
For more on what you can customize with theme, check out the documentation by searching “ggplot2 theme” in a web browser, or go to this link: https://ggplot2.tidyverse.org/reference/theme.html
Try one!
Let’s use the msleep dataset, which comes with ggplot2.
Let’s create a scatterplot that compares sleep_rem (x axis) to sleep_total (y axis) for the mammals in this dataset. We will also color the data by vore classification.
Your challenge is to…
- In the geom_point() line, add size = 2 as an argument to make the points bigger and easier to see!
- Apply the “Set1” color palette from RColorBrewer
- Add the title “Mammal Total Sleep vs. REM Sleep by Diet”
- Apply the theme_bw() pre-built plot theme
- Use the theme function to make the title centered and bold
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
...
...
...
...
...
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(_________) +
scale_color_____(___________) +
labs(title = _________) +
theme_bw() +
theme(__________)
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
scale_color_brewer(palette = ______) +
labs(title = "Mammal Total Sleep vs. REM Sleep by Diet") +
theme_bw() +
theme(plot.title = element_text(_____________))
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
scale_color_brewer(palette = "Set1") +
labs(title = "Mammal Total Sleep vs. REM Sleep by Diet") +
theme_bw() +
theme(plot.title = element_text(face = "bold", hjust = 0.5))
One More thing on themes…
When using pre-made themes, like theme_bw() or theme_classic(), put them in before making customizations with the theme function.
Why? Because a pre-built theme replaces the entire theme, so if it comes after theme(), it will overwrite any customizations you made there. Your tinkering with the customizations should always come after!
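To see the ordering rule in action, compare these two versions of the same plot. Only the order of theme_bw() and theme() differs. The names base, good, and bad are just illustrative.

```r
library(ggplot2)

base <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  labs(title = "Petal Length of each Iris Species")

# Works: complete theme first, then the custom tweak on top
good <- base +
  theme_bw() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

# Tweak is lost: theme_bw() comes last and replaces the whole theme,
# so the title goes back to plain, left-aligned text
bad <- base +
  theme(plot.title = element_text(face = "bold", hjust = 0.5)) +
  theme_bw()

good
bad
```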
Scale Functions
Tick Mark Frequency
Scale functions let you set how often tick marks appear on an axis, as well as the numeric limits of that axis.
- Use scale_x_continuous() to adjust scaling on the x axis
- Use scale_y_continuous() to adjust scaling on the y axis
Breaks
If we want to change the frequency of the tick marks, we use the breaks argument. Typically, we’ll set breaks equal to a sequence.
- As a reminder, seq() takes three arguments: where we start, where we end, and the step size between values.
In the plot we were creating, the y axis defaulted to increments of 2, but let’s try increments of 0.5.
- To cover the full range of the data with room to spare, we’ll use breaks = seq(0,8,0.5)
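Here is what that sequence looks like on its own, as a quick check before handing it to breaks:

```r
# seq(from, to, by): start at 0, stop at 8, step by 0.5
y_breaks <- seq(0, 8, 0.5)
y_breaks
length(y_breaks)   # 17 values: 0, 0.5, 1, ..., 7.5, 8
```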
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(breaks = seq(0,8,0.5))
But the axis didn’t start at 0. Why not?
Note that even though we specified breaks from 0 to 8, the axis only extends as far as our data fall (about 1 to 7), so only the breaks in that range are shown.
If you want to adjust the limits of the axis, use the limits argument!
Limits
limits: takes a vector of 2 values, a minimum and a maximum, that sets the full range of that axis.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(limits = c(0,8))
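One caution: limits does more than relabel the axis. Any data outside the limits are converted to NA and dropped from the plot, with a warning. If you only want to zoom in without removing data, coord_cartesian(ylim = ...) is a different tool: it changes the visible range while keeping every point. The cutoff of 5 below is an arbitrary value chosen to fall inside the data.

```r
library(ggplot2)

# Limits narrower than the data: values above 5 fall outside the scale,
# become NA, and are dropped (ggplot2 warns when the plot is drawn)
cut_off <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  scale_y_continuous(limits = c(0, 5))

# coord_cartesian() zooms the view instead; every data point is kept
zoomed <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  coord_cartesian(ylim = c(0, 5))

cut_off
zoomed
```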
Limits and Scaling
If you want to do both at the same time, add both arguments separated by a comma!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(breaks = seq(0,8,0.5), limits = c(0,8))
Scaling done wrong
Here, I’ll set the limits from 0 to 12, but the breaks only from 0 to 8. Notice how the tick marks simply stop at 8, leaving the top of the axis unlabeled.
If you use both, make sure you’re careful to choose a scaling that covers the full range (unless you’re using this effect on purpose)!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(breaks = seq(0,8,1), limits = c(0,12))
Scaling x
You can do the same things with the x axis (as long as it is numeric). Let’s flip this plot around (and change it to boxplots, because why not) and adjust the breaks the same way.
ggplot(data = iris, aes(y = Species, x = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_x_continuous(breaks = seq(0,8,1))
Change the axis!
Let’s revisit the msleep data. Add scale functions such that:
- the x axis has tick marks by intervals of 0.5
- the y axis has tick marks by intervals of 1
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1")
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1") +
scale_x_continuous(breaks = _________) +
scale_y_continuous(__________________)
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1") +
scale_x_continuous(breaks = seq(______)) +
scale_y_continuous(breaks = seq(______))
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1") +
scale_x_continuous(breaks = seq(0,7,0.5)) +
scale_y_continuous(breaks = seq(0,21,1))
Re-ordering Categories
Alphabetical is default
By default, R will order the categories of a non-numeric variable alphabetically.
For example, the Species variable is listed in alphabetical order: setosa, versicolor, virginica.
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
theme_bw() +
scale_fill_brewer(palette = "BuPu") +
theme(axis.title = element_text(size = 14),
axis.text = element_text(color = "black", size = 12))
Re-ordering with factor
If you want your variable to have a custom ordering, you can use the factor function to redefine its structure. This function takes two arguments:
- a vector (if necessary, accessed through the data frame it is embedded in) that you wish to restructure
- levels, which will be your custom ordering of values
factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))
How to apply it
To apply it, we just need to assign this factor ordering back to the original variable in the data frame. Notice below how this factor structuring is set equal to the variable Species through the data frame iris.
Notice that this code doesn’t output anything. It is an internal restructuring.
iris$Species = factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))
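Since the assignment prints nothing, a quick way to confirm that it worked is to ask for the levels directly. Here is a small sketch:

```r
# Reorder the levels, then confirm the new order before plotting
iris$Species = factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))
levels(iris$Species)   # the levels in their new order
table(iris$Species)    # counts per level, listed in the new order
```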
Now let’s try the plot
If we now try the plot, you’ll see how the levels have been restructured!
iris$Species = factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
theme_bw() +
scale_fill_brewer(palette = "BuPu") +
theme(axis.title = element_text(size = 14),
axis.text = element_text(color = "black", size = 12))
Another example
Suppose I have 20 students in different grade levels…
Class
…and I want to make a plot to compare them based on their class level.
ggplot(data = Class, aes(x = acad_level, y = height, color = acad_level)) +
geom_jitter(width = 0.08, size = 2) +
scale_y_continuous(limits = c(58, 76),
breaks = seq(58,76,2)) +
theme_minimal() +
labs(title = "Class Heights by Year",
x = "Academic Level",
y = "Height (in)",
color = "Academic Level") +
theme(plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(color = "black", size = 11),
axis.title = element_text(size = 12))
Re-order
But notice that alphabetically, my order of academic level is Freshman, Junior, Senior, Sophomore.
I’d like to get the order to be chronological: Freshman, Sophomore, Junior, Senior!
Think of my coding template as follows…what would I fill in at each blank?
_______$_______ = factor(______$______, levels = c(_______, _______, _______, _______))
Finished Example
Class$acad_level = factor(Class$acad_level, levels = c("Freshman", "Sophomore", "Junior", "Senior"))
ggplot(data = Class, aes(x = acad_level, y = height, color = acad_level)) +
geom_jitter(width = 0.08, size = 2) +
scale_y_continuous(limits = c(58, 76),
breaks = seq(58,76,2)) +
theme_minimal() +
labs(title = "Class Heights by Year",
x = "Academic Level",
y = "Height (in)",
color = "Academic Level") +
theme(plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(color = "black", size = 11),
axis.title = element_text(size = 12))
Be cArEfUl wItH cAsE!
Be super careful with this code: every category name you list must be exactly as it appears in the data, including capitalization (it’s CaSe SenSiTiVe). Any value that doesn’t match one of your levels becomes NA. And make sure your data frame name matches what you have in your global environment!
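Here is a small sketch of what goes wrong with a case mismatch. It uses a hypothetical vector of academic levels, not the Class data above. Any value that doesn't exactly match a level is silently turned into NA.

```r
# Hypothetical academic levels (not the Class data from above)
acad_level <- c("Freshman", "Sophomore", "Junior", "Senior", "Junior")

# Correct spelling and case: every value matches a level
good <- factor(acad_level, levels = c("Freshman", "Sophomore", "Junior", "Senior"))
sum(is.na(good))   # 0

# "junior" (lowercase j) matches nothing, so both Juniors become NA
bad <- factor(acad_level, levels = c("Freshman", "Sophomore", "junior", "Senior"))
sum(is.na(bad))    # 2
```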
Return Home
This tutorial was created by Kelly Findley. I hope this experience was helpful for you!
If you’d like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/