In this tutorial, we will cover several additional functions you can use to customize and enhance your visualizations with ggplot. This includes:
theme to customize specific features, like the placement and font of the title, axis labels, and legends
If you're interested in looking at more cool things you can do with R beyond what you're learning here, check out the R Graph Gallery! It's easy to find with a web search, but it's also linked here: https://r-graph-gallery.com/index.html
Color can do a lot for data visuals. Not only can it brighten up a dreary plot, it can also stand in as one more layer of representation in your plot. In this section, we'll explore several ways to add and customize color options in R.
For a complete listing of colors, I recommend the "Colors in R" document, which can be found with an easy web search. It's also linked here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
One thing you might want to notice is that colors have descriptive names, and they also have unique IDs. Either can be used to identify and use a color in a plot.
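As a quick check that a color name and a color code describe the same shade, the sketch below uses base R's colors(), col2rgb(), and rgb() to turn a name into a hexadecimal code. Hex codes are one common kind of color ID; whether they match the IDs printed in the PDF is an assumption worth checking against the document itself. Either string can be passed to the ggplot scales used later in this section.

```r
# colors() returns every color name R knows about
"plum3" %in% colors()   # TRUE: "plum3" is a built-in color name

# Convert the name to a hex code: col2rgb() gives red, green, blue values (0-255)
plum_rgb <- col2rgb("plum3")
plum_hex <- rgb(plum_rgb[1], plum_rgb[2], plum_rgb[3], maxColorValue = 255)
plum_hex

# The hex code and the name describe exactly the same color
identical(as.vector(col2rgb(plum_hex)), as.vector(col2rgb("plum3")))
```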
The following code creates boxplots to compare the petal lengths of three species of iris plants.
Color is used here simply to fill each boxplot with a unique shade. R defaults to a generic color palette, but we could change these!
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar")
If you wish to manually select specific colors to include in your plot, you will likely want one of these two functions:
scale_fill_manual: apply manually chosen colors to a fill mapping
scale_color_manual: apply manually chosen colors to a color (border, point) mapping
Since our mapping is to fill, let's use scale_fill_manual.
Notice that we only need to write one argument:
values: a vector of color names to apply as needed.
Since we have 3 boxes to fill, we need 3 colors.
I just picked some colors from the link provided earlier, but we could fill in any color names we wish.
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_fill_manual(values = c("khaki2","plum3","springgreen4"))
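With an unnamed vector, colors are matched to the boxes in order. ggplot2 also accepts a named vector for values, which ties each color to a specific level of Species no matter what order you type them in. The sketch below uses the same three colors as above; layer_data() is used only to confirm which fill each box received.

```r
library(ggplot2)

# Names in the values vector must match the levels of Species exactly
p <- ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  stat_boxplot(geom = "errorbar") +
  scale_fill_manual(values = c(virginica  = "springgreen4",
                               setosa     = "khaki2",
                               versicolor = "plum3"))
p

# One row per box (setosa, versicolor, virginica) with the fill it was given
layer_data(p, 1)[, c("x", "fill")]
```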
We could do the same thing with border colors (or point colors) using scale_color_manual.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_color_manual(values = c("khaki2","plum3","springgreen4"))
Note that for boxplots, this will work as a border color. Maybe you like the border coloring better than the fill coloring!
If you don't feel like choosing colors manually, you might like to use a pre-made palette.
I recommend the palettes loaded in the RColorBrewer package (which is already installed along with tidyverse). You can find several examples here:
If using a palette, here are four functions you might need:
scale_fill_brewer: apply a palette to a fill mapping of a categorical variable
scale_color_brewer: apply a palette to a color mapping (borders, points) of a categorical variable
scale_fill_distiller: apply a palette to a fill mapping of a numeric variable
scale_color_distiller: apply a palette to a color mapping (borders, points) of a numeric variable
In this example, Species is a categorical variable, so I need to use brewer. Furthermore, I'm mapping it as a fill color. We need scale_fill_brewer!
I'm going to choose the YlGn palette, but feel free to try another palette from that link.
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_fill_brewer(palette = "YlGn")
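If you'd rather browse palette names without leaving R, the RColorBrewer package itself has a few helpers. The sketch below assumes RColorBrewer is installed (it comes along as a dependency of ggplot2). brewer.pal.info lists each palette with its type, and brewer.pal() returns the actual hex colors a palette provides.

```r
library(RColorBrewer)

# One row per palette: maximum number of colors, category, colorblind-friendly
# category is "seq" (sequential), "div" (diverging), or "qual" (qualitative)
head(brewer.pal.info)

# Qualitative palettes tend to suit unordered categories like Species
rownames(brewer.pal.info)[brewer.pal.info$category == "qual"]

# The three YlGn colors available for our three species
brewer.pal(3, "YlGn")

# Draw every palette in the plotting window
display.brewer.all()
```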
Still using Species, but now let's change the mapping to be a color. For a boxplot, that will change the box border colors.
So let's update the function to scale_color_brewer now.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_color_brewer(palette = "Set2")
Let's try points with a jitter plot.
Notice that points are colored with the color aesthetic rather than fill.
We're still working with a categorical variable in Species.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2")
Use scale_color_distiller or scale_fill_distiller to set a color scheme for numeric variables as a spectrum effect.
The following plot applies color to a third numeric variable not already represented: Petal.Width. So now, each point will be colored on a scale to represent that plant's petal width reading.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Petal.Width)) +
geom_jitter(width = 0.05) +
scale_color_distiller(palette = "Reds")
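To see why the brewer/distiller distinction matters, the sketch below tries both on the numeric Petal.Width mapping. The brewer scale should fail when the plot is built (the exact error wording depends on your ggplot2 version), while the distiller scale works. It also uses distiller's direction argument: the default is -1, so direction = 1 flips the order of the colors along the Petal.Width range.

```r
library(ggplot2)

petal_plot <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Petal.Width)) +
  geom_jitter(width = 0.05)

# A brewer scale expects categories, so a numeric mapping errors when the plot is built
brewer_try <- tryCatch(
  { ggplot_build(petal_plot + scale_color_brewer(palette = "Reds")); "built" },
  error = function(e) conditionMessage(e)
)
brewer_try   # an error message about continuous values on a discrete scale

# A distiller scale spreads the palette over the numeric range instead
p_default <- petal_plot + scale_color_distiller(palette = "Reds")
p_flipped <- petal_plot + scale_color_distiller(palette = "Reds", direction = 1)
p_flipped
```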
Unsurprisingly, irises with larger petal lengths also tend to have larger petal widths.
Changing the style of your plot can also make it look more professional. Plot themes most obviously affect the background, along with other small formatting details.
There is a collection of built-in plot themes for ggplot2 that you can apply with an additional line of code.
For example, the plot theme below, theme_classic(), creates a blank canvas rather than the default grey grid. It's a nice "gridless" option.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_classic() +
labs(title = "Petal Length of each Iris Species")
I also like theme_minimal() if you want a basic grid, but no axis lines.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_minimal() +
labs(title = "Petal Length of each Iris Species")
For more themes you can apply, check out: https://www.datanovia.com/en/blog/ggplot-themes-gallery/
Two other favorites of mine are theme_bw() and theme_dark():
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_bw() +
labs(title = "Petal Length of each Iris Species")
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_color_brewer(palette = "Set2") +
theme_dark() +
labs(title = "Petal Length of each Iris Species")
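Since the theme plots above differ only in their theme line, one way to compare themes without retyping everything is to store the shared part of the plot in a variable and add a different theme each time. The name base_plot below is just an illustrative choice.

```r
library(ggplot2)

# Build the shared part of the plot once...
base_plot <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  scale_color_brewer(palette = "Set2") +
  labs(title = "Petal Length of each Iris Species")

# ...then add whichever pre-built theme you want to compare
base_plot + theme_classic()
base_plot + theme_minimal()
base_plot + theme_bw()
base_plot + theme_dark()
```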
The theme function allows for a lot of specific customizations. For this class, we'll focus on changes to:
the plot title
the axis titles
the axis text (tick labels)
the legend
If you want to customize the title, use the plot.title argument, which takes an element_text() holding the settings you want.
By default, the title will be left-aligned, with a particular font and look. Let's customize three things:
size will adjust the font size.
hjust will adjust the alignment: 0 is left-aligned, 0.5 is centered, and 1 is right-aligned.
face allows us to choose whether we want a "bold" or "italic" font style.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(plot.title = element_text(size = 14, hjust = 0.5, face = "bold"))
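The same three settings can produce quite different titles. As a variation on the plot above, the sketch below makes a larger, right-aligned title. Besides "bold" and "italic", face also accepts "plain" and "bold.italic".

```r
library(ggplot2)

# hjust = 1 pushes the title to the right; face = "bold.italic" combines both styles
p_title <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  labs(title = "Iris Species by Petal Length",
       x = "Species",
       y = "Petal Length") +
  theme(plot.title = element_text(size = 16, hjust = 1, face = "bold.italic"))
p_title
```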
Likewise, we can make similar changes with the axis titles too!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(plot.title = element_text(size = 14, hjust = 0.5, face = "bold"),
axis.title = element_text(size = 12, face = "bold"))
...and changes to the axis text (the tick labels along each axis). Personally, I like to color these black, since the default grey makes them hard to read, and to make the font a bit bigger.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(plot.title = element_text(size = 14, hjust = 0.5, face = "bold"),
axis.title = element_text(size = 12, face = "bold"),
axis.text = element_text(size = 11, color = "black"))
If you wish to change the legend title, you can actually do that in labs! Note that you identify the legend by the aesthetic it is mapped to:
color = for a legend created by a color mapping
fill = for a legend created by a fill mapping
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length",
color = "Look, I changed it!")
If you add color to differentiate groups that are already differentiated by the x-axis, it might be sensible to just hide the legend. You can do that with an argument in theme:
legend.position = "none" will hide the legend.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(legend.position = "none")
Or if you just want to place it somewhere else, you can enter "left", "right", "top", or "bottom" to relocate it!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
labs(title = "Iris Species by Petal Length",
x = "Species",
y = "Petal Length") +
theme(legend.position = "bottom")
For more on what you can customize with theme, check out the documentation by searching "ggplot2 theme" in a web browser, or by going to this link: https://ggplot2.tidyverse.org/reference/theme.html
Let's use the msleep dataset, which comes with ggplot2.
Let's create a scatterplot that compares sleep_rem (x axis) to sleep_total (y axis) for the mammals in this dataset. We will also color the data by vore classification.
Your challenge is to...
In the geom_point() line, add size = 2 as an argument to make the points bigger and easier to see!
Add the theme_bw() pre-built plot theme.
Use the theme function to adjust the title to be centered and bolded.
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
...
...
...
...
...
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(_________) +
scale_color_____(___________) +
labs(title = _________) +
theme_bw() +
theme(__________)
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
scale_color_brewer(palette = ______) +
labs(title = "Mammal Total Sleep vs. REM Sleep by Diet") +
theme_bw() +
theme(plot.title = element_text(_____________))
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
scale_color_brewer(palette = "Set1") +
labs(title = "Mammal Total Sleep vs. REM Sleep by Diet") +
theme_bw() +
theme(plot.title = element_text(face = "bold", hjust = 0.5))
When you use pre-made themes, like theme_bw() or theme_classic(), put them in before any customizations with the theme function.
Why? Because a pre-built theme will overwrite any customization options you added earlier with theme. So your customization tinkering should always come after!
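Here is a small sketch of that overwriting in action, using legend.position as the customization. The plot stored in each variable keeps its final theme settings, so we can inspect legend.position directly. In the wrong order, the legend comes back at the pre-built theme's default position ("right").

```r
library(ggplot2)

iris_plot <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05)

# Wrong order: theme_bw() comes last and replaces the earlier customization
p_wrong <- iris_plot + theme(legend.position = "none") + theme_bw()

# Right order: pre-built theme first, customizations after
p_right <- iris_plot + theme_bw() + theme(legend.position = "none")

p_wrong$theme$legend.position   # the legend is back
p_right$theme$legend.position   # "none"
```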
Scale functions let you control the spacing of tick marks and the limits of a numeric axis.
scale_x_continuous() adjusts the scaling on the x axis
scale_y_continuous() adjusts the scaling on the y axis
If we want to change the frequency of the tick marks, we would use the breaks argument. Typically, we'll set breaks equal to a sequence.
seq() takes three arguments: where we start, where we end, and the step size (how far apart the tick marks are).
In the plot we were creating, the y axis defaulted to increments of 2, but let's try increments of 0.5.
breaks = seq(0,8,0.5)
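Before handing a sequence to breaks, it can help to run seq() on its own and look at the tick positions it produces. The arguments can also be written out by name (from, to, by), which some people find easier to read.

```r
# seq(from, to, by): start at 0, stop at 8, step by 0.5
tick_marks <- seq(0, 8, 0.5)
tick_marks          # 0.0 0.5 1.0 ... 7.5 8.0
length(tick_marks)  # 17 tick positions

# The same call with the argument names written out
identical(tick_marks, seq(from = 0, to = 8, by = 0.5))
```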
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(breaks = seq(0,8,0.5))
Note that even though we specified breaks from 0 to 8, the axis only shows the tick marks that fall within the range of our data (petal lengths run from 1.0 to 6.9).
If you want to adjust the limits of the axis, use a limits argument!
limits: It simply takes a vector of 2 values, a minimum and a maximum, that determine the full range of that axis.
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(limits = c(0,8))
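One caution: limits that cut through your data do not just crop the view. Values outside scale limits are turned into NA, and ggplot2 drops those points with a warning when it draws the plot. If you only want to zoom in, coord_cartesian() keeps all the data. The sketch below uses limits of 0 to 5 (chosen only for illustration) so that some petal lengths fall outside.

```r
library(ggplot2)

# Scale limits: petal lengths above 5 are outside c(0, 5) and become NA
p_limits <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  scale_y_continuous(limits = c(0, 5))

# coord_cartesian() zooms the view instead of removing data
p_zoom <- ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
  geom_jitter(width = 0.05) +
  coord_cartesian(ylim = c(0, 5))

sum(is.na(layer_data(p_limits)$y))  # number of points that will be dropped
sum(is.na(layer_data(p_zoom)$y))    # 0: nothing removed
```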
If you want to do both at the same time, add both arguments separated by a comma!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(breaks = seq(0,8,0.5), limits = c(0,8))
Notice that here I set the limits from 0 to 12, but the tick marks only from 0 to 8. The tick marks simply stop at 8, leaving the top of the axis unlabeled.
If you use both, make sure you're careful to choose a scaling that covers the full range (unless you're using this effect on purpose)!
ggplot(data = iris, aes(x = Species, y = Petal.Length, color = Species)) +
geom_jitter(width = 0.05) +
scale_y_continuous(breaks = seq(0,8,1), limits = c(0,12))
You can do the same things with the x axis (as long as it's numeric). Let's flip this plot around (and change it to boxplots, because why not) and make the same kind of change.
ggplot(data = iris, aes(y = Species, x = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
scale_x_continuous(breaks = seq(0,8,1))
Let's revisit the msleep data again. Add scale functions such that:
the x axis has tick marks every 0.5 hours
the y axis has tick marks every 1 hour
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1")
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1") +
scale_x_continuous(breaks = _________) +
scale_y_continuous(__________________)
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1") +
scale_x_continuous(breaks = seq(______)) +
scale_y_continuous(breaks = seq(______))
ggplot(data = msleep, aes(x = sleep_rem, y = sleep_total, color = vore)) +
geom_point(size = 2) +
labs(title = "Sleep REM vs. Sleep Totals by Diet",
x = "REM sleep (hrs)",
y = "Sleep Total (hrs)") +
theme_bw() +
scale_color_brewer(palette = "Set1") +
scale_x_continuous(breaks = seq(0,7,0.5)) +
scale_y_continuous(breaks = seq(0,21,1))
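By default, R orders the categories of a non-numeric variable alphabetically (for a factor, this is the order of its levels, which are alphabetical unless you set them yourself).
For example, the Species variable is listed in alphabetical order: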
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
theme_bw() +
scale_fill_brewer(palette = "BuPu") +
theme(axis.title = element_text(size = 14),
axis.text = element_text(color = "black", size = 12))
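Where does the alphabetical order come from? When factor() is called without a levels argument, it sorts the unique values alphabetically, and that level order is what ggplot2 uses on the axis. The vector acad_level below is a hypothetical example, not data from this tutorial.

```r
# Hypothetical values, in the order they happen to appear
acad_level <- c("Sophomore", "Freshman", "Senior", "Junior", "Freshman")

# Without a levels argument, the unique values are sorted alphabetically
levels(factor(acad_level))
# "Freshman" "Junior" "Senior" "Sophomore"

# iris$Species is already a factor whose levels happen to be alphabetical
levels(iris$Species)
# "setosa" "versicolor" "virginica"
```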
If you want your variable to have a custom ordering, you can use the factor function to redefine its structure. This function takes two arguments:
the variable you want to reorder (here, iris$Species)
levels, which will be your custom ordering of values
factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))
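Reordering with factor() changes only the order of the categories, not the data values themselves. A quick way to confirm this, sketched below, is to store the result in a temporary variable (here called reordered, an illustrative name) and inspect its levels and counts.

```r
# Store the reordered factor without touching iris itself
reordered <- factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))

levels(reordered)   # "versicolor" "setosa" "virginica"
table(reordered)    # still 50 plants of each species, now listed in the new order
```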
To apply it, we just need to assign this factor ordering back to the original variable in the data frame. Notice below how the result of factor is assigned to the Species variable of the iris data frame (iris$Species).
Notice that this code doesn't output anything. It is an internal restructuring.
iris$Species = factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))
If we now try the plot, you'll see how the levels have been restructured!
iris$Species = factor(iris$Species, levels = c("versicolor", "setosa", "virginica"))
ggplot(data = iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
stat_boxplot(geom = "errorbar") +
theme_bw() +
scale_fill_brewer(palette = "BuPu") +
theme(axis.title = element_text(size = 14),
axis.text = element_text(color = "black", size = 12))
Suppose I have 20 students in different grade levels, stored in a data frame called Class...
Class
...and I want to make a plot to compare them based on their class level.
ggplot(data = Class, aes(x = acad_level, y = height, color = acad_level)) +
geom_jitter(width = 0.08, size = 2) +
scale_y_continuous(limits = c(58, 76),
breaks = seq(58,76,2)) +
theme_minimal() +
labs(title = "Class Heights by Year",
x = "Academic Level",
y = "Height (in)",
color = "Academic Level") +
theme(plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(color = "black", size = 11),
axis.title = element_text(size = 12))
But notice that, sorted alphabetically, my academic levels appear in the order Freshman, Junior, Senior, Sophomore.
I'd like to get the order to be chronological: Freshman, Sophomore, Junior, Senior!
Think of my coding template as follows...what would I fill in at each blank?
_______$_______ = factor(______$______, levels = c(_______, _______, _______, _______))
Class$acad_level = factor(Class$acad_level, levels = c("Freshman", "Sophomore", "Junior", "Senior"))
ggplot(data = Class, aes(x = acad_level, y = height, color = acad_level)) +
geom_jitter(width = 0.08, size = 2) +
scale_y_continuous(limits = c(58, 76),
breaks = seq(58,76,2)) +
theme_minimal() +
labs(title = "Class Heights by Year",
x = "Academic Level",
y = "Height (in)",
color = "Academic Level") +
theme(plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(color = "black", size = 11),
axis.title = element_text(size = 12))
Be super careful with this code: every category name needs to be written exactly as it appears in the data, and it's CaSe SenSiTiVe. Any value that doesn't match one of your levels will become NA. And make sure your data frame name matches what you have in your global environment!
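To see what a single mismatched name does, here is a small sketch with a hypothetical vector in which one value is typed in lowercase. The setdiff() check at the end is one simple way to spot such mismatches before reordering.

```r
# Hypothetical data: "junior" is lowercase, unlike the level "Junior"
acad_level <- c("Freshman", "junior", "Senior", "Sophomore")
my_levels  <- c("Freshman", "Sophomore", "Junior", "Senior")

fixed <- factor(acad_level, levels = my_levels)
fixed              # "junior" matches no level, so it becomes NA
sum(is.na(fixed))  # 1

# Which values in the data are missing from the levels vector?
setdiff(unique(acad_level), my_levels)
# "junior"
```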
This tutorial was created by Kelly Findley. I hope this experience was helpful for you!
If you'd like to go back to the tutorial home page, click here: https://stat212-learnr.stat.illinois.edu/