Load packages
library(tidyverse)
library(forcats)
library(hrbrthemes)
What we want to achieve: a ggplot2 bar plot in which the bars and the legend labels are sorted
Say we would like to plot frequencies using ggplot2.
How can we get a decent graph? This post shows some ways.
Some data
data(diamonds)
A glimpse at the data
glimpse(diamonds)
#> Rows: 53,940
#> Columns: 10
#> $ carat <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0…
#> $ cut <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ve…
#> $ color <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I…
#> $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1,…
#> $ depth <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 6…
#> $ table <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 5…
#> $ price <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 3…
#> $ x <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4…
#> $ y <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4…
#> $ z <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2…
Plot 1: Basic bar plot
diamonds %>%
  ggplot(aes(cut)) +
  geom_bar()
It may seem that the bars are sorted by increasing frequency out of the box. Unfortunately, they are not: cut is an ordered factor whose level order is predefined, and the frequencies merely happen to increase along that order:
str(diamonds$cut)
#> Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
ggplot2 takes the order of the factor levels and orders the bars accordingly; that’s the general behavior one has to bear in mind.
Plot 2: Bar plot, unsorted by default
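To check this, one can print the levels and tabulate the counts directly. The following is a minimal sketch using base R functions on the diamonds data; the variable names cut_levels and cut_counts are only chosen for illustration.

```r
library(ggplot2)  # provides the diamonds data set

data(diamonds)

# The predefined level order that ggplot2 uses for the x-axis
cut_levels <- levels(diamonds$cut)
print(cut_levels)

# Frequencies per level, reported in level order (not sorted by count)
cut_counts <- table(diamonds$cut)
print(cut_counts)
```

For cut, the counts happen to increase along the level order, which is why the first plot looks sorted.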
For the sake of demonstration, consider the following plot:
diamonds %>%
  ggplot(aes(x = color)) +
  geom_bar()
As can be seen, the abscissa follows the factor levels of color (D to J, which here coincides with alphabetical order), which in turn means that the bars are not sorted by height (i.e., by the frequency of the levels). That’s the ggplot2 default. Let’s change that.
Plot 3: sorted bar plot
There are multiple ways of sorting the bars, see here or here for examples.
One particularly simple way is the helper function fct_infreq from forcats. From the help:
f <- factor(c("b", "b", "a", "c", "c", "c"))
f
#> [1] b b a c c c
#> Levels: a b c
fct_infreq(f)
#> [1] b b a c c c
#> Levels: c b a
Using this function, sorting becomes easy:
diamonds %>%
  ggplot(aes(x = fct_infreq(color))) +
  geom_bar()
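To see which order fct_infreq() produces for color without looking at the plot, the reordered levels can be compared with the sorted frequency table. This is a small sketch; color_counts and sorted_levels are illustrative names, not part of the original post.

```r
library(ggplot2)  # provides the diamonds data set
library(forcats)

data(diamonds)

# Counts per color, sorted from most to least frequent
color_counts <- sort(table(diamonds$color), decreasing = TRUE)
print(color_counts)

# Level order after fct_infreq(); this is the left-to-right order of the bars
sorted_levels <- levels(fct_infreq(diamonds$color))
print(sorted_levels)
```

Both outputs should list the colors in the same order, starting with the most frequent one.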
Plot 4: Adding the legend
Suppose we would like to add a legend. Why would we do so in the first place? One reason could be that the level names are quite long, so we do not want to print them underneath the bars.
diamonds %>%
  ggplot(aes(x = fct_infreq(color),
             fill = fct_infreq(color))) +
  geom_bar()
Note that the legend is sorted accordingly.
Plot 5: Alternative to printing a legend
Note that a legend is suboptimal here, because the eye has to flip back and forth: from the bar to the legend and back again. Let’s try to put it differently:
diamonds %>%
  ggplot(aes(x = fct_infreq(color), fill = fct_infreq(color))) +
  geom_bar() +
  coord_flip() +
  # guide = "none" hides the legend (guide = FALSE is deprecated)
  scale_fill_viridis_d(guide = "none")
Plot 6: Reverse order
For the flipped bar graph we might want to reverse the order of the bars. To that end, we need to reverse the factor levels. One convenient way is the function fct_rev():
f <- factor(c("a", "b", "c"))
fct_rev(f)
#> [1] a b c
#> Levels: c b a
(Taken from the help)
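Both helpers can be nested. As a quick sketch on the small example vector used for fct_infreq() earlier (rev_levels is an illustrative name), the levels end up ordered from least to most frequent:

```r
library(forcats)

f <- factor(c("b", "b", "a", "c", "c", "c"))

# Most frequent level first
print(levels(fct_infreq(f)))

# Reversed: least frequent level first
rev_levels <- levels(fct_rev(fct_infreq(f)))
print(rev_levels)
```

In a flipped plot the first level is drawn at the bottom, so this reversed order should place the most frequent level at the top.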
diamonds %>%
  ggplot(aes(x = fct_rev(fct_infreq(color)),
             fill = fct_rev(fct_infreq(color)))) +
  geom_bar() +
  coord_flip() +
  # guide = "none" hides the legend (guide = FALSE is deprecated)
  scale_fill_viridis_d(guide = "none")
Plot 7: Final polish
diamonds %>%
  ggplot(aes(x = fct_rev(fct_infreq(color)))) +
  geom_bar() +
  coord_flip() +
  # no fill aesthetic is mapped here, so no fill scale is needed
  theme_ipsum_rc() +
  labs(x = "Color",
       y = "Count")
There’s probably no need for color in this (flipped version of the) plot. According to Tufte, we should not waste ink if no (nonredundant) information is transmitted.
Debrief
Of course, there are many other ways to get going; geom_col is another option, for which we have to compute the frequencies beforehand.
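A possible version of the geom_col route is sketched below. It assumes that the frequencies are computed with dplyr's count() and that the bars are sorted with forcats' fct_reorder(); neither choice comes from the original post, and color_freq and p are illustrative names.

```r
library(dplyr)
library(ggplot2)
library(forcats)

data(diamonds)

# Compute the frequencies up front; count() returns the columns `color` and `n`
color_freq <- diamonds %>%
  count(color)

# geom_col() plots the precomputed values; fct_reorder() sorts the bars by n
p <- color_freq %>%
  ggplot(aes(x = fct_reorder(color, n, .desc = TRUE), y = n)) +
  geom_col() +
  labs(x = "Color", y = "Count")

print(p)
```

Since the counts live in their own data frame, they can be inspected or filtered before plotting, which is the main practical difference from geom_bar().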
Happy plotting!