library(tidyverse)
library(forcats)
library(hrbrthemes)
What we want to achieve: a ggplot2 bar plot in which both the bars and the legend labels are sorted by frequency.
Say we would like to plot frequencies, and we would like to use ggplot2 for that purpose. How can we get a decent graph? This post shows some ways.
A glimpse at the data
glimpse(diamonds)
#> Rows: 53,940
#> Columns: 10
#> $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0…
#> $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ve…
#> $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I…
#> $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1,…
#> $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 6…
#> $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 5…
#> $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 3…
#> $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4…
#> $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4…
#> $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2…
Plot 1: Basic bar plot
diamonds %>%
  ggplot(aes(cut)) +
  geom_bar()
At first glance, the bars seem to be sorted by increasing frequency out of the box. They are not, though; the order is a coincidence.
cut is an ordered factor, where the order is pre-defined:
str(diamonds$cut)
#> Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
ggplot2 takes the order of the factor levels and orders the bars accordingly; that is the general behavior one has to bear in mind. For cut, the pre-defined level order merely happens to match increasing frequency.
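To see that the level order, not the frequency, drives the bar order, a minimal sketch can count the levels and then move one level to the front with fct_relevel() from forcats. The object names cut_counts and diamonds_relevel are illustrative only.

```r
library(ggplot2)  # provides the diamonds data and geom_bar()
library(dplyr)
library(forcats)

# Counts per cut, listed in the pre-defined level order (Fair < ... < Ideal)
cut_counts <- diamonds %>% count(cut)
print(cut_counts)

# Move "Ideal" to the front: only the level order changes, not the data
diamonds_relevel <- diamonds %>% mutate(cut = fct_relevel(cut, "Ideal"))
print(levels(diamonds_relevel$cut))

# The bars now follow the new level order, so the tallest bar ("Ideal") comes first
p <- ggplot(diamonds_relevel, aes(cut)) + geom_bar()
print(p)
```

The counts grow along the original level order only by coincidence; after releveling, the bar order changes although the data are the same.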
Plot 2: Bar plot, unsorted by default
For the sake of demonstration, consider the following plot:
diamonds %>%
  ggplot(aes(x = color)) +
  geom_bar()
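As can be seen, the x-axis follows the factor levels of color (D to J), which here coincide with alphabetical order. In turn, the bars are not sorted by height (i.e., by the frequency of the levels). That is the ggplot2 default: bars follow the factor level order, and a character variable is sorted alphabetically. Let's change that.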
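A quick check that color, too, is an ordered factor whose levels merely look alphabetical, and that its frequencies are not monotone along those levels. This is a minimal sketch; color_counts is an illustrative name.

```r
library(ggplot2)  # provides the diamonds data
library(dplyr)

# color is an ordered factor; its levels D < E < ... < J look alphabetical
print(is.ordered(diamonds$color))
print(levels(diamonds$color))

# Frequencies in level order: they rise and fall, so the bars are not sorted by height
color_counts <- diamonds %>% count(color)
print(color_counts)

# With a plain character variable, the discrete scale sorts the values alphabetically
p <- diamonds %>%
  mutate(color = as.character(color)) %>%
  ggplot(aes(x = color)) +
  geom_bar()
print(p)
```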
Plot 3: Sorted bar plot
One particularly simple way is the helper function fct_infreq() from the package forcats, which reorders the factor levels by decreasing frequency. From the help:
f <- factor(c("b", "b", "a", "c", "c", "c"))
f
#> [1] b b a c c c
#> Levels: a b c
fct_infreq(f)
#> [1] b b a c c c
#> Levels: c b a
Using this function, sorting becomes easy:
diamonds %>%
  ggplot(aes(x = fct_infreq(color))) +
  geom_bar()
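To see which order fct_infreq() produces for color, we can inspect the new levels. The base R lines below are an illustrative equivalent (not how forcats implements it); they match here because no two colors share the same count.

```r
library(ggplot2)  # provides the diamonds data
library(forcats)

# Levels sorted by decreasing frequency
infreq_levels <- levels(fct_infreq(diamonds$color))
print(infreq_levels)

# Base R equivalent: tabulate, sort the counts in decreasing order, reuse the names as levels
manual_levels <- names(sort(table(diamonds$color), decreasing = TRUE))
color_sorted <- factor(diamonds$color, levels = manual_levels)
print(levels(color_sorted))
```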
Plot 4: Adding the legend
Suppose we would like to add a legend. Why would we do so in the first place? One reason could be that the level names are quite long, so we do not want to print them underneath.
diamonds %>%
  ggplot(aes(x = fct_infreq(color), fill = fct_infreq(color))) +
  geom_bar()
Note that the legend is sorted accordingly.
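Since fct_infreq(color) appears twice in the aesthetics, a small variation computes the reordered factor once with dplyr's mutate() and maps the same column to both x and fill. This is a stylistic choice, not a requirement; diamonds_sorted is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(forcats)

# Reorder the levels once, directly in the data
diamonds_sorted <- diamonds %>% mutate(color = fct_infreq(color))

# x-axis and legend both use the same frequency-sorted factor
p <- ggplot(diamonds_sorted, aes(x = color, fill = color)) +
  geom_bar()
print(p)
```

A side benefit is that the axis and legend titles read "color" rather than "fct_infreq(color)".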
Plot 5: Alternative to printing a legend
Note that a legend is suboptimal here, because the eye has to jump back and forth between the bars and the legend. Let's present it differently: flip the coordinates and drop the legend.
diamonds %>%
  ggplot(aes(x = fct_infreq(color), fill = fct_infreq(color))) +
  geom_bar() +
  coord_flip() +
  scale_fill_viridis_d(guide = "none")  # hide the legend; guide = FALSE is deprecated in newer ggplot2
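Assuming ggplot2 3.3.0 or newer, horizontal bars can also be drawn without coord_flip() by mapping the factor to y; geom_bar() then infers the orientation. The sketch also hides the legend via theme(legend.position = "none") as an alternative to guide = "none". As with the flipped plot, the first level ends up at the bottom.

```r
library(ggplot2)
library(forcats)

# Map the factor to y: bars become horizontal, no coord_flip() needed
p <- ggplot(diamonds, aes(y = fct_infreq(color), fill = fct_infreq(color))) +
  geom_bar() +
  scale_fill_viridis_d() +
  theme(legend.position = "none")  # drop the legend
print(p)
```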
Plot 6: Reverse order
For the flipped bar graph we might want to reverse the order of the bars, so that the most frequent level appears on top. To that end, we need to reverse the factor levels. One convenient way is the function fct_rev() from forcats:
f <- factor(c("a", "b", "c"))
fct_rev(f)
#> [1] a b c
#> Levels: c b a
(Taken from the help)
diamonds %>%
  ggplot(aes(x = fct_rev(fct_infreq(color)), fill = fct_rev(fct_infreq(color)))) +
  geom_bar() +
  coord_flip() +
  scale_fill_viridis_d(guide = "none")  # hide the legend
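To check why this puts the most frequent color on top, a minimal sketch can inspect the level order after fct_rev(fct_infreq(...)). With coord_flip(), the first level is drawn at the bottom, so the last level (here G, the most frequent color) lands at the top. The name diamonds_rev is illustrative.

```r
library(ggplot2)
library(dplyr)
library(forcats)

# Frequency-sorted levels, then reversed: least frequent first, most frequent last
diamonds_rev <- diamonds %>% mutate(color = fct_rev(fct_infreq(color)))
print(levels(diamonds_rev$color))

# The last level is drawn at the top of the flipped plot
p <- ggplot(diamonds_rev, aes(x = color, fill = color)) +
  geom_bar() +
  coord_flip() +
  scale_fill_viridis_d(guide = "none")
print(p)
```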
Plot 7: Final polish
diamonds %>%
  ggplot(aes(x = fct_rev(fct_infreq(color)))) +
  geom_bar() +
  coord_flip() +
  theme_ipsum_rc() +  # from hrbrthemes; uses the Roboto Condensed font
  labs(x = "Color", y = "Count")
There is probably no need for color in this (flipped) version of the plot, so the fill aesthetic is dropped. According to Tufte, we should not waste ink on elements that transmit no non-redundant information.
Of course, there are many other ways to get there; geom_col() is another option, for which we compute the frequencies beforehand.
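A minimal sketch of the geom_col() route: count the colors first with dplyr's count(), reorder the levels by the precomputed count with fct_reorder(), and let geom_col() use the counts as bar lengths. Mapping the counts to x and the colors to y assumes ggplot2 3.3.0 or newer (otherwise keep coord_flip()); color_counts is an illustrative name.

```r
library(ggplot2)
library(dplyr)
library(forcats)

# Precompute the frequencies: one row per color with its count n
color_counts <- diamonds %>%
  count(color) %>%
  mutate(color = fct_reorder(color, n))  # ascending by n, so the largest bar ends up on top

print(levels(color_counts$color))

# geom_col() draws bars with the given lengths instead of counting rows itself
p <- ggplot(color_counts, aes(x = n, y = color)) +
  geom_col() +
  labs(x = "Count", y = "Color")
print(p)
```

Because the frequencies live in the data, they can also be reused elsewhere, e.g. for a table or for text labels.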