How to sort the labels of the legend in a ggplot-diagram

Load packages

library(tidyverse)
library(forcats)
library(hrbrthemes)

What we want to achieve: a ggplot2 bar plot where both bars and legend labels are sorted

Say we would like to plot frequencies, and would like to use ggplot2 for that purpose. How can we get a decent graph? This post shows some ways.

Some data

data(diamonds)

A glimpse at the data

glimpse(diamonds)
#> Rows: 53,940
#> Columns: 10
#> $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23, 0…
#> $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very Good, Ve…
#> $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, J, J, I…
#> $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI1, VS1,…
#> $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4, 6…
#> $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54, 62, 5…
#> $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339, 340, 3…
#> $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, 4.00, 4…
#> $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, 4.05, 4…
#> $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, 2.39, 2…

Plot 1: Basic bar plot

diamonds %>% 
  ggplot(aes(cut)) +
  geom_bar()

It may seem that the bars are sorted (in increasing frequency) out of the box. Unfortunately, they are not. cut is an ordered factor, where the order is pre-defined:

str(diamonds$cut)
#>  Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...

ggplot2 takes the order of the factor levels and orders the bars accordingly; that’s the general behavior one has to bear in mind.
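To check this for yourself, you can print the level order next to the frequencies. This is only a small sketch; the object names cut_levels and cut_counts are made up for illustration.

```r
library(ggplot2)  # provides the diamonds data set

# Level order of the factor: this is what determines the bar order
cut_levels <- levels(diamonds$cut)
cut_levels
#> [1] "Fair"      "Good"      "Very Good" "Premium"   "Ideal"

# Frequencies per level, listed in level order
cut_counts <- table(diamonds$cut)
cut_counts
```

For cut, the level order and the frequency order happen to coincide, which is why the first plot looks sorted.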

Plot 2: Bar plot, unsorted is default

For the sake of demonstration, consider the following plot:

diamonds %>% 
  ggplot(aes(x = color)) +
  geom_bar()

As can be seen, the abscissa follows the order of the factor levels (which happens to be alphabetical here), so the bars are not sorted by height (i.e., by the frequency of the levels). That’s the ggplot2 default. Let’s change that.
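Before changing anything, it can help to compare the level order with the frequency order in plain base R. A minimal sketch; color_by_freq is a name chosen just for this illustration.

```r
library(ggplot2)  # provides the diamonds data set

# Level order: the order ggplot2 uses for the axis
levels(diamonds$color)
#> [1] "D" "E" "F" "G" "H" "I" "J"

# Levels ordered by decreasing frequency: the order we are after
color_by_freq <- names(sort(table(diamonds$color), decreasing = TRUE))
color_by_freq
```

If the two vectors differ, the default plot is not sorted by bar height.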

Plot 3: sorted bar plot

There are multiple ways of sorting the bars, see here or here for examples.

One particularly simple way is the helper function fct_infreq() from forcats, which reorders the levels of a factor by decreasing frequency. From the help:

f <- factor(c("b", "b", "a", "c", "c", "c"))
f
#> [1] b b a c c c
#> Levels: a b c
fct_infreq(f)
#> [1] b b a c c c
#> Levels: c b a

Using this function, sorting becomes easy:

diamonds %>% 
  ggplot(aes(x = fct_infreq(color))) +
  geom_bar()

Plot 4: Adding the legend

Suppose we would like to add a legend. Why would we do so in the first place? One reason could be that the level names are quite long, so we do not want to print them underneath.

diamonds %>% 
  ggplot(aes(x = fct_infreq(color), 
             fill = fct_infreq(color))) +
  geom_bar() 

Note that the legend is sorted accordingly.
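One side effect of calling fct_infreq() inside aes() is that the axis and legend titles read "fct_infreq(color)". A possible way around that, sketched here under the assumption that you are fine with adding a helper column, is to compute the sorted factor once and set the titles with labs(). The names diamonds_sorted, color_sorted and p_legend are invented for this example.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Compute the frequency-sorted factor once instead of repeating it in aes()
diamonds_sorted <- diamonds %>%
  mutate(color_sorted = fct_infreq(color))

p_legend <- diamonds_sorted %>%
  ggplot(aes(x = color_sorted, fill = color_sorted)) +
  geom_bar() +
  labs(x = "Color", fill = "Color")  # readable axis and legend titles

p_legend
```

Bars and legend entries still share the same order, because both are driven by the levels of the same column.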

Plot 5: Alternative to printing a legend

Note that a legend is suboptimal here, because the eye has to flip back and forth: from the bar to the legend and back again. Let’s try to put it differently:

diamonds %>% 
  ggplot(aes(x = fct_infreq(color), fill = fct_infreq(color))) +
  geom_bar() +
  coord_flip() +
  scale_fill_viridis_d(guide = "none")  # hide the legend; guide = FALSE is deprecated

Plot 6: Reverse order

For the flipped bar graph we might want to reverse the order of the bars, so that the most frequent level ends up at the top. To that end, we need to reverse the factor levels. One convenient way is the function fct_rev():

f <- factor(c("a", "b", "c"))
fct_rev(f)
#> [1] a b c
#> Levels: c b a

(Taken from the help)

diamonds %>% 
  ggplot(aes(x = fct_rev(fct_infreq(color)), 
             fill = fct_rev(fct_infreq(color)))) +
  geom_bar() +
  coord_flip() +
  scale_fill_viridis_d(guide = "none")  # hide the legend; guide = FALSE is deprecated

Plot 7: Final polish

diamonds %>% 
  ggplot(aes(x = fct_rev(fct_infreq(color)))) +
  geom_bar() +
  coord_flip() +
  theme_ipsum_rc() +
  labs(x = "Color",
       y = "Count")

There’s probably no need for color in this (flipped version of the) plot. According to Tufte, we should not waste ink that transmits no new (nonredundant) information.

Debrief

Of course there are many other ways to get going; geom_col() is another option, for which we need to compute the frequencies beforehand.
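A rough sketch of that geom_col() route might look as follows. It assumes that counting with count() from dplyr and sorting with fct_reorder() from forcats is acceptable; color_counts and p_col are names invented for this example.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Compute the frequencies first, then sort the levels by decreasing count
color_counts <- diamonds %>%
  count(color, name = "n") %>%
  mutate(color = fct_reorder(color, n, .desc = TRUE))

# geom_col() plots the precomputed counts as bar heights
p_col <- color_counts %>%
  ggplot(aes(x = color, y = n)) +
  geom_col() +
  labs(x = "Color", y = "Count")

p_col
```

This gives the same sorted bars as fct_infreq() with geom_bar(), but leaves a small summary table at hand in case the counts are needed elsewhere.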

Happy plotting!