
Barplots with mosaic

Barplots are a staple in the analysis of qualitative data. Numerous methods for plotting them exist; the popular R package mosaic provides some, too.

More recently, mosaic switched to a ggplot2 wrapper for plotting diagrams: the gf_XXX() functions, packaged in ggformula. That implies that input data is expected to be tidy, because ggplot2, a central member of the tidyverse, expects its input data to be tidy.

Let’s check an example.

Setup

library(mosaic)
data(tips, package = "reshape2")

Old way of barplots in mosaic

The “old” way of plotting barplots in mosaic was lattice based:

bargraph(~day, data = tips)

Tidyverse, simple

Here, the present day tidyverse style:

gf_bar( ~ day, data = tips)

That’s simple.

More complex case, dplyr-style

Assume we are doing some analysis upfront, and we do it the dplyr way:

tips_count <- tips %>% 
  filter(tip > 2, day != "Thu") %>% 
  count(day)

tips_count
## # A tibble: 4 x 2
##   day       n
##   <fct> <int>
## 1 Fri      13
## 2 Sat      60
## 3 Sun      57
## 4 Thur     36

That’s a tidy output. Now again, let’s plot the frequencies:

gf_col(n ~ day, data = tips_count)

Simple, works.

More complex case, using mosaic

tips_count2 <- tips %>% 
  filter(tip > 2, day != "Thu") %>% 
  mosaic::tally(~day, data = .)

tips_count2
## day
##  Fri  Sat  Sun Thur 
##   13   60   57   36
str(tips_count2)
##  'table' int [1:4(1d)] 13 60 57 36
##  - attr(*, "dimnames")=List of 1
##   ..$ day: chr [1:4] "Fri" "Sat" "Sun" "Thur"

mosaic’s tally() does not produce a tidy data frame but a table object, so its output cannot (directly) be handed over to gf_XXX().

Workarounds

Of course, there are workarounds, if one wants to go on using mosaic’s tally(). For instance, gathering works:

library(tidyverse)
tips_count2 %>% 
  as_tibble() %>% 
  tidyr::gather(key = day, value = n)
## # A tibble: 4 x 2
##   day       n
##   <chr> <int>
## 1 Fri      13
## 2 Sat      60
## 3 Sun      57
## 4 Thur     36

Tidy again. Note that gather() is not attached by mosaic, hence the call to library(tidyverse).
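If one prefers to stay in base R, as.data.frame() also turns a table into one row per category. The following is a minimal sketch, not part of the original analysis: the vector day is rebuilt by hand from the counts printed above (it is not the tips data itself), and the names day_table and day_counts are made up for illustration.

```r
# Rebuild a one-way table with the same counts as tips_count2.
# The vector is typed in from the printed output, purely for illustration.
day <- rep(c("Fri", "Sat", "Sun", "Thur"), times = c(13, 60, 57, 36))
day_table <- table(day = day)

# as.data.frame() on a table returns one row per cell;
# responseName sets the name of the count column (default: "Freq").
day_counts <- as.data.frame(day_table, responseName = "n")
day_counts
```

The result has the columns day and n, so it should be usable in the same way as tips_count above, for instance in gf_col(n ~ day, data = day_counts).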

Then again, there are functions which are happy to digest a table (that is, a non-tidy object) as given back by mosaic::tally():

barplot(tips_count2)

df_stats() does not work, apparently

From the package description of mosaicCore for df_stats():

Creates a data frame of statistics calculated on one variable, possibly for each group formed by combinations of additional variables. The resulting data frame has one column for each of the statistics requested as well as columns for any grouping variables.

So, df_stats() should be useful to get tidy data. However, for tally() it appears not to do so:

df_stats(~day, data = tips, tally)
##   Fri Sat Sun Thur
## 1  19  87  76   62
df_stats(~day, data = tips, tally) %>% str()
## 'data.frame':    1 obs. of  4 variables:
##  $ Fri : int 19
##  $ Sat : int 87
##  $ Sun : int 76
##  $ Thur: int 62

The function outputs a data.frame, but not in tidy format.
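Such a one-row, wide data frame can be reshaped by hand. Here is a minimal base R sketch using stack(); the data frame wide is typed in from the output printed above rather than computed with df_stats(), and the names wide, long and day_counts_long are assumptions for illustration.

```r
# One-row, wide data frame with the values printed by
# df_stats(~day, data = tips, tally); entered by hand for illustration.
wide <- data.frame(Fri = 19L, Sat = 87L, Sun = 76L, Thur = 62L)

# stack() collects all columns into a single "values" column and
# records the former column names in the factor "ind".
long <- stack(wide)

# Reorder and rename the columns to get a day/n layout.
day_counts_long <- setNames(long[, c("ind", "values")], c("day", "n"))
day_counts_long
```

This yields four rows, one per day, which is the shape gf_col() expects.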

Similarly, with a variable on the left-hand side (LHS) of the formula:

df_stats(day ~ sex, data = tips, tally)
##      sex Fri Sat Sun Thur
## 1 Female   9  28  18   32
## 2   Male  10  59  58   30

Some interesting (minor?) differences:

df_stats(day ~ sex, data = tips, counts)
##      sex n_Fri n_Sat n_Sun n_Thur
## 1 Female     9    28    18     32
## 2   Male    10    59    58     30
df_stats(day ~ sex, data = tips, count)
##      sex n_Fri
## 1 Female     9
## 2   Male    10
df_stats(sex ~ day, data = tips, counts)
##    day n_Female n_Male
## 1  Fri        9     10
## 2  Sat       28     59
## 3  Sun       18     58
## 4 Thur       32     30
df_stats(~ day | sex, data = tips, counts, props)
##      sex n_Fri n_Sat n_Sun n_Thur   prop_Fri  prop_Sat  prop_Sun prop_Thur
## 1 Female     9    28    18     32 0.10344828 0.3218391 0.2068966 0.3678161
## 2   Male    10    59    58     30 0.06369427 0.3757962 0.3694268 0.1910828

Nice, but still not tidy.

df_stats(), long format, one variable - does not work

I just discovered that there’s a long format parameter in df_stats(). Let’s check that out.

df_stats( ~ day, data = tips, tally, format = "long")
##   Fri stat value
## 1  19  Sat    87
## 2  19  Sun    76
## 3  19 Thur    62

Well, that’s kind of tidy, but not really: the first level, Fri, ended up as a column name instead of a row, so it seems hard to make use of that. Compare to standard tally() once more:

tally(~day, data = tips)
## day
##  Fri  Sat  Sun Thur 
##   19   87   76   62

df_stats(), long format, two variables - works

Maybe this way:

df_stats(~ day | sex, data = tips, tally, format = "long")
##      sex stat value
## 1 Female  Fri     9
## 2   Male  Fri    10
## 3 Female  Sat    28
## 4   Male  Sat    59
## 5 Female  Sun    18
## 6   Male  Sun    58
## 7 Female Thur    32
## 8   Male Thur    30
count(tips, day, sex)
## # A tibble: 8 x 3
##   day   sex        n
##   <fct> <fct>  <int>
## 1 Fri   Female     9
## 2 Fri   Male      10
## 3 Sat   Female    28
## 4 Sat   Male      59
## 5 Sun   Female    18
## 6 Sun   Male      58
## 7 Thur  Female    32
## 8 Thur  Male      30

Hey, now it works! We’ve received a tidy data frame.
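The generic column names stat and value are a bit unwieldy for plotting. A small sketch of renaming them, with the data frame long_stats entered by hand from the output above (the object names are made up for illustration), and a cross-tabulation as a sanity check:

```r
# Long-format result as printed above, entered by hand for illustration.
long_stats <- data.frame(
  sex   = rep(c("Female", "Male"), times = 4),
  stat  = rep(c("Fri", "Sat", "Sun", "Thur"), each = 2),
  value = c(9L, 10L, 28L, 59L, 18L, 58L, 32L, 30L)
)

# Give the generic columns more telling names.
names(long_stats)[names(long_stats) == "stat"]  <- "day"
names(long_stats)[names(long_stats) == "value"] <- "n"

# Sanity check: cross-tabulating the long data should give back
# the wide sex-by-day table of counts.
day_by_sex <- xtabs(n ~ sex + day, data = long_stats)
day_by_sex
```

With ggformula attached, a call along the lines of gf_col(n ~ day, fill = ~ sex, data = long_stats, position = "dodge") should then draw the counts as grouped bars.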

Conclusions

mosaic gives tidy data frames via the function df_stats(), which works great for numeric summaries such as favstats(). For qualitative variables, things appear more complicated. The bottom line is that gf_bar() will work, as the raw data is tidy most of the time. If we want to compute the counts by hand, the trouble starts. With two variables, e.g., ~ day | sex with format = "long", things do work out, but apparently not for simply ~ day. What to do in this case?

One conclusion would be not to use mosaic::tally() but dplyr::count() instead, as count() yields back a tidy data frame.

Alternatively, the output of mosaic’s tally() can be “tidied” after the fact using tidyr::gather().

The mosaic functions such as df_stats() are in active development, so we can expect new developments. In addition, I may be missing something.

session_info()

sessioninfo::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  os       macOS  10.14.2              
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Europe/Berlin               
##  date     2019-01-11                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package     * version    date       lib
##  assertthat    0.2.0      2017-04-11 [1]
##  backports     1.1.3      2018-12-14 [1]
##  blogdown      0.10       2019-01-09 [1]
##  bookdown      0.9        2018-12-21 [1]
##  broom         0.5.1      2018-12-05 [1]
##  cellranger    1.1.0      2016-07-27 [1]
##  cli           1.0.1      2018-09-25 [1]
##  colorspace    1.3-2      2016-12-14 [1]
##  crayon        1.3.4      2017-09-16 [1]
##  digest        0.6.18     2018-10-10 [1]
##  dplyr       * 0.8.0.9000 2019-01-07 [1]
##  evaluate      0.12       2018-10-09 [1]
##  fansi         0.4.0      2018-10-05 [1]
##  forcats     * 0.3.0      2018-02-19 [1]
##  generics      0.0.2      2018-11-29 [1]
##  ggdendro      0.1-20     2016-04-27 [1]
##  ggformula   * 0.9.0      2018-08-03 [1]
##  ggplot2     * 3.1.0      2018-10-25 [1]
##  ggrepel       0.8.0      2018-05-09 [1]
##  ggstance    * 0.3.1      2018-07-20 [1]
##  glue          1.3.0.9000 2019-01-07 [1]
##  gridExtra     2.3        2017-09-09 [1]
##  gtable        0.2.0      2016-02-26 [1]
##  haven         2.0.0      2018-11-22 [1]
##  hms           0.4.2      2018-03-10 [1]
##  htmltools     0.3.6      2017-04-28 [1]
##  httr          1.4.0      2018-12-11 [1]
##  jsonlite      1.6        2018-12-07 [1]
##  knitr         1.21       2018-12-10 [1]
##  labeling      0.3        2014-08-23 [1]
##  lattice     * 0.20-35    2017-03-25 [2]
##  lazyeval      0.2.1      2017-10-29 [1]
##  lubridate     1.7.4      2018-04-11 [1]
##  magrittr      1.5        2014-11-22 [1]
##  MASS          7.3-50     2018-04-30 [2]
##  Matrix      * 1.2-14     2018-04-13 [2]
##  modelr        0.1.2      2018-05-11 [1]
##  mosaic      * 1.4.0      2018-08-11 [1]
##  mosaicCore    0.6.0      2018-06-24 [1]
##  mosaicData  * 0.17.0     2018-06-23 [1]
##  munsell       0.5.0      2018-06-12 [1]
##  nlme          3.1-137    2018-04-07 [2]
##  pillar        1.3.1      2018-12-15 [1]
##  pkgconfig     2.0.2      2018-08-16 [1]
##  plyr          1.8.4      2016-06-08 [1]
##  purrr       * 0.2.5      2018-05-29 [1]
##  R6            2.3.0      2018-10-04 [1]
##  Rcpp          1.0.0      2018-11-07 [1]
##  readr       * 1.3.1      2018-12-21 [1]
##  readxl        1.2.0      2018-12-19 [1]
##  rlang         0.3.1      2019-01-08 [1]
##  rmarkdown     1.11       2018-12-08 [1]
##  rstudioapi    0.9.0      2019-01-09 [1]
##  rvest         0.3.2      2016-06-17 [1]
##  scales        1.0.0      2018-08-09 [1]
##  sessioninfo   1.1.1.9000 2019-01-04 [1]
##  stringi       1.2.4      2018-07-20 [1]
##  stringr     * 1.3.1      2018-05-10 [1]
##  tibble      * 2.0.0      2019-01-04 [1]
##  tidyr       * 0.8.2      2018-10-28 [1]
##  tidyselect    0.2.5      2018-10-11 [1]
##  tidyverse   * 1.2.1      2017-11-14 [1]
##  utf8          1.1.4      2018-05-24 [1]
##  withr         2.1.2      2018-03-15 [1]
##  xfun          0.4        2018-10-23 [1]
##  xml2          1.2.0      2018-01-24 [1]
##  yaml          2.2.0      2018-07-25 [1]
##  source                            
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.2)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  Github (tidyverse/dplyr@df6b7d5)  
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  Github (tidyverse/glue@3f7012c)   
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.1)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.1)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.1)                    
##  CRAN (R 3.5.1)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.1)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.2)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.2)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  Github (r-lib/sessioninfo@ac8fcc1)
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.2)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
##  CRAN (R 3.5.0)                    
## 
## [1] /Users/sebastiansaueruser/Rlibs
## [2] /Library/Frameworks/R.framework/Versions/3.5/Resources/library