Plotting barplots is a frequent task in the analysis of qualitative data. Numerous methods for plotting barplots exist; the popular R package mosaic also provides methods.
More recently, mosaic switched to a ggplot wrapper for plotting diagrams, namely the gf_XXX() functions packaged in ggformula. That implies that input data is expected to be tidy, because ggplot, a central member of the tidyverse, expects its input data to be tidy.
Let’s check an example.
Setup
library(mosaic)
data(tips, package = "reshape2")
Old way of barplots in mosaic
The “old” way of plotting barplots in mosaic was lattice-based:
bargraph(~day, data = tips)
Tidyverse, simple
Here is the present-day tidyverse style:
gf_bar( ~ day, data = tips)
That’s simple.
More complex case, dplyr-style
Assume we are doing some analysis up front, and we do it the dplyr way:
tips_count <- tips %>%
  # note: the level is spelled "Thur", so day != "Thu" drops no rows
  filter(tip > 2, day != "Thu") %>%
  count(day)
tips_count
## # A tibble: 4 x 2
## day n
## <fct> <int>
## 1 Fri 13
## 2 Sat 60
## 3 Sun 57
## 4 Thur 36
That’s a tidy output. Now again, let’s plot the frequencies:
gf_col(n ~ day, data = tips_count)
Simple, works.
More complex case, using mosaic
tips_count2 <- tips %>%
  filter(tip > 2, day != "Thu") %>%
  mosaic::tally(~day, data = .)
tips_count2
## day
## Fri Sat Sun Thur
## 13 60 57 36
str(tips_count2)
## 'table' int [1:4(1d)] 13 60 57 36
## - attr(*, "dimnames")=List of 1
## ..$ day: chr [1:4] "Fri" "Sat" "Sun" "Thur"
mosaic’s tally() does not produce a tidy data frame, so its output cannot (directly) be handed over to gf_XXX().
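To make the difference concrete, here is a minimal base R sketch. It assumes a small made-up vector day as a stand-in for the column in tips, and uses base table() as an analogue of mosaic's tally(), which (per the str() output above) also returns an object of class table:

```r
# Hypothetical stand-in for the `day` column of `tips`
day <- c("Fri", "Sat", "Sat", "Sun", "Sun", "Sun")

# Base R analogue of tally(~day): a one-dimensional contingency table
day_tab <- table(day)

class(day_tab)            # the class is "table", not "data.frame"
is.data.frame(day_tab)    # FALSE: there are no columns to map in a formula
names(dimnames(day_tab))  # the variable name only survives as a dimnames label
```

The counts live in the values of the table and the categories in its dimnames, so there is no column called day or n that a formula such as n ~ day could refer to.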
Workarounds
Of course, there are workarounds, if one wants to go on using mosaic’s tally(). For instance, gathering works:
library(tidyverse)
tips_count2 %>%
  as_tibble() %>%
  tidyr::gather(key = day, value = n)
## # A tibble: 4 x 2
## day n
## <chr> <int>
## 1 Fri 13
## 2 Sat 60
## 3 Sun 57
## 4 Thur 36
Tidy again. Note that gather() is not attached by mosaic, hence the call library(tidyverse).
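If one prefers not to attach the tidyverse for this step, base R offers a similar conversion. The following is a sketch only, again on a made-up day vector rather than the real tips data; the column name n is chosen here to mirror the output of count():

```r
# Hypothetical stand-in for the `day` column of `tips`
day <- c("Fri", "Sat", "Sat", "Sun", "Sun", "Sun")
day_tab <- table(day)

# as.data.frame() has a method for tables that returns one row per category;
# responseName sets the name of the count column (the default is "Freq")
day_counts <- as.data.frame(day_tab, responseName = "n",
                            stringsAsFactors = FALSE)
day_counts
```

The result has a day column and an n column, which is the shape that gf_col(n ~ day, data = ...) expects.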
Again, there are functions which are happy to digest a table (aka, not tidy) as given back by mosaic::tally():
barplot(tips_count2)
df_stats() does not work, apparently
From the package description of mosaicCore for df_stats():
Creates a data frame of statistics calculated on one variable, possibly for each group formed by combinations of additional variables. The resulting data frame has one column for each of the statistics requested as well as columns for any grouping variables.
So, df_stats() should be useful to get tidy data. However, for tally() it appears not to do so:
df_stats(~day, data = tips, tally)
## Fri Sat Sun Thur
## 1 19 87 76 62
df_stats(~day, data = tips, tally) %>% str()
## 'data.frame': 1 obs. of 4 variables:
## $ Fri : int 19
## $ Sat : int 87
## $ Sun : int 76
## $ Thur: int 62
The function outputs a data.frame, but not in tidy format.
Similarly, with LHS formula:
df_stats(day ~ sex, data = tips, tally)
## sex Fri Sat Sun Thur
## 1 Female 9 28 18 32
## 2 Male 10 59 58 30
Some interesting (minor?) differences:
df_stats(day ~ sex, data = tips, counts)
## sex n_Fri n_Sat n_Sun n_Thur
## 1 Female 9 28 18 32
## 2 Male 10 59 58 30
df_stats(day ~ sex, data = tips, count)
## sex n_Fri
## 1 Female 9
## 2 Male 10
df_stats(sex ~ day, data = tips, counts)
## day n_Female n_Male
## 1 Fri 9 10
## 2 Sat 28 59
## 3 Sun 18 58
## 4 Thur 32 30
df_stats(~ day | sex, data = tips, counts, props)
## sex n_Fri n_Sat n_Sun n_Thur prop_Fri prop_Sat prop_Sun prop_Thur
## 1 Female 9 28 18 32 0.10344828 0.3218391 0.2068966 0.3678161
## 2 Male 10 59 58 30 0.06369427 0.3757962 0.3694268 0.1910828
Nice, but still not tidy.
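A wide result like the one-row output of df_stats(~day, data = tips, tally) shown earlier can be reshaped by hand. As a sketch, the data frame wide below is typed in from that output (it is not computed by df_stats() here), and base R's stack() does the reshaping:

```r
# Hand-typed copy of the one-row, wide output shown earlier
wide <- data.frame(Fri = 19L, Sat = 87L, Sun = 76L, Thur = 62L)

# stack() turns the columns into rows: `values` holds the counts,
# `ind` holds the former column names as a factor
long <- stack(wide)

# Rename and reorder so the result matches the day / n layout of count()
names(long) <- c("n", "day")
long <- long[, c("day", "n")]
long
```

This yields four rows, one per day, which can be passed on to gf_col(n ~ day, data = long).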
df_stats(), long format, one variable - does not work
I just discovered that there’s a long format parameter in df_stats(). Let’s check that out.
df_stats( ~ day, data = tips, tally, format = "long")
## Fri stat value
## 1 19 Sat 87
## 2 19 Sun 76
## 3 19 Thur 62
Well, that’s kind of tidy, but not really; it seems hard to make use of that. Compare to standard tally() once more:
tally(~day, data = tips)
## day
## Fri Sat Sun Thur
## 19 87 76 62
df_stats(), long format, two variables - works
Maybe this way:
df_stats(~ day | sex, data = tips, tally, format = "long")
## sex stat value
## 1 Female Fri 9
## 2 Male Fri 10
## 3 Female Sat 28
## 4 Male Sat 59
## 5 Female Sun 18
## 6 Male Sun 58
## 7 Female Thur 32
## 8 Male Thur 30
count(tips, day, sex)
## # A tibble: 8 x 3
## day sex n
## <fct> <fct> <int>
## 1 Fri Female 9
## 2 Fri Male 10
## 3 Sat Female 28
## 4 Sat Male 59
## 5 Sun Female 18
## 6 Sun Male 58
## 7 Thur Female 32
## 8 Thur Male 30
Hey, now it works! We’ve received a tidy data frame.
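The two outputs carry the same information under different column names (stat and value versus day and n). A small sketch of lining them up, using a hand-typed copy of the long df_stats() output above; the object names long_stats and day_sex_counts are made up for illustration:

```r
# Hand-typed copy of the long-format df_stats() output shown above
long_stats <- data.frame(
  sex   = rep(c("Female", "Male"), times = 4),
  stat  = rep(c("Fri", "Sat", "Sun", "Thur"), each = 2),
  value = c(9L, 10L, 28L, 59L, 18L, 58L, 32L, 30L)
)

# Rename the generic columns so they match the output of count(tips, day, sex)
names(long_stats)[names(long_stats) == "stat"]  <- "day"
names(long_stats)[names(long_stats) == "value"] <- "n"

# Reorder the columns to day, sex, n
day_sex_counts <- long_stats[, c("day", "sex", "n")]
day_sex_counts
```

After renaming, the data frame can be used in formulas such as n ~ day in the same way as the result of count().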
Conclusions
mosaic gives tidy data frames using the function df_stats(), which works great for numeric summaries such as favstats(). For qualitative variables, things appear more complicated. The bottom line is that gf_bar() will work, as the raw data is tidy in most practical cases. If we want to compute the counts by hand, the trouble starts. With two variables, i.e., ~ day | sex in long format, things do work out, but apparently not for simply ~ day. What to do in this case?
One conclusion would be not to use mosaic::tally() but dplyr::count() instead, as count() returns a tidy data frame.
Alternatively, the output of mosaic’s tally() can be tidied after the fact using tidyr::gather().
The mosaic functions such as df_stats() are in active development, so we can expect new developments. In addition, I may be missing something.
session_info()
sessioninfo::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
## setting value
## version R version 3.5.1 (2018-07-02)
## os macOS 10.14.2
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Europe/Berlin
## date 2019-01-11
##
## ─ Packages ──────────────────────────────────────────────────────────────
## package * version date lib
## assertthat 0.2.0 2017-04-11 [1]
## backports 1.1.3 2018-12-14 [1]
## blogdown 0.10 2019-01-09 [1]
## bookdown 0.9 2018-12-21 [1]
## broom 0.5.1 2018-12-05 [1]
## cellranger 1.1.0 2016-07-27 [1]
## cli 1.0.1 2018-09-25 [1]
## colorspace 1.3-2 2016-12-14 [1]
## crayon 1.3.4 2017-09-16 [1]
## digest 0.6.18 2018-10-10 [1]
## dplyr * 0.8.0.9000 2019-01-07 [1]
## evaluate 0.12 2018-10-09 [1]
## fansi 0.4.0 2018-10-05 [1]
## forcats * 0.3.0 2018-02-19 [1]
## generics 0.0.2 2018-11-29 [1]
## ggdendro 0.1-20 2016-04-27 [1]
## ggformula * 0.9.0 2018-08-03 [1]
## ggplot2 * 3.1.0 2018-10-25 [1]
## ggrepel 0.8.0 2018-05-09 [1]
## ggstance * 0.3.1 2018-07-20 [1]
## glue 1.3.0.9000 2019-01-07 [1]
## gridExtra 2.3 2017-09-09 [1]
## gtable 0.2.0 2016-02-26 [1]
## haven 2.0.0 2018-11-22 [1]
## hms 0.4.2 2018-03-10 [1]
## htmltools 0.3.6 2017-04-28 [1]
## httr 1.4.0 2018-12-11 [1]
## jsonlite 1.6 2018-12-07 [1]
## knitr 1.21 2018-12-10 [1]
## labeling 0.3 2014-08-23 [1]
## lattice * 0.20-35 2017-03-25 [2]
## lazyeval 0.2.1 2017-10-29 [1]
## lubridate 1.7.4 2018-04-11 [1]
## magrittr 1.5 2014-11-22 [1]
## MASS 7.3-50 2018-04-30 [2]
## Matrix * 1.2-14 2018-04-13 [2]
## modelr 0.1.2 2018-05-11 [1]
## mosaic * 1.4.0 2018-08-11 [1]
## mosaicCore 0.6.0 2018-06-24 [1]
## mosaicData * 0.17.0 2018-06-23 [1]
## munsell 0.5.0 2018-06-12 [1]
## nlme 3.1-137 2018-04-07 [2]
## pillar 1.3.1 2018-12-15 [1]
## pkgconfig 2.0.2 2018-08-16 [1]
## plyr 1.8.4 2016-06-08 [1]
## purrr * 0.2.5 2018-05-29 [1]
## R6 2.3.0 2018-10-04 [1]
## Rcpp 1.0.0 2018-11-07 [1]
## readr * 1.3.1 2018-12-21 [1]
## readxl 1.2.0 2018-12-19 [1]
## rlang 0.3.1 2019-01-08 [1]
## rmarkdown 1.11 2018-12-08 [1]
## rstudioapi 0.9.0 2019-01-09 [1]
## rvest 0.3.2 2016-06-17 [1]
## scales 1.0.0 2018-08-09 [1]
## sessioninfo 1.1.1.9000 2019-01-04 [1]
## stringi 1.2.4 2018-07-20 [1]
## stringr * 1.3.1 2018-05-10 [1]
## tibble * 2.0.0 2019-01-04 [1]
## tidyr * 0.8.2 2018-10-28 [1]
## tidyselect 0.2.5 2018-10-11 [1]
## tidyverse * 1.2.1 2017-11-14 [1]
## utf8 1.1.4 2018-05-24 [1]
## withr 2.1.2 2018-03-15 [1]
## xfun 0.4 2018-10-23 [1]
## xml2 1.2.0 2018-01-24 [1]
## yaml 2.2.0 2018-07-25 [1]
## source
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.2)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Github (tidyverse/dplyr@df6b7d5)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Github (tidyverse/glue@3f7012c)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.1)
## CRAN (R 3.5.0)
## CRAN (R 3.5.1)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.1)
## CRAN (R 3.5.1)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.1)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.2)
## CRAN (R 3.5.0)
## CRAN (R 3.5.2)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## Github (r-lib/sessioninfo@ac8fcc1)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.2)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
## CRAN (R 3.5.0)
##
## [1] /Users/sebastiansaueruser/Rlibs
## [2] /Library/Frameworks/R.framework/Versions/3.5/Resources/library