Load packages
library(tidyverse)
library(reshape2) # for data
library(mosaic)
library(sjmisc)
library(skimr)
Data setup
data(tips)
Aggregate data per group
tips_aggr <- tips %>%
  group_by(smoker) %>%
  summarise(tip_avg = mean(tip),
            tip_md = median(tip),
            tip_sd = sd(tip),
            tip_iqr = IQR(tip))
tips_aggr
#> # A tibble: 2 x 5
#> smoker tip_avg tip_md tip_sd tip_iqr
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 No 2.99 2.74 1.38 1.50
#> 2 Yes 3.01 3 1.40 1.68
The same summary, more concisely, using descr() from sjmisc:
tips_descr <- tips %>%
  group_by(smoker) %>%
  descr(tip)
tips_descr
#>
#> ## Basic descriptive statistics
#>
#>
#> Grouped by: No
#>
#> var type label n NA.prc mean sd se md trimmed range iqr skew
#> tip numeric tip 151 0 2.99 1.38 0.11 2.74 2.83 8 (1-9) 1.505 1.32
#>
#>
#> Grouped by: Yes
#>
#> var type label n NA.prc mean sd se md trimmed range iqr skew
#> tip numeric tip 93 0 3.01 1.4 0.15 3 2.86 9 (1-10) 1.68 1.72
descr() hands back a list, which may not be practical for further processing. skim() from skimr provides another alternative:
tips_skim <- tips %>%
  group_by(smoker) %>%
  skim(tip)
tips_skim
Name | Piped data |
Number of rows | 244 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | smoker |
Variable type: numeric
skim_variable | smoker | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|---|
tip | No | 0 | 1 | 2.99 | 1.38 | 1 | 2 | 2.74 | 3.50 | 9 | ▇▆▂▁▁ |
tip | Yes | 0 | 1 | 3.01 | 1.40 | 1 | 2 | 3.00 | 3.68 | 10 | ▇▇▁▁▁ |
This function gives back a tidy data frame. Nice.
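Because the result is a data frame, it can be passed straight on to dplyr verbs. The following is a minimal sketch, assuming skimr version 2 or later, where the numeric statistics are stored in columns prefixed with numeric. (for example numeric.mean and numeric.sd, the same names used for plotting in this post). The object name tips_skim_small is made up for illustration.

```r
library(dplyr)
library(skimr)

# tips ships with reshape2; load it without attaching the package
data(tips, package = "reshape2")

tips_skim <- tips %>%
  group_by(smoker) %>%
  skim(tip)

# Drop the skimr-specific class and keep only the columns of interest
# (column names assume skimr >= 2.0)
tips_skim_small <- tips_skim %>%
  as_tibble() %>%
  select(smoker, numeric.mean, numeric.sd)

tips_skim_small
```

Converting with as_tibble() first is a cautious choice: it makes sure the later steps work on a plain tibble rather than on the special skim object.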
Alternatively, using favstats() from mosaic:
tips_fav <- tips %>%
  favstats(tip ~ smoker, data = .)
tips_fav
#> smoker min Q1 median Q3 max mean sd n missing
#> 1 No 1 2 2.74 3.505 9 2.991854 1.377190 151 0
#> 2 Yes 1 2 3.00 3.680 10 3.008710 1.401468 93 0
Plot 1
ggplot(tips_skim) +
  aes(x = smoker, y = numeric.mean) +
  geom_line(group = 1) +
  geom_pointrange(aes(ymin = numeric.mean - numeric.sd,
                      ymax = numeric.mean + numeric.sd),
                  color = "grey40") +
  geom_point(size = 5) +
  ylim(0, 5) +
  labs(caption = "Error bars represent standard deviation",
       y = "average tip")
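The same kind of plot can presumably be drawn from the dplyr summary as well, since it holds the group means and standard deviations under different column names. Here is a sketch under that assumption. It recomputes a reduced version of tips_aggr so that it runs on its own, and the object name p_aggr is made up for illustration.

```r
library(dplyr)
library(ggplot2)

# tips ships with reshape2; load it without attaching the package
data(tips, package = "reshape2")

# Mean and standard deviation of tip per smoker group
tips_aggr <- tips %>%
  group_by(smoker) %>%
  summarise(tip_avg = mean(tip),
            tip_sd = sd(tip))

# Same layers as the skim-based plot, with tip_avg/tip_sd as inputs
p_aggr <- ggplot(tips_aggr) +
  aes(x = smoker, y = tip_avg) +
  geom_line(group = 1) +
  geom_pointrange(aes(ymin = tip_avg - tip_sd,
                      ymax = tip_avg + tip_sd),
                  color = "grey40") +
  geom_point(size = 5) +
  ylim(0, 5) +
  labs(caption = "Error bars represent standard deviation",
       y = "average tip")

p_aggr
```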
List to data frame
tips_descr gives us a list, but more often than not we would like to go on using a tibble. That’s what enframe() is for. Subsequently, we can use unnest() to unnest the list-column value.
tips_descr %>%
  enframe() %>%
  unnest(value) # that's the name of the list-column to be unnested
#> # A tibble: 2 x 14
#> name var type label n NA.prc mean sd se md trimmed range
#> <int> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 1 tip nume… tip 151 0 2.99 1.38 0.112 2.74 2.83 8 (1…
#> 2 2 tip nume… tip 93 0 3.01 1.40 0.145 3 2.86 9 (1…
#> # … with 2 more variables: iqr <dbl>, skew <dbl>
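To see what enframe() and unnest() each contribute, a toy example may help. The sketch below uses a hypothetical named list, stats_list, whose numbers are copied from the output above. It is not the object returned by descr(), only a stand-in with a similar shape (one small data frame per group).

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for a list of per-group summaries
stats_list <- list(
  No  = tibble(mean = 2.99, sd = 1.38),
  Yes = tibble(mean = 3.01, sd = 1.40)
)

# enframe() turns the list into a two-column tibble:
# `name` holds the list names, `value` is a list-column of the elements
stats_nested <- enframe(stats_list)

# unnest() expands the list-column into regular columns
stats_flat <- unnest(stats_nested, cols = value)

stats_flat
```

With a named list, the name column carries the list names. In the tips_descr output above the column holds 1 and 2 instead, which suggests that list has no names.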