1 Load packages
library(tidyverse) # data wrangling
2 Motivation
Assume we would like to compute contingency tables in R without much ado. Let’s explore some ways.
3 Toy data
data(mtcars)
4 Using table and friends
mtcars |>
  select(vs, am) |>
  table()
#>    am
#> vs   0  1
#>   0 12  6
#>   1  7  7
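One of the "friends" of table is xtabs from base R, which uses a formula and takes the data frame as an argument. Here is a minimal sketch using base R only; the object names xt and tb are just illustrative:

```r
# xtabs() builds the same 2 x 2 table from a formula;
# the left-hand side is empty because we simply count rows.
xt <- xtabs(~ vs + am, data = mtcars)
tb <- with(mtcars, table(vs, am))  # same counts via table()

xt
all(xt == tb)  # the cell counts agree
```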
Let’s add margins:
mtcars |>
  select(vs, am) |>
  table() |>
  addmargins()
#>      am
#> vs     0  1 Sum
#>   0   12  6  18
#>   1    7  7  14
#>   Sum 19 13  32
Let’s show proportions rather than raw counts:
mtcars |>
  select(vs, am) |>
  table() |>
  prop.table() |>
  addmargins()
#>      am
#> vs          0       1     Sum
#>   0   0.37500 0.18750 0.56250
#>   1   0.21875 0.21875 0.43750
#>   Sum 0.59375 0.40625 1.00000
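Note that addmargins() must appear after prop.table(); otherwise the marginal sums would be treated as ordinary cells when the proportions are computed.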
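To see why the order matters, here is a small sketch using base R only (the names tab, right and wrong are illustrative). With the margins added first, the 32 cars are counted four times (cells, row sums, column sums, grand total), so the bottom-right cell drops from 1 to 0.25:

```r
tab <- with(mtcars, table(vs, am))

right <- tab |> prop.table() |> addmargins()  # proportions first, then margins
wrong <- tab |> addmargins() |> prop.table()  # margins are counted as cells

right["Sum", "Sum"]  # 1
wrong["Sum", "Sum"]  # 0.25
```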
Let’s round to two decimal places:
mtcars |>
  select(vs, am) |>
  table() |>
  prop.table() |>
  addmargins() |>
  round(2)
#>      am
#> vs       0    1  Sum
#>   0   0.38 0.19 0.56
#>   1   0.22 0.22 0.44
#>   Sum 0.59 0.41 1.00
Let’s change the margin, i.e., the groups that should sum to 100%:
Proportions per row:
mtcars |>
  select(vs, am) |>
  table() |>
  prop.table(margin = 1) |>
  addmargins() |>
  round(2)
#>      am
#> vs       0    1  Sum
#>   0   0.67 0.33 1.00
#>   1   0.50 0.50 1.00
#>   Sum 1.17 0.83 2.00
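In the row-wise table above, the Sum row (1.17, 0.83, 2.00) adds up proportions from different groups and is hard to interpret. One option, sketched here with base R only, is to restrict addmargins to the matching dimension via its margin argument; row_props is just an illustrative name:

```r
row_props <- with(mtcars, table(vs, am)) |>
  prop.table(margin = 1) |>   # each row sums to 1
  addmargins(margin = 2) |>   # append only the Sum column
  round(2)

row_props
```

For column-wise proportions, the counterpart would be prop.table(margin = 2) followed by addmargins(margin = 1).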
Proportions per column:
mtcars |>
  select(vs, am) |>
  table() |>
  prop.table(margin = 2) |>
  addmargins() |>
  round(2)
#>      am
#> vs       0    1  Sum
#>   0   0.63 0.46 1.09
#>   1   0.37 0.54 0.91
#>   Sum 1.00 1.00 2.00
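As a side note, since R 4.0.1 base R also offers proportions() as a more descriptive name for prop.table(). The following sketch (object names are illustrative) checks that both give the same column-wise result:

```r
tab <- with(mtcars, table(vs, am))

col_props_old <- prop.table(tab, margin = 2)
col_props_new <- proportions(tab, margin = 2)  # requires R >= 4.0.1

identical(col_props_old, col_props_new)
```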
5 Using count
mtcars |>
  count(vs, am)
#>   vs am  n
#> 1  0  0 12
#> 2  0  1  6
#> 3  1  0  7
#> 4  1  1  7
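count returns the table in long format, one row per combination. If we prefer the familiar cross-tab layout, one possible approach is to reshape the counts with pivot_wider from tidyr. This is only a sketch; the name wide and the prefix "am_" are arbitrary choices for illustration:

```r
library(dplyr)
library(tidyr)

wide <- mtcars |>
  count(vs, am) |>
  pivot_wider(
    names_from = am,      # one column per value of am
    values_from = n,      # filled with the counts
    names_prefix = "am_"  # illustrative prefix for syntactic column names
  )

wide
```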
Adding proportions:
mtcars |>
  count(vs, am) |>
  mutate(prop = n / sum(n))
#>   vs am  n    prop
#> 1  0  0 12 0.37500
#> 2  0  1  6 0.18750
#> 3  1  0  7 0.21875
#> 4  1  1  7 0.21875
For some time now, count() has returned an ungrouped table, so we need to group according to our wishes.
If we group by am, for example, we get the proportion of vs values within each group of am:
mtcars |>
  count(am, vs) |>
  group_by(am) |>
  mutate(prop = n / sum(n))
#> # A tibble: 4 × 4
#> # Groups:   am [2]
#>      am    vs     n  prop
#>   <dbl> <dbl> <int> <dbl>
#> 1     0     0    12 0.632
#> 2     0     1     7 0.368
#> 3     1     0     6 0.462
#> 4     1     1     7 0.538
Grouping by vs:
mtcars |>
  count(am, vs) |>
  group_by(vs) |>
  mutate(prop = n / sum(n))
#> # A tibble: 4 × 4
#> # Groups:   vs [2]
#>      am    vs     n  prop
#>   <dbl> <dbl> <int> <dbl>
#> 1     0     0    12 0.667
#> 2     0     1     7 0.5
#> 3     1     0     6 0.333
#> 4     1     1     7 0.5
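With dplyr 1.1.0 or later, the grouping can also be expressed per operation via the .by argument of mutate, which leaves the result ungrouped. A minimal sketch of the same computation, assuming such a dplyr version is installed (by_vs is an illustrative name):

```r
library(dplyr)

by_vs <- mtcars |>
  count(am, vs) |>
  mutate(prop = n / sum(n), .by = vs)  # per-operation grouping, dplyr >= 1.1.0

by_vs
```

The proportions are the same as with group_by(vs), but no ungroup() is needed afterwards.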
If we do not group at all, the proportions of all groups together sum to 100%:
mtcars |>
  count(am, vs) |>
  mutate(prop = n / sum(n))
#>   am vs  n    prop
#> 1  0  0 12 0.37500
#> 2  0  1  7 0.21875
#> 3  1  0  6 0.18750
#> 4  1  1  7 0.21875
6 Don’t drop unused factor levels
Use count(..., .drop = FALSE) to prevent unused factor levels from being dropped.
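Here is a small sketch of the effect. Since mtcars contains no factors, the example first converts cyl to a factor with an extra, made-up level (10) that never occurs in the data; cars_fct, dropped and kept are illustrative names:

```r
library(dplyr)

# Hypothetical setup: declare a level (10) that never occurs in the data.
cars_fct <- mtcars |>
  mutate(cyl = factor(cyl, levels = c(4, 6, 8, 10)))

dropped <- cars_fct |> count(cyl)                 # default: empty level omitted
kept    <- cars_fct |> count(cyl, .drop = FALSE)  # empty level kept with n = 0

kept
```

Note that .drop only matters for factor columns; plain numeric columns such as vs and am have no levels to keep.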
7 See also
This SO post gives a nice overview of grouping using count from dplyr.
8 Conclusions
There are many ways in R to produce contingency tables. Here are two :-)
9 Reproducibility
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23)
#> os macOS Big Sur ... 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2024-01-12
#> pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> blogdown 1.18 2023-06-19 [1] CRAN (R 4.2.0)
#> bookdown 0.36 2023-10-16 [1] CRAN (R 4.2.0)
#> bslib 0.5.1 2023-08-11 [1] CRAN (R 4.2.0)
#> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.2.0)
#> callr 3.7.3 2022-11-02 [1] CRAN (R 4.2.0)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.0)
#> codetools 0.2-19 2023-02-01 [1] CRAN (R 4.2.0)
#> colorout * 1.3-0 2023-11-08 [1] Github (jalvesaq/colorout@8384882)
#> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.2.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.1)
#> devtools 2.4.5 2022-10-11 [1] CRAN (R 4.2.1)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.2.0)
#> dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.2.0)
#> fansi 1.0.5 2023-10-08 [1] CRAN (R 4.2.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.0)
#> forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.0)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.2.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
#> ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.2.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> gtable 0.3.4 2023-08-21 [1] CRAN (R 4.2.0)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.2.0)
#> htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.2.0)
#> htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.2.0)
#> httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.2.0)
#> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
#> jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.2.0)
#> knitr 1.45 2023-10-30 [1] CRAN (R 4.2.1)
#> later 1.3.1 2023-05-02 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.2.1)
#> lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
#> mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
#> miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.0)
#> pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> pkgload 1.3.2.1 2023-07-08 [1] CRAN (R 4.2.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
#> processx 3.8.2 2023-06-30 [1] CRAN (R 4.2.0)
#> profvis 0.3.8 2023-05-02 [1] CRAN (R 4.2.0)
#> promises 1.2.1 2023-08-10 [1] CRAN (R 4.2.0)
#> ps 1.7.5 2023-04-18 [1] CRAN (R 4.2.0)
#> purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.2.0)
#> readr * 2.1.4 2023-02-10 [1] CRAN (R 4.2.0)
#> remotes 2.4.2.1 2023-07-18 [1] CRAN (R 4.2.0)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.2.0)
#> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.2.0)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.2.0)
#> sass 0.4.7 2023-07-15 [1] CRAN (R 4.2.0)
#> scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> shiny 1.8.0 2023-11-17 [1] CRAN (R 4.2.1)
#> stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.0)
#> stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.2.1)
#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.2.0)
#> tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.0)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0)
#> tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.0)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.0)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.2.0)
#> urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.2.0)
#> usethis 2.2.2 2023-07-06 [1] CRAN (R 4.2.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.0)
#> vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.2.0)
#> withr 2.5.2 2023-10-30 [1] CRAN (R 4.2.1)
#> xfun 0.40 2023-08-09 [1] CRAN (R 4.2.0)
#> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.0)
#>
#> [1] /Users/sebastiansaueruser/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────