1 Load packages
library(tidyverse) # data wrangling
2 Background
Nassim Taleb points out in the paper Fooled by Correlation: Common Misinterpretations in Social “Science” that spurious correlation may appear due to various reasons. One reason is what he calls the “Dead Man Bias”, occuring if constant data is added to uniformly distributed data. The thing is if the data is uniformly distributed and non-correlated, you will get a spurious correlation if constant data is added.
3 Data example
Let’s simulate a dataset, \(n=10^4\) with \(X,Y \sim Unif(0,1)\).
set.seed(42)
n <- 1e4
d <-
tibble(x = runif(n = n),
y = runif(n = n))
d_dead_man <-
tibble(x = rep(0, n * 0.2),
y = x)
d2 <-
d |>
bind_rows(d_dead_man)
Here’s their correlation:
d2 |>
summarise(cor_x_y = cor(x, y))
#> # A tibble: 1 × 1
#> cor_x_y
#> <dbl>
#> 1 0.329
As can be seen, a spuriou correlation pops up.
d2 |>
ggplot( aes(x = x, y = y)) +
stat_density_2d(aes(fill = ..density..), geom = "tile", contour = FALSE) +
scale_fill_gradient(low = "blue", high = "red") +
theme_minimal()
4 Discussion
Note that this bias does not appear for normally distributed data.
5 Reproducibility
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.1 (2022-06-23)
#> os macOS Big Sur ... 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Europe/Berlin
#> date 2024-03-05
#> pandoc 3.1.12.1 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> blogdown 1.18 2023-06-19 [1] CRAN (R 4.2.0)
#> bookdown 0.36 2023-10-16 [1] CRAN (R 4.2.0)
#> bslib 0.6.1 2023-11-28 [1] CRAN (R 4.2.0)
#> cachem 1.0.8 2023-05-01 [1] CRAN (R 4.2.0)
#> callr 3.7.3 2022-11-02 [1] CRAN (R 4.2.0)
#> cli 3.6.2 2023-12-11 [1] CRAN (R 4.2.0)
#> codetools 0.2-19 2023-02-01 [1] CRAN (R 4.2.0)
#> colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.2.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.1)
#> devtools 2.4.5 2022-10-11 [1] CRAN (R 4.2.1)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.2.0)
#> dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
#> evaluate 0.23 2023-11-01 [1] CRAN (R 4.2.0)
#> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.2.0)
#> farver 2.1.1 2022-07-06 [1] CRAN (R 4.2.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.0)
#> forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.0)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.2.0)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
#> ggplot2 * 3.5.0 2024-02-23 [1] CRAN (R 4.2.1)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
#> gtable 0.3.4 2023-08-21 [1] CRAN (R 4.2.0)
#> highr 0.10 2022-12-22 [1] CRAN (R 4.2.0)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.2.0)
#> htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.2.0)
#> htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.2.0)
#> httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.2.0)
#> jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
#> jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.2.0)
#> knitr 1.45 2023-10-30 [1] CRAN (R 4.2.1)
#> labeling 0.4.3 2023-08-29 [1] CRAN (R 4.2.0)
#> later 1.3.1 2023-05-02 [1] CRAN (R 4.2.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.2.1)
#> lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.2.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
#> MASS 7.3-60 2023-05-04 [1] CRAN (R 4.2.0)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
#> mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
#> miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.2.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.0)
#> pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.2.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
#> pkgload 1.3.2.1 2023-07-08 [1] CRAN (R 4.2.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
#> processx 3.8.2 2023-06-30 [1] CRAN (R 4.2.0)
#> profvis 0.3.8 2023-05-02 [1] CRAN (R 4.2.0)
#> promises 1.2.1 2023-08-10 [1] CRAN (R 4.2.0)
#> ps 1.7.5 2023-04-18 [1] CRAN (R 4.2.0)
#> purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
#> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.2.0)
#> readr * 2.1.5 2024-01-10 [1] CRAN (R 4.2.1)
#> remotes 2.4.2.1 2023-07-18 [1] CRAN (R 4.2.0)
#> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.2.1)
#> rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.2.0)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.2.0)
#> sass 0.4.8 2023-12-06 [1] CRAN (R 4.2.0)
#> scales 1.3.0 2023-11-28 [1] CRAN (R 4.2.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
#> shiny 1.8.0 2023-11-17 [1] CRAN (R 4.2.1)
#> stringi 1.8.3 2023-12-11 [1] CRAN (R 4.2.0)
#> stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.2.1)
#> tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.2.0)
#> tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.2.1)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.0)
#> tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.0)
#> timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.0)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.2.0)
#> urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.2.0)
#> usethis 2.2.2 2023-07-06 [1] CRAN (R 4.2.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.2.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.2.0)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.2.1)
#> xfun 0.41 2023-11-01 [1] CRAN (R 4.2.0)
#> xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.0)
#> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.2.0)
#>
#> [1] /Users/sebastiansaueruser/Rlibs
#> [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────