Dead Man Bias in correlation

1 Load packages

library(tidyverse)  # data wrangling

2 Background

Nassim Taleb points out in the paper Fooled by Correlation: Common Misinterpretations in Social “Science” that spurious correlation may appear due to various reasons. One reason is what he calls the “Dead Man Bias”, occuring if constant data is added to uniformly distributed data. The thing is if the data is uniformly distributed and non-correlated, you will get a spurious correlation if constant data is added.

3 Data example

Let’s simulate a dataset, \(n=10^4\) with \(X,Y \sim Unif(0,1)\).

set.seed(42)
n <- 1e4
d <-
  tibble(x = runif(n = n),
         y = runif(n = n))

d_dead_man <-
  tibble(x = rep(0, n * 0.2),
         y = x)

d2 <-
  d |> 
  bind_rows(d_dead_man)

Here’s their correlation:

d2 |> 
  summarise(cor_x_y = cor(x, y))
#> # A tibble: 1 × 1
#>   cor_x_y
#>     <dbl>
#> 1   0.329

As can be seen, a spuriou correlation pops up.

d2 |> 
ggplot( aes(x = x, y = y)) +
  stat_density_2d(aes(fill = ..density..), geom = "tile", contour = FALSE) +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_minimal()

4 Discussion

Note that this bias does not appear for normally distributed data.

5 Reproducibility

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2024-03-05
#>  pandoc   3.1.12.1 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  blogdown      1.18    2023-06-19 [1] CRAN (R 4.2.0)
#>  bookdown      0.36    2023-10-16 [1] CRAN (R 4.2.0)
#>  bslib         0.6.1   2023-11-28 [1] CRAN (R 4.2.0)
#>  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.2.0)
#>  callr         3.7.3   2022-11-02 [1] CRAN (R 4.2.0)
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.2.0)
#>  codetools     0.2-19  2023-02-01 [1] CRAN (R 4.2.0)
#>  colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.2.0)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.2.1)
#>  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.2.1)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.2.0)
#>  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.2.0)
#>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.2.0)
#>  farver        2.1.1   2022-07-06 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.0)
#>  forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.2.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.2.0)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2     * 3.5.0   2024-02-23 [1] CRAN (R 4.2.1)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  gtable        0.3.4   2023-08-21 [1] CRAN (R 4.2.0)
#>  highr         0.10    2022-12-22 [1] CRAN (R 4.2.0)
#>  hms           1.1.3   2023-03-21 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.2.0)
#>  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.2.0)
#>  httpuv        1.6.11  2023-05-11 [1] CRAN (R 4.2.0)
#>  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
#>  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.2.0)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.2.1)
#>  labeling      0.4.3   2023-08-29 [1] CRAN (R 4.2.0)
#>  later         1.3.1   2023-05-02 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.2.1)
#>  lubridate   * 1.9.3   2023-09-27 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  MASS          7.3-60  2023-05-04 [1] CRAN (R 4.2.0)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.2.0)
#>  mime          0.12    2021-09-28 [1] CRAN (R 4.2.0)
#>  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.2.0)
#>  munsell       0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.0)
#>  pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  pkgload       1.3.2.1 2023-07-08 [1] CRAN (R 4.2.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.2.0)
#>  processx      3.8.2   2023-06-30 [1] CRAN (R 4.2.0)
#>  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.2.0)
#>  promises      1.2.1   2023-08-10 [1] CRAN (R 4.2.0)
#>  ps            1.7.5   2023-04-18 [1] CRAN (R 4.2.0)
#>  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.2.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.2.0)
#>  readr       * 2.1.5   2024-01-10 [1] CRAN (R 4.2.1)
#>  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.2.0)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.2.0)
#>  sass          0.4.8   2023-12-06 [1] CRAN (R 4.2.0)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  shiny         1.8.0   2023-11-17 [1] CRAN (R 4.2.1)
#>  stringi       1.8.3   2023-12-11 [1] CRAN (R 4.2.0)
#>  stringr     * 1.5.1   2023-11-14 [1] CRAN (R 4.2.1)
#>  tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.2.0)
#>  tidyr       * 1.3.1   2024-01-24 [1] CRAN (R 4.2.1)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.0)
#>  tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.2.0)
#>  timechange    0.2.0   2023-01-11 [1] CRAN (R 4.2.0)
#>  tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.2.0)
#>  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.2.0)
#>  usethis       2.2.2   2023-07-06 [1] CRAN (R 4.2.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.2.0)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.2.0)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.2.1)
#>  xfun          0.41    2023-11-01 [1] CRAN (R 4.2.0)
#>  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.2.0)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/sebastiansaueruser/Rlibs
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────