Simulate p-hacking - adding observations

Let’s simulate p-values as a funtion of sample size. We assume that some researcher collects one data point, computes the p-value, and repeats until p-value falls below some arbitrary threshold. Oh and yes, there is no real effect. For the sake of spending the budget, assume that our researcher collects a sample size of \(n=100\).

This idea stems from this great article False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant; cf. Figure 2. However, source coude is not given and the right to reprint is confined (AFAIK).

library(tidyverse)

## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## Warning: package 'dplyr' was built under R version 3.5.1

## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

Next we create data:

df <- data_frame(
  ID = 10:100
)

What we’ve done here is to simulate 100 samples, where each sample is the previous plus the next observation added (first observation is ID 1).

However, maybe it is simpler if we build up a list column data frame, then we can use this column for the t-test:

df %>% 
  mutate(value_vec = ID %>%  map(~rnorm(n = ., mean = 0, sd = 1))) -> df

Let’s glance at this column whethe it has worked out:

df$value_vec[1]

## [[1]]
##  [1] -0.31376518 -1.78247245  1.70888831 -0.96583484 -0.75445465
##  [6] -0.06194707  0.31427296  0.44219575 -0.06870127 -0.27294949

df$value_vec[2]

## [[1]]
##  [1]  0.5047487  0.9618283 -1.0044524  0.1146231  2.0096259  0.7156131
##  [7] -0.5452825 -1.6930971 -0.2824453  0.8167902  1.5829980

df$value_vec[3]

## [[1]]
##  [1] -1.0345525  0.9081553 -0.2113524  1.1032851  0.3984880 -0.3744192
##  [7] -0.9745205 -0.1131748 -1.7016539  1.0826774  0.4626363  2.0353619

OK. Now we can compute a statistic test, let’s take the t-test, test against zero, two-sided, \(\alpha = .05\).

df$value_vec %>% 
  map(~t.test(., mu = 0)) %>% 
  map("p.value") %>% 
  simplify -> p_values

df %>%
  mutate(p_values = as.numeric(p_values),
         sig = ifelse(p_values < .05, "sig", "ns")) -> df

Let’s have again a look at the df:

glimpse(df)

## Observations: 91
## Variables: 4
## $ ID        <int> 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ...
## $ value_vec <list> [<-0.31376518, -1.78247245, 1.70888831, -0.96583484...
## $ p_values  <dbl> 0.56502142, 0.40649211, 0.67844438, 0.21943144, 0.48...
## $ sig       <chr> "ns", "ns", "ns", "ns", "ns", "ns", "ns", "ns", "ns"...

Now, let’s plot:

df %>% 
  #filter(between(ID, 10, 60)) %>% 
  ggplot() + 
  aes(x = ID, y = p_values, color = sig) +
  geom_point() +
  geom_line(color = "grey20") +
  geom_hline(yintercept = .05, color = "red", linetype = "dashed") -> p1
  

#ggsave(p1, file = "simu_phacking.png", width = 8, height = 5)
ggplotly(p1)

As we can see, every little now and then some significant (p < .05) values emerge - even if there is absolutely no effect in the population. In this case, in the beginning (case 23 or so) some significant result popped out. A “clecer” investigator might decide to stop now - a lot of time and money saved. If he was suspicious about the low \(n\) he might wait until he arrives at a \(n\) of about 50 - lucky again. The only important thing is: check, check, check, after every observation compute your good old friend, the t-test p-value.

** This is p-hacking, p-hacking, and nothing but p-hacking.** Wrong, bogus results are bound to happen.