# Load packages

```
library(tidyverse)
library(mosaic)
```

# Motivation

The p-value is a ubiquituous tool for gauging the plausibility of a Null hypothesis. More specifically, the p-values indicates the probability of obtaining a test statistic at least as extreme as in the present data if the Null hypothesis was true and the experiment would be repeated an infinite number of times (under the same conditions except the data generating process).

The distribution of the p-values depends on the strength of some effect (among other things). The bigger the effect, the smaller the p-values (ceteris paribus). In the extreme, if the effect is zero, that is, if the Null is true, the distribution will be equally spread out. As this sounds not overly intuitive maybe, let’s simulate it.

There are of course nice proofs out there, this one is quite straigt forward for example.

# Simulate p-values for the Null

Let our experiment consist of a t-test with two groups of n=30, and the Null being true. Further, let the distribution be standard normal. We’ll test two-sided, and take an alpha of .05 for granted.

Let’s simply extract the p-value:

```
get_p_from_ttest_sim <- function(n = 30, my_m = 0, my_sd = 1) {
sample1 <- rnorm(n = n, mean = my_m, sd = my_sd)
sample2 <- rnorm(n = n, mean = my_m, sd = my_sd)
p <- t.test(x = sample1, y = sample2)$p.value
p <- list(p_value = p)
return( p)
}
```

Test it:

```
get_p_from_ttest_sim()
#> $p_value
#> [1] 0.9270056
```

Now rerun this simulated experiment a couple of times:

```
k <- 1e05
ps <- do(k) * get_p_from_ttest_sim()
```

# Plot it

`gf_histogram(~p_value, data = ps, bins = 10)`

# Closer look

Let’s compute the proportion of significant p-values, ie p<.05. That should sum up to approx. 5%.

```
ps %>%
filter(p_value < .05) %>%
summarise(p_prop = n()/k)
#> p_prop
#> 1 0.0491
```

And now let’s cut the range in 10 bins:

```
ps %>%
mutate(bins = cut(p_value, breaks = 11)) %>%
group_by(bins) %>%
summarise(p_prop = n())
#> # A tibble: 11 x 2
#> bins p_prop
#> <fct> <int>
#> 1 (-0.00098,0.0909] 8989
#> 2 (0.0909,0.182] 9028
#> 3 (0.182,0.273] 9225
#> 4 (0.273,0.364] 9247
#> 5 (0.364,0.455] 9143
#> 6 (0.455,0.545] 9137
#> 7 (0.545,0.636] 9192
#> 8 (0.636,0.727] 8948
#> 9 (0.727,0.818] 8918
#> 10 (0.818,0.909] 9121
#> 11 (0.909,1] 9052
```

Approximately identical group sizes.

Finally, let’s check the lower p-values to see if they are also uniformly distributed. Note that we reduce the sample size greatly by filtering on a small part of p-values so we should expect more variation around the expected value.

```
ps %>%
filter(p_value <= .05) %>%
mutate(bins =cut(p_value, breaks = 11)) %>%
group_by(bins) %>%
summarise(p_prop = n())
#> # A tibble: 11 x 2
#> bins p_prop
#> <fct> <int>
#> 1 (-3.02e-05,0.00456] 427
#> 2 (0.00456,0.00911] 450
#> 3 (0.00911,0.0136] 457
#> 4 (0.0136,0.0182] 465
#> 5 (0.0182,0.0227] 456
#> 6 (0.0227,0.0273] 462
#> 7 (0.0273,0.0318] 446
#> 8 (0.0318,0.0364] 432
#> 9 (0.0364,0.0409] 446
#> 10 (0.0409,0.0454] 432
#> 11 (0.0454,0.05] 437
```

As said there appears to be quite some variation within these bins.

And even more close-up:

```
ps %>%
filter(p_value <= .005) %>%
mutate(bins =cut(p_value, breaks = 11)) %>%
group_by(bins) %>%
summarise(p_prop = n())
#> # A tibble: 11 x 2
#> bins p_prop
#> <fct> <int>
#> 1 (1.48e-05,0.000471] 42
#> 2 (0.000471,0.000922] 41
#> 3 (0.000922,0.00137] 44
#> 4 (0.00137,0.00183] 40
#> 5 (0.00183,0.00228] 45
#> 6 (0.00228,0.00273] 38
#> 7 (0.00273,0.00318] 37
#> 8 (0.00318,0.00363] 48
#> 9 (0.00363,0.00408] 52
#> 10 (0.00408,0.00453] 39
#> 11 (0.00453,0.00499] 44
```