Simulation sample and interval sizes for proportions

1 Exemplary Research question

What is the sample size needed to estimate the proportion of the event “high quality study” with an error margin of ±5%, and a confidence level of 99%? Let’s assume the probability of the event is 50%, and we draw the studies independently?

2 Task definition

Let’s simulate the width of a \(1-\alpha\) compatibility interval with a given margin of error \(E\) for varying samples sizes \(n_i\), where \(i = 1,...,N\), and \(N\) is the maximum sample size we can afford. We’ll assume indendently distributed and independent trials (iid).

Let’s further assume we are looking at a binary event \(X\) with a success \(x\) probability of \(Pr(X=x)=p\).

Let \(Y_k = \sum_1^i X_j\); think of \(Y_k\) as one sample of a given sample size, \(n_i\). Note that \(Y_k \overset{iid}{\sim} \mathcal{B}(n,p)\) (Binomially distributed).

Let’s draw \(K\) samples for each \(Y_k\), and dub the resulting statistic \(Z_i\). We’ll compute \(Z_i\) for each sample size \(n_i\), with \(i = 1,...,N\).

Finally, we’ll compute the arithmetic average of each \(Y_k\) and of \(Z_i\), for scale independency.

3 Technical setup

We’ll run this simulation in R, making use of the following packages:

library(tidyverse)
library(scales)
library(glue)
library(moments)

4 Define constants

N <- 1e04  # max. sample size
E <- .05  # error margin
p <- 1/2  # probability of event (success)
alpha = .01  # compatibility level
K <- 1e03  # repetitions per sample size

q <- abs(qnorm(p = alpha/2))
q  # quantile in the normal distribution for alpha
## [1] 2.575829

Note that \(p=1/2\) is the maximum conservative assumption yielding the largest sample size.

5 Prepare data frame for the simulation

df <- tibble(
  i = 1:N,
  estimate = NA,
  se = NA,
  lwr = NA,
  upr = NA,
  width = NA,
  small_enough = NA
)

6 Simulation

Let’s draw \(N\) samples with increasing sample size from 1 to \(N=10^{4}\), each time with \(K=1000\) repetitions, assuming independent draws and a Binomial distribution with \(p=0.5\).

for (i in 1:N) {
  # draw K "coin flip" samples of size i, 
  # with succcess propability p: 
  tmp <- rbinom(n = K, size = i, prob = p)
  # compute succcess rate for each sample:
  prop_success <- tmp/i
  # compute success rate over all K samples:
  df$estimate[i] <- mean(prop_success)
  # compute standard error:
  df$se[i] <- sd(prop_success)
  # compute interval borders and width:
  df$lwr[i] <- df$estimate[i] - q * df$se[i]
  df$upr[i] <- df$estimate[i] + q * df$se[i]
  df$width[i] <- df$upr[i] - df$lwr[i] 
  # check if intervals is as small as desired:
  if (df$width[i] < 2*E)
   {df$small_enough[i] <- TRUE} else {
     df$small_enough[i] <- FALSE}
}

7 Check some distributions

Note that the binomial distributions converges against the normal distribution. For instance, let’s pick \(n_i=100\) and \(K=10^4\).

smple <- tibble(
 estimate = rbinom(n = 1e04, 
                   size = 100, 
                   prob = p))

smple %>% 
  ggplot(aes(x = estimate)) +
  geom_density()

Let’s briefly check for normality, using a quantile-quantile plot:

ggplot(smple, aes(sample = estimate)) + 
  stat_qq() +
  stat_qq_line(col = "red")

This check reassured us that this distribution is normal.

Add some stats for checking normality:

skewness(smple$estimate)
## [1] -0.01740677
kurtosis(smple$estimate)
## [1] 2.952453

Which nicely fits a normal distribution.

Also note that a linear transformation of a normal curve is still normal:

tibble(
 estimate = rbinom(n = 1e04, 
                   size = 100, 
                   prob = p),
 prop = estimate / 100) %>% 
  ggplot(aes(x = prop)) +
  geom_density()

8 Minimum sample size

What’s the minimum sample size \(n_{min}\) needed such that \(w \le 2E\), where \(w\) denotes the width of the compatibility interval?

n_min <- 
  df$width %>% 
  detect_index(~ .x <= 2*E)

n_min
## [1] 357

What’s the maximum sample size \(n_{max}\) needed such that \(w > 2E\)?

n_max <- 
  df$width %>% 
  detect_index(~ .x > 2*E, .dir = "backward")

n_max
## [1] 419

9 Plot results

df %>% 
  filter(i > 30) %>% 
  ggplot(aes(x = i, y = width)) +
  geom_line() + 
  scale_x_continuous(trans = log10_trans()) + 
  geom_hline(yintercept = 2*E, linetype ="dashed", color = "grey60") +
  annotate("label", x = 100, y = .1, label = "2E") +
  geom_vline(xintercept = n_max, linetype = "dashed", 
             color = "blue") +
  geom_vline(xintercept = n_min, linetype = "dashed",
             color = "red") +
  annotate("label", x = n_max, y = .3, 
           label = glue("nmax={n_max}"), 
           color = "blue") +
  annotate("label", x = n_max, y = .2, 
           label = glue("nmin={n_min}"),
           color = "red") +
  labs(x = "sample size (log)",
       y = "margin of error",
       subtitle = "for estimating the width of compatibility intervals",
       title = "Simulating minimum sample sizes",
       caption = glue("Alpha: {alpha}, desired E = {E}, p = 1/2. See text for details.\n Each x-value depicts the average of {K} samples")) +
  theme_minimal()

10 Summary

The simulation implies that we need a sample size \(n\) of approx. around 357 to 419 to get a compatibility interval of width \(2E\) for an alpha level of \(1-\alpha=0.99\).

11 Discussion

This post proposes a rough approach of attaining practical sample sizes when an analytical approach is not wanted or feasible. Several limitations should be put forward: One might want with finite populations; one might want to compare the simulation results with typical formulas; one might prefer different stopping rules.

12 Suggested reading

Check out the following sources for related issues and more in-depth treatment: Landau and Stahl (2013), Schönbrodt and Perugini (2013), Maxwell, Kelley, and Rausch (2008), Arnold et al. (2011), Kruschke (2015).

Bibliography

Arnold, Benjamin F., Daniel R. Hogan, John M. Colford, and Alan E. Hubbard. 2011. “Simulation Methods to Estimate Design Power: An Overview for Applied Research.” BMC Medical Research Methodology 11 (1): 94. https://doi.org/10.1186/1471-2288-11-94.
Kruschke, John K. 2015. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Edition 2. Boston: Academic Press.
Landau, Sabine, and Daniel Stahl. 2013. “Sample Size and Power Calculations for Medical Studies by Simulation When Closed Form Expressions Are Not Available.” Statistical Methods in Medical Research 22 (3): 324–45. https://doi.org/10.1177/0962280212439578.
Maxwell, Scott E., Ken Kelley, and Joseph R. Rausch. 2008. “Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation.” Annual Review of Psychology 59 (1): 537–63. https://doi.org/10.1146/annurev.psych.59.103006.093735.
Schönbrodt, Felix D., and Marco Perugini. 2013. “At What Sample Size Do Correlations Stabilize?” Journal of Research in Personality 47 (5): 609–12. https://doi.org/10.1016/j.jrp.2013.05.009.