Load packages
library(tidyverse)
library(lsr)
Motivation
Recently, I listened to the great Paul Meehl in the audioscripts of some lectures of him. There, he asked the students
what’s the mean value of the upper half of a Gaussian distribution?
Let’s explore that using simulation techniques.
Simulation time
Let’s draw some instances from a standard Normal distribution, \(X\).
n <- 1e05
x <- rnorm(n)
Mean and SD in our sample are quite close to what can be expected:
mean(x)
#> [1] 0.002953451
sd(x)
#> [1] 0.9992983
aad(x)
#> [1] 0.7972097
Note that the aad is 0.8 in a standard normal distribution.
Upper half
Let’s put the data into a tibble.
df <- tibble(x = x,
upper_half = ifelse(x >= 0, TRUE, FALSE))
df %>%
group_by(upper_half) %>%
summarise(avg_half = mean(x),
sd_half = sd(x),
var_half = var(x),
md_half = median(x),
iqr_half = IQR(x),
aad_half = aad(x),
n_half = n())
#> # A tibble: 2 x 8
#> upper_half avg_half sd_half var_half md_half iqr_half aad_half n_half
#> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 FALSE -0.797 0.604 0.365 -0.672 0.826 0.483 49822
#> 2 TRUE 0.797 0.601 0.361 0.676 0.831 0.482 50178
As can be seen the average in the halves is about 0.8 standard units away from the cut point (zero).
Some theory: Half normal distribution
The half-normal distribution has this expected value: \(E[Y]=\mu ={\frac {\sigma {\sqrt {2}}}{\sqrt {\pi }}}\).
From the Wikipedia page:
In more simple terms, we have this expected value:
E = 1 * sqrt(2) / sqrt(pi)
E
#> [1] 0.7978846
In other words, about 0.8 standard units, just as we saw above.
Note that 1
refers to the variance of \(X\) which is 1.
Regarding the variance, according to the Wikipedia page sited above, \((Y)=\sigma ^{2}\left(1-{\frac {2}{\pi }}\right)\)
Var = 1 * (1 - (2/pi))
Var
#> [1] 0.3633802
Note that this parameter is proportional to the Variance of \(X\).
This value does match nicely the simulated value from above.
However, according to Math Wolfram, \(V = \frac{-2 + \pi}{2 \theta^2}\).
Wikipedia states that Wolfram Math uses this parametrization:
\[\theta = \sqrt{\pi/2}\].
That is,
theta = sqrt(pi/2)
theta
#> [1] 1.253314
Var2 = (-2+pi) / (2 * theta^2)
Var2
#> [1] 0.3633802
Parameter estimation
According to Wikipedia, the \(\sigma\) can be estimated as such:
sigma_hat = sqrt((1/n) * sum(x^2))
sigma_hat
#> [1] 0.9992977