# Mean of the upper half of a Gaussian

library(tidyverse)
library(lsr)

# Motivation

Recently, I listened to the great Paul Meehl in the audioscripts of some lectures of him. There, he asked the students

what’s the mean value of the upper half of a Gaussian distribution?

Let’s explore that using simulation techniques.

# Simulation time

Let’s draw some instances from a standard Normal distribution, $$X$$.

n <- 1e05
x <- rnorm(n)

Mean and SD in our sample are quite close to what can be expected:

mean(x)
#>  0.002953451
sd(x)
#>  0.9992983
#>  0.7972097

Note that the aad is 0.8 in a standard normal distribution.

# Upper half

Let’s put the data into a tibble.

df <- tibble(x = x,
upper_half = ifelse(x >= 0, TRUE, FALSE))
df %>%
group_by(upper_half) %>%
summarise(avg_half = mean(x),
sd_half = sd(x),
var_half = var(x),
md_half = median(x),
iqr_half = IQR(x),
n_half = n())
#> # A tibble: 2 x 8
#>   upper_half avg_half sd_half var_half md_half iqr_half aad_half n_half
#>   <lgl>         <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>  <int>
#> 1 FALSE        -0.797   0.604    0.365  -0.672    0.826    0.483  49822
#> 2 TRUE          0.797   0.601    0.361   0.676    0.831    0.482  50178

As can be seen the average in the halves is about 0.8 standard units away from the cut point (zero).

# Some theory: Half normal distribution

The half-normal distribution has this expected value: $$E[Y]=\mu ={\frac {\sigma {\sqrt {2}}}{\sqrt {\pi }}}$$. In more simple terms, we have this expected value:


E = 1 * sqrt(2) / sqrt(pi)
E
#>  0.7978846

In other words, about 0.8 standard units, just as we saw above.

Note that 1 refers to the variance of $$X$$ which is 1.

Regarding the variance, according to the Wikipedia page sited above, $$(Y)=\sigma ^{2}\left(1-{\frac {2}{\pi }}\right)$$

Var = 1 * (1 - (2/pi))
Var
#>  0.3633802

Note that this parameter is proportional to the Variance of $$X$$.

This value does match nicely the simulated value from above.

However, according to Math Wolfram, $$V = \frac{-2 + \pi}{2 \theta^2}$$.

Wikipedia states that Wolfram Math uses this parametrization:

$\theta = \sqrt{\pi/2}$.

That is,

theta = sqrt(pi/2)
theta
#>  1.253314
Var2 = (-2+pi) / (2 * theta^2)
Var2
#>  0.3633802

# Parameter estimation

According to Wikipedia, the $$\sigma$$ can be estimated as such:

sigma_hat = sqrt((1/n) * sum(x^2))
sigma_hat
#>  0.9992977