Mean of the upper half of a Gaussian

Load packages

library(tidyverse)
library(lsr)

Motivation

Recently, I listened to audio recordings of some lectures by the great Paul Meehl. There, he asked the students:

What’s the mean value of the upper half of a Gaussian distribution?

Let’s explore that using simulation techniques.

Simulation time

Let’s draw some instances from a standard Normal distribution, \(X\).

n <- 1e05
x <- rnorm(n)

Mean and SD in our sample are quite close to what can be expected:

mean(x)
#> [1] 0.002953451
sd(x)
#> [1] 0.9992983
aad(x)
#> [1] 0.7972097

Note that the aad (average absolute deviation) is about 0.8 in a standard normal distribution.
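
For comparison, the theoretical aad of a standard normal is \(\sqrt{2/\pi}\), which we can compute directly:

sqrt(2 / pi)
#> [1] 0.7978846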

Upper half

Let’s put the data into a tibble.

df <- tibble(x = x,
             upper_half = x >= 0)
df %>% 
  group_by(upper_half) %>% 
  summarise(avg_half = mean(x),
            sd_half = sd(x),
            var_half = var(x),
            md_half = median(x),
            iqr_half = IQR(x),
            aad_half = aad(x),
            n_half = n())
#> # A tibble: 2 x 8
#>   upper_half avg_half sd_half var_half md_half iqr_half aad_half n_half
#>   <lgl>         <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>  <int>
#> 1 FALSE        -0.797   0.604    0.365  -0.672    0.826    0.483  49822
#> 2 TRUE          0.797   0.601    0.361   0.676    0.831    0.482  50178

As can be seen, the average of each half is about 0.8 standard units away from the cut point (zero).
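
A quick plot makes the split visible. Here is a minimal sketch (number of bins and labels chosen arbitrarily), with dashed lines at \(\pm 0.8\):

df %>% 
  ggplot(aes(x = x, fill = upper_half)) +
  geom_histogram(bins = 100) +
  geom_vline(xintercept = c(-0.8, 0.8), linetype = "dashed") +
  labs(title = "Standard normal split at zero",
       subtitle = "Dashed lines mark +/- 0.8, the approximate mean of each half")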

Some theory: Half-normal distribution

The half-normal distribution has this expected value: \(E[Y] = \mu = \frac{\sigma \sqrt{2}}{\sqrt{\pi}}\).

This formula is given on the Wikipedia page on the half-normal distribution. In simpler terms, we have this expected value:

E = 1 * sqrt(2) / sqrt(pi)
E
#> [1] 0.7978846

In other words, about 0.8 standard units, just as we saw above.

Note that the 1 refers to \(\sigma\), the standard deviation of \(X\), which is 1 here.
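
To make the comparison explicit, we can set the simulated half means next to the theoretical value (a small check reusing df and E from above; the differences should be close to zero):

df %>% 
  group_by(upper_half) %>% 
  summarise(avg_half = mean(x)) %>% 
  mutate(theory = ifelse(upper_half, E, -E),
         diff = avg_half - theory)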

Regarding the variance, according to the Wikipedia page cited above, \(\operatorname{Var}(Y) = \sigma^2 \left(1 - \frac{2}{\pi}\right)\):

Var = 1 * (1 - (2/pi))
Var
#> [1] 0.3633802

Note that this value is proportional to the variance of \(X\) (the 1 in the code stands for \(\sigma^2 = 1\)).

This value matches the simulated values from above nicely.
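
As a quick numeric check (reusing df and Var from above), the gap between the simulated upper-half variance and the theoretical value is small:

var(df$x[df$upper_half]) - Var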

However, according to Wolfram MathWorld, \(V = \frac{\pi - 2}{2 \theta^2}\).

Wikipedia states that Wolfram MathWorld uses this parametrization:

\[\theta = \sqrt{\pi/2}\].

That is,

theta = sqrt(pi/2)
theta
#> [1] 1.253314
Var2 = (-2+pi) / (2 * theta^2)
Var2
#> [1] 0.3633802

Parameter estimation

According to Wikipedia, \(\sigma\) can be estimated as follows:

sigma_hat = sqrt((1/n) * sum(x^2))
sigma_hat
#> [1] 0.9992977
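
Since the upper half of \(X\) is itself a sample from a half-normal distribution, the same estimator can be applied to those observations alone (a short sketch reusing df from above; the result should again be close to 1):

# estimate sigma using only the upper-half observations
y <- df$x[df$upper_half]
sigma_hat_upper <- sqrt(mean(y^2))
sigma_hat_upper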