Load packages

library(tidyverse)
library(lsr)

Motivation

Recently, I listened to the great Paul Meehl in the audioscripts of some lectures of him. There, he asked the students

what’s the mean value of the upper half of a Gaussian distribution?

Let’s explore that using simulation techniques.

Simulation time

Let’s draw some instances from a standard Normal distribution, $X$ .

n <- 1e05
x <- rnorm(n)

Mean and SD in our sample are quite close to what can be expected:

mean(x)
#> [1] 0.002953451
sd(x)
#> [1] 0.9992983
aad(x)
#> [1] 0.7972097

Note that the aad is 0.8 in a standard normal distribution.

Upper half

Let’s put the data into a tibble.

df <- tibble(x = x,
             upper_half = ifelse(x >= 0, TRUE, FALSE))

df %>% 
  group_by(upper_half) %>% 
  summarise(avg_half = mean(x),
            sd_half = sd(x),
            var_half = var(x),
            md_half = median(x),
            iqr_half = IQR(x),
            aad_half = aad(x),
            n_half = n())
#> # A tibble: 2 x 8
#>   upper_half avg_half sd_half var_half md_half iqr_half aad_half n_half
#>   <lgl>         <dbl>   <dbl>    <dbl>   <dbl>    <dbl>    <dbl>  <int>
#> 1 FALSE        -0.797   0.604    0.365  -0.672    0.826    0.483  49822
#> 2 TRUE          0.797   0.601    0.361   0.676    0.831    0.482  50178

As can be seen the average in the halves is about 0.8 standard units away from the cut point (zero).

Some theory: Half normal distribution

The half-normal distribution has this expected value: $E [Y] = μ = \frac{σ \sqrt{2}}{\sqrt{π}}$ .

From the Wikipedia page:

In more simple terms, we have this expected value:


E = 1 * sqrt(2) / sqrt(pi)
E
#> [1] 0.7978846

In other words, about 0.8 standard units, just as we saw above.

Note that 1 refers to the variance of $X$ which is 1.

Regarding the variance, according to the Wikipedia page sited above, $(Y) = σ^{2} (1 - \frac{2}{π})$

Var = 1 * (1 - (2/pi))
Var
#> [1] 0.3633802

Note that this parameter is proportional to the Variance of $X$ .

This value does match nicely the simulated value from above.

However, according to Math Wolfram, $V = \frac{- 2 + π}{2 θ^{2}}$ .

Wikipedia states that Wolfram Math uses this parametrization:

$θ = \sqrt{π / 2}$ .

That is,

theta = sqrt(pi/2)
theta
#> [1] 1.253314

Var2 = (-2+pi) / (2 * theta^2)
Var2
#> [1] 0.3633802

Parameter estimation

According to Wikipedia, the $σ$ can be estimated as such:

sigma_hat = sqrt((1/n) * sum(x^2))
sigma_hat
#> [1] 0.9992977

Mean of the upper half of a Gaussian