Load packages
library(tidyverse)
Motivation
Z-scores (z-values) are a useful and widely employed tool to gauge and compare measurements. For instance, z-scores help to compare the relative position of measurements with respect to their distributions. In this post, we will prove some basic (algebraic) properties of z-values. There’s nothing new here; I’d just like to have it neat and concise somewhere so I can quickly find it. I’ll add some explanation to make it easier to follow.
Definition of z-scores
A z-score measures the distance of a measurement to its (arithmetic) mean value. This distance is then set in relation to the “standard” distance of those measures to their mean.
That is,

$$z_i = \frac{x_i - \bar{x}}{\mathrm{sd}_X}$$

where $\mathrm{sd}_X$ is the standard deviation of $X$, and $x_i$ is one measurement of the variable $X$. $\bar{x}$ represents the average (mean) value.

Let’s call the distance from the mean $d$, then the difference of measurement $i$ to the mean is

$$d_i = x_i - \bar{x}$$

It doesn’t really matter whether we subtract $x_i$ from the mean or the other way round. The absolute value of $d_i$ remains the same; only the sign flips, so that measurement $i$ would appear as smaller or larger than the mean.

Now we express $d_i$ in terms of the standard deviation (sd) of the measurements from their mean:

$$z_i = \frac{d_i}{\mathrm{sd}_X}$$
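To make the definition concrete, here is a minimal sketch in base R that computes z-scores step by step and compares them with `scale()`. The vector `x` is hypothetical example data, not taken from this post, and `sd()` is assumed to be an acceptable choice of standard deviation (it uses the $n-1$ denominator).

```r
# Hypothetical example data (an assumption for illustration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x_bar <- mean(x)      # arithmetic mean
d     <- x - x_bar    # distance of each measurement from the mean
s     <- sd(x)        # standard deviation (n - 1 denominator)
z     <- d / s        # z-scores

# scale() returns a one-column matrix; as.numeric() drops its attributes
z_scale <- as.numeric(scale(x))
all.equal(z, z_scale)
```

By default, `scale()` centers by the mean and divides by the same `sd()`, which is why both routes should agree.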
Intuition
Z-scores have a mean of zero
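By calculating $x_i - \bar{x}$ we subtract the mean value, i.e. $\bar{x}$, from each measurement. As a consequence, the mean of the shifted values is exactly $\bar{x}$ lower than the initial mean (i.e., $\bar{x} - \bar{x} = 0$). Subtracting a value from itself yields zero. Dividing by the SD afterwards does not change this, because zero divided by a positive number is still zero.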
Z-scores have a SD of 1
Each deviation is set in relation to the SD of $X$, i.e., we divide each $d_i$ by $\mathrm{sd}_X$. In consequence, every deviation, and with it the spread of the deviations as a whole, shrinks by a factor of exactly $\mathrm{sd}_X$. The deviations $d_i$ themselves have an SD of $\mathrm{sd}_X$, because shifting by the mean does not change the spread. After the division, the SD is therefore $\mathrm{sd}_X / \mathrm{sd}_X$. Dividing a value by the same value yields 1, so the SD of $Z$ equals 1.
Proofs
Z-scores have a mean of zero
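In order to show that $\bar{z} = 0$ it suffices to show that $\sum_{i=1}^{n} z_i = 0$. That’s because if a sum equals zero, the mean must equal zero too.

$$\sum_{i=1}^{n} z_i = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\mathrm{sd}_X} = \frac{1}{\mathrm{sd}_X} \left( \sum_{i=1}^{n} x_i - n\bar{x} \right) = \frac{1}{\mathrm{sd}_X} \left( n\bar{x} - n\bar{x} \right) = 0$$

Note that the sum of all measurements equals $n$ times the mean, i.e.,

$$\sum_{i=1}^{n} x_i = n\bar{x}$$

This fact stems from the definition of the mean:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$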
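The steps of this proof can be checked numerically. The following is a small sketch using hypothetical data `x` (an assumption for illustration); `all.equal()` is used instead of `==` because floating-point sums may differ from zero by a tiny rounding error.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data
n <- length(x)

# The sum of all measurements equals n times the mean
all.equal(sum(x), n * mean(x))

# Hence the deviations from the mean sum to zero ...
d <- x - mean(x)
all.equal(0, sum(d))

# ... and so do the z-scores, which gives them a mean of zero
z <- d / sd(x)
all.equal(0, mean(z))
```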
Z-scores have a SD (and a variance) of 1
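Remember that the SD of some variable $X$ is defined as:

$$\mathrm{sd}_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

If $\bar{x} = 0$, as is the case for z-scores, this simplifies to

$$\mathrm{sd}_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$$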
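In order to show that the SD of $Z$ equals 1 it suffices to show that the mean of the squared z-scores equals 1, because the square root of 1 is 1. Equivalently, it suffices to show that the sum of the squared z-scores equals $n$.

Note that the SD of $Z$ is defined such that

$$\mathrm{sd}_Z^2 = \frac{1}{n} \sum_{i=1}^{n} z_i^2$$

In comparison, let $\mathrm{sd}_X$ be defined such that

$$\mathrm{sd}_X^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n} \sum_{i=1}^{n} d_i^2$$

Hence, if $z_i = \frac{d_i}{\mathrm{sd}_X}$ it follows that $z_i^2 = \frac{d_i^2}{\mathrm{sd}_X^2}$.

Hence

$$\sum_{i=1}^{n} z_i^2 = \frac{1}{\mathrm{sd}_X^2} \sum_{i=1}^{n} d_i^2 = \frac{n \cdot \mathrm{sd}_X^2}{\mathrm{sd}_X^2} = n$$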
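We arrive at the fact that the sum of the squared z-scores equals $n$ (or $n-1$, if the SD of $X$ was computed with the denominator $n-1$). Dividing by $n$ (or $n-1$, respectively) yields 1.

In other words, the SD of $Z$ equals 1.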
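As a quick numerical check of this result, here is a sketch that uses the denominator $n$ throughout. The data vector `x` is hypothetical, and the SD is computed by hand because base R's `sd()` divides by $n-1$ instead.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data
n <- length(x)
d <- x - mean(x)                 # deviations from the mean

# SD with denominator n, computed by hand
sd_n <- sqrt(sum(d^2) / n)
z    <- d / sd_n

sum(z^2)             # sum of squared z-scores; should equal n
sqrt(sum(z^2) / n)   # SD of z with denominator n; should equal 1
```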
Note that

$$\mathrm{sd}_Z = \sqrt{\mathrm{Var}(Z)}$$

where $\mathrm{Var}$ is the variance. Whenever the SD equals 1, the variance will equal 1, too.
We do not distinguish between the estimated population variance, where we divide by $n-1$, and the sample variance, where we divide by $n$. The principle remains the same, and so do the conclusions, as long as the same denominator is used for the SD of $X$ and the SD of $Z$.
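The same conclusion under the $n-1$ convention can be sketched with base R's `sd()` and `var()`, both of which divide by $n-1$. Again, `x` is hypothetical data chosen for illustration.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data
n <- length(x)

z <- (x - mean(x)) / sd(x)   # sd() divides by n - 1

sum(z^2)   # should equal n - 1 under this convention
sd(z)      # should equal 1
var(z)     # should equal 1
```

Mixing the conventions (say, standardizing with `sd()` and then computing the SD of the z-scores with the denominator $n$) would give $\sqrt{(n-1)/n}$ rather than exactly 1.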
The average squared z-score equals 1
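As discussed above, the average squared z-score equals the variance of the z-scores for a given variable $X$, and since that variance is 1, it also equals their SD:

$$\frac{1}{n} \sum_{i=1}^{n} z_i^2 = \mathrm{sd}_Z^2 = \mathrm{sd}_Z = 1$$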