Load packages
library(tidyverse)
Motivation
Z-scores (z-values) are a useful and widely used tool to gauge and compare measurements. For instance, z-scores help to compare the relative position of measurements with respect to their distributions. In this post, we will prove some basic (algebraic) properties of z-scores. There’s nothing new here; I’d just like to have it written down neatly and concisely in one place so that I can find it quickly. I’ll add some explanation to make it easier to follow.
Definition of z-scores
A z-score measures the distance of a measurement from the (arithmetic) mean of its variable. This distance is then expressed relative to the “standard” distance of the measurements from their mean, i.e., their standard deviation.
That is,
$$z_i = \frac{x_i - \bar{x}}{s_x}$$
where $s_x$ is the standard deviation of $x$, $x_i$ is one measurement of the variable $X$, and $\bar{x}$ is the average (mean) value.
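The definition translates directly into R. The following is a minimal sketch in base R; the helper name `z_score()` and the example vector are illustrative assumptions, not part of the original post. Note that base R's `sd()` divides by $n-1$, and `scale()` standardizes the same way, so both approaches should agree.

```r
# Hypothetical helper: z-scores as defined above, z_i = (x_i - mean(x)) / s_x.
# Note that base R's sd() uses the denominator n - 1.
z_score <- function(x) {
  (x - mean(x)) / sd(x)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data
z <- z_score(x)

# scale() also centers by the mean and divides by sd(), so the results agree
all.equal(z, as.vector(scale(x)))
#> [1] TRUE
```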
Let’s call the distance from the mean $d$. The difference of measurement $i$ to the mean is then
$$d_i = x_i - \bar{x}$$
By convention, we subtract the mean from the measurement. Reversing the order would leave the absolute value of $d_i$ unchanged and only flip its sign, so a measurement above the mean would appear to lie below it, and vice versa.
Now we express $d_i$ in units of the standard deviation (SD) of the measurements:
$$z_i = \frac{d_i}{s_x}$$
Intuition
Z-scores have a mean of zero
To compute $z$, we subtract the mean $\bar{x}$ from each measurement. As a consequence, the mean of the differences $d_i$ is exactly $\bar{x}$ lower than the original mean $\bar{x}$, and subtracting a value from itself yields zero. Dividing these zero-mean differences by $s_x$ keeps the mean at zero, because $0 / s_x = 0$.
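A small numerical illustration in base R, using an illustrative data vector (not from the original post) whose mean is exactly 5:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data; mean(x) is 5
d <- x - mean(x)                # distances d_i to the mean

d
#> [1] -3 -1 -1 -1  0  0  2  4
mean(d)
#> [1] 0

# Dividing by the positive constant sd(x) keeps the mean at zero
# (up to floating-point rounding)
all.equal(mean(d / sd(x)), 0)
#> [1] TRUE
```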
Z-scores have a SD of 1
Each difference $d_i$ is divided by the SD of $X$, $s_x$. Subtracting the mean shifts all measurements by the same constant, which leaves their spread unchanged, so the differences $d_1, d_2, \dots, d_n$ still have the SD $s_x$. Dividing every value of a variable by a positive constant divides its SD by that constant as well. Hence the SD of $Z$ is $s_x / s_x = 1$.
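The two facts behind this intuition (shifting leaves the SD unchanged; dividing by a constant divides the SD by that constant) can be checked numerically. This is a sketch in base R; the data vector and the constant `k` are arbitrary illustrative choices.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data
d <- x - mean(x)
k <- 3                          # an arbitrary positive constant

# Shifting by a constant leaves the SD unchanged ...
all.equal(sd(d), sd(x))
#> [1] TRUE
# ... and dividing by a constant divides the SD by that constant
all.equal(sd(x / k), sd(x) / k)
#> [1] TRUE
# Hence dividing d by sd(x) yields an SD of sd(x) / sd(x) = 1
all.equal(sd(d / sd(x)), 1)
#> [1] TRUE
```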
Proofs
Z-scores have a mean of zero
To show that $\bar{z} = 0$, it suffices to show that $\sum z_i = 0$. That’s because the mean is the sum divided by $n$: if the sum equals zero, the mean must equal zero, too.
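$$\sum_{i=1}^n z_i = \sum_{i=1}^n \frac{x_i - \bar{x}}{s_x} = \frac{1}{s_x}\sum_{i=1}^n (x_i - \bar{x}) = \frac{1}{s_x}\left(\sum_{i=1}^n x_i - n\bar{x}\right) = \frac{1}{s_x}\left(n\bar{x} - n\bar{x}\right) = 0$$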
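Note that the sum of all measurements equals $n$ times the mean, i.e.,
$$\sum_{i=1}^n x_i = n\bar{x}$$
This fact follows from the definition of the mean (multiply both sides by $n$):
$$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i \quad\Leftrightarrow\quad n\bar{x} = \sum_{i=1}^n x_i$$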
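The chain of equalities in the proof can be evaluated step by step on data. This is a sketch in base R with randomly generated illustrative data; the seed and the distribution parameters are arbitrary choices.

```r
set.seed(42)                         # arbitrary seed, for reproducibility
x <- rnorm(100, mean = 50, sd = 10)  # illustrative random data
n <- length(x)
xbar <- mean(x)
s <- sd(x)

# The identity used in the proof: the sum of the x_i equals n times the mean
all.equal(sum(x), n * xbar)
#> [1] TRUE

# The steps of the derivation, evaluated numerically
step1 <- sum((x - xbar) / s)       # sum of the z-scores
step2 <- sum(x - xbar) / s         # 1 / s_x pulled out of the sum
step3 <- (sum(x) - n * xbar) / s   # sum split into sum(x_i) - n * xbar

# All three are zero up to floating-point rounding
all.equal(c(step1, step2, step3), c(0, 0, 0))
#> [1] TRUE
```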
Z-scores have a SD (and a variance) of 1
Recall that the SD of a variable $x$ is defined as:
$$sd(x) = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2}$$
If $\bar{x} = 0$, as is the case for z-scores, this simplifies to
$$sd(x) = \sqrt{\frac{1}{n}\sum_{i=1}^n x_i^2}$$
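Base R's `sd()` divides by $n-1$, whereas the definition above divides by $n$. The following sketch implements the $n$ version with a hypothetical helper `sd_n()` (not a base R function) and checks the zero-mean simplification on illustrative data.

```r
# Hypothetical helper: SD with denominator n, as in the definition above.
# Base R's sd() divides by n - 1 instead.
sd_n <- function(x) {
  sqrt(mean((x - mean(x))^2))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data
sd_n(x)
#> [1] 2
sd(x)                           # sqrt(32 / 7), slightly larger than 2

# For data with mean 0, the SD reduces to the root of the mean square
d <- x - mean(x)
all.equal(sd_n(d), sqrt(mean(d^2)))
#> [1] TRUE
```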
To show that the SD of $Z$ equals 1, it suffices to show that the sum of the squared z-scores equals $n$. Since the z-scores have a mean of zero, their SD is
$$sd(Z) = \sqrt{\frac{1}{n}\sum_{i=1}^n z_i^2}$$
Hence, if $\sum z_i^2 = n$, it follows that $sd(Z) = \sqrt{n/n} = 1$.
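$$\sum_{i=1}^n z_i^2 = \sum_{i=1}^n \left(\frac{x_i - \bar{x}}{s_x}\right)^2 = \frac{1}{s_x^2}\sum_{i=1}^n (x_i - \bar{x})^2$$
Note that, by the definition of the SD, $n s_x^2 = \sum_{i=1}^n (x_i - \bar{x})^2$. Hence
$$\sum_{i=1}^n z_i^2 = \frac{1}{s_x^2}\, n s_x^2 = n$$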
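We arrive at the fact that the sum of the squared z-scores equals $n$. Dividing by $n$ yields 1, i.e., the mean squared z-score is 1. (Had we defined $s_x$ with the denominator $n-1$, the same steps would give $\sum z_i^2 = n-1$, and dividing by $n-1$ would again yield 1.)
In other words, the SD of $Z$ equals $\sqrt{1} = 1$.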
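With an illustrative data set whose $n$-denominator SD is exactly 2, every step of the derivation can be checked with exact arithmetic. This is a sketch in base R; `sd_n()` is the same hypothetical helper as above, redefined so the block is self-contained.

```r
# Hypothetical helper: SD with denominator n, matching the proof
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # illustrative data, n = 8
n <- length(x)
z <- (x - mean(x)) / sd_n(x)    # here sd_n(x) is 2

z
#> [1] -1.5 -0.5 -0.5 -0.5  0.0  0.0  1.0  2.0
sum((x - mean(x))^2)            # n * s_x^2 = 8 * 4
#> [1] 32
sum(z^2)                        # equals n
#> [1] 8
sqrt(sum(z^2) / n)              # SD of z (its mean is 0)
#> [1] 1
```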
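Note that
$$sd(Z) = 1 \quad\Leftrightarrow\quad Var(Z) = 1$$
where $Var$ denotes the variance. Because the variance is the square of the SD and the SD is never negative, the SD equals 1 exactly when the variance equals 1.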
We do not distinguish between the estimated population variance, where we standardize by $n-1$, and the sample variance, where we standardize by $n$. The principle and the conclusions remain the same, as long as the same denominator is used both for $s_x$ and for the SD of $Z$.
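Base R's `sd()` and `var()` both use the denominator $n-1$, so standardizing and measuring with them is consistent. The following sketch (random illustrative data, arbitrary seed) shows that result, and also what happens when the denominators are mixed.

```r
set.seed(1)                        # arbitrary seed
x <- runif(50, min = 0, max = 10)  # illustrative random data
n <- length(x)

# Standardize with base R's sd(), which divides by n - 1
z <- (x - mean(x)) / sd(x)

all.equal(sd(z), 1)                # SD of z, again with n - 1
#> [1] TRUE
all.equal(var(z), 1)
#> [1] TRUE
all.equal(sum(z^2), n - 1)         # sum of squares is n - 1, not n
#> [1] TRUE

# Mixing denominators: the root mean square of z (its n-denominator SD,
# since mean(z) is 0) is sqrt((n - 1) / n) rather than 1
all.equal(sqrt(mean(z^2)), sqrt((n - 1) / n))
#> [1] TRUE
```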
The average squared z-score equals 1
As discussed above, because the mean of the z-scores is zero, the average squared z-score equals the variance of $Z$ (with denominator $n$), which we have shown to be 1:
$$\frac{1}{n}\sum_{i=1}^n z_i^2 = 1$$
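The algebra above makes no assumption about the shape of the distribution, so the result should also hold for skewed data. A sketch in base R with exponentially distributed illustrative data (arbitrary seed and rate), standardized with the $n$-denominator SD as in the proof:

```r
set.seed(7)                          # arbitrary seed
x <- rexp(200, rate = 0.5)           # illustrative skewed data
s_n <- sqrt(mean((x - mean(x))^2))   # SD with denominator n, as in the proof
z <- (x - mean(x)) / s_n

# The average squared z-score equals the (n-denominator) variance of z,
# because mean(z) is 0 ...
all.equal(mean(z^2), mean((z - mean(z))^2))
#> [1] TRUE
# ... and both equal 1
all.equal(mean(z^2), 1)
#> [1] TRUE
```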