Some algebraic properties of z-scores

Load packages

library(tidyverse)

Motivation

Z-scores (z-values) are a useful and widely employed tool to gauge and compare measurements. For instance, z-scores help to compare the relative position of measurements with respect to their distributions. In this post, we will prove some basic (algebraic) properties of z-values. There is nothing new here; I would just like to have it neat and concise somewhere so that I can find it quickly. I'll add some explanation to make it easier to follow.

Definition of z-scores

A z-score measures the distance of a measurement to its (arithmetic) mean value. This distance is then set in relation to the “standard” distance of those measures to their mean.

That is,

$$z_i = \frac{x_i - \bar{x}}{s_x}$$

where $s_x$ is the standard deviation of $x$, and $x_i$ is one measurement of the variable $X$. $\bar{x}$ represents the average (mean) value.

Let's call the distance from the mean $d$; then the difference of measurement $i$ to the mean is

$$d_i = x_i - \bar{x}$$

It doesn't really matter whether we subtract $x_i$ from the mean or the other way round. The absolute value of $d_i$ remains the same; only the sign flips, so the measurement would appear as smaller or larger than the mean.

Now we express $d$ in units of the standard deviation (SD) of the measurements around their mean:

$$z_i = \frac{d_i}{s_x}$$
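As a minimal sketch of this definition in R, the z-scores can be computed by hand and compared with base R's `scale()`. The vector `x` is made-up example data (an assumption for illustration). Note that both `sd()` and `scale()` divide by $n-1$ rather than $n$.

```r
# Hypothetical example data (an assumption, not taken from a real study)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

d <- x - mean(x)   # distance of each measurement from the mean
z <- d / sd(x)     # distance expressed in units of the SD (n - 1 version)

# scale() centers and scales in one step; as.numeric() drops its attributes
z_scale <- as.numeric(scale(x))

all.equal(z, z_scale)
```

Both routes should give the same values, which suggests that `scale()` is a convenient shortcut for the two-step calculation.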

Intuition

Z-scores have a mean of zero

When calculating $z$, we subtract the mean value $\bar{x}$ from each measurement. As a consequence, the centered values have a mean that is exactly $\bar{x}$ lower than the initial mean (i.e., $\bar{x}$). Subtracting a value from itself yields zero. Dividing by $s_x$ afterwards rescales the values, but a mean of zero stays zero.

Z-scores have a SD of 1

Each measurement is set in relation to the SD of $X$, i.e., we divide each $d_i$ by $s_x$. In consequence, every distance from the mean is rescaled by the factor $1/s_x$, and so is the SD of these distances. The SD of the $d_i$ is $s_x$, and dividing a value by itself yields 1. Hence the SD of $Z$ equals 1.
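The two steps, centering and scaling, can be watched separately in a small R sketch. The data vector is hypothetical, and `sd()` divides by $n-1$.

```r
# Hypothetical example data (an assumption for illustration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

d <- x - mean(x)   # step 1: centering shifts the mean to zero
z <- d / sd(x)     # step 2: scaling sets the SD to one

# Centering changes the mean but not the SD;
# scaling changes the SD but leaves the mean at zero
c(mean_x = mean(x), mean_d = mean(d), mean_z = mean(z))
c(sd_x = sd(x), sd_d = sd(d), sd_z = sd(z))
```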

Proofs

Z-scores have a mean of zero

In order to show that $\bar{z} = 0$ it suffices to show that $\sum z_i = 0$. That's because if a sum equals zero, the mean must equal zero too.

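$$\sum z_i = \sum \frac{x_i - \bar{x}}{s_x} = \frac{1}{s_x} \sum (x_i - \bar{x}) = \frac{1}{s_x} \left( \sum x_i - n\bar{x} \right) = \frac{1}{s_x} (n\bar{x} - n\bar{x}) = 0$$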

Note that the sum of all measurements equals $n$ times the mean, i.e.,

$$\sum x_i = n\bar{x}$$

This fact follows from the definition of the mean:

$$\bar{x} = \frac{1}{n} \sum x_i \qquad \text{(multiply both sides by } n \text{)}$$

$$n\bar{x} = \sum x_i$$
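The steps of this proof can be checked numerically. The following is a sketch with hypothetical data; because of floating-point rounding, the sums are only zero up to a tiny error, which is why `all.equal()` is used instead of `==`.

```r
# Hypothetical example data (an assumption for illustration)
x <- c(1.5, 2.25, 3, 4.75, 6.5)
n <- length(x)

# The sum of all measurements equals n times the mean
all.equal(sum(x), n * mean(x))

# Hence the deviations from the mean sum to zero ...
sum_d <- sum(x - mean(x))

# ... and so do the z-scores, up to floating-point rounding
z <- (x - mean(x)) / sd(x)
sum_z <- sum(z)

c(sum_d = sum_d, sum_z = sum_z, mean_z = mean(z))
```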

Z-scores have a SD (and a variance) of 1

Remember that the SD of some variable $x$ is defined as:

$$sd(x) = \sqrt{\frac{1}{n} \sum (x_i - \bar{x})^2}$$

If $\bar{x} = 0$, as is the case for z-scores, this simplifies to

$$sd(x) = \sqrt{\frac{1}{n} \sum x_i^2}$$

In order to show that the SD of $Z$ equals 1, it suffices to show that the mean of the squared z-scores equals 1, i.e., that their sum equals $n$. That's because the square root of 1 is 1.

Note that the SD of $Z$ is defined such that

$$sd(Z) = \sqrt{\frac{1}{n} \sum z_i^2}$$

In comparison, let $sum(Z)$ be defined such that

$$sum(Z) = \sum z_i^2$$

Hence, if $sum(Z) = n$, it follows that $sd(Z) = \sqrt{n/n} = 1$.

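$$\sum z_i^2 = \sum \left( \frac{x_i - \bar{x}}{s_x} \right)^2 = \frac{1}{s_x^2} \sum (x_i - \bar{x})^2$$

Note that $n s_x^2 = \sum (x_i - \bar{x})^2$. Hence

$$\sum z_i^2 = \frac{1}{s_x^2} \cdot n s_x^2 = n$$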

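We arrive at the fact that the sum of the squared z-scores equals $n$. Dividing by $n$ yields 1, and the square root of 1 is 1. (If $s_x$ was computed with $n-1$, the sum equals $n-1$, and dividing by $n-1$ again yields 1.)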

In other words, the SD of Z equals 1.
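A numerical sketch of this result follows. Since R's `sd()` divides by $n-1$, the code uses a small hypothetical helper `sd_n()` (not part of base R) that divides by $n$, as in the derivation; the data are made up.

```r
# Hypothetical example data (an assumption for illustration)
x <- c(1.5, 2.25, 3, 4.75, 6.5)
n <- length(x)

# Hypothetical helper: SD with division by n (sd() in R divides by n - 1)
sd_n <- function(v) sqrt(mean((v - mean(v))^2))

z <- (x - mean(x)) / sd_n(x)

sum_sq <- sum(z^2)   # sum of squared z-scores, should equal n
sd_z <- sd_n(z)      # SD of the z-scores, should equal 1

c(n = n, sum_sq = sum_sq, sd_z = sd_z)
```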

Note that

$$sd(Z) = 1 \Leftrightarrow Var(Z) = 1$$

where Var is the Variance. Whenever sd equals 1 the variance will equal 1, too.

We do not distinguish here between the estimated population variance, where we divide by $n-1$, and the sample variance, where we divide by $n$. As long as the same divisor is used throughout, the principle remains the same, and the conclusions remain the same.
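To see what happens with the $n-1$ version, here is a sketch using R's built-in `sd()`, which divides by $n-1$. The data are hypothetical. The SD of the z-scores is still 1 when measured with the same divisor, whereas the sum of squares equals $n-1$ rather than $n$.

```r
# Hypothetical example data (an assumption for illustration)
x <- c(1.5, 2.25, 3, 4.75, 6.5)
n <- length(x)

z <- (x - mean(x)) / sd(x)   # sd() divides by n - 1

sum_sq <- sum(z^2)    # equals n - 1 (up to rounding), not n
sd_z <- sd(z)         # equals 1 when the same divisor is used
mean_sq <- mean(z^2)  # equals (n - 1) / n, slightly below 1

c(sum_sq = sum_sq, sd_z = sd_z, mean_sq = mean_sq)
```

Mixing the two divisors, as in the last line, is the one case where the result deviates from 1.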

The average squared z-score equals 1

As shown above, the average squared z-score equals the variance of the z-scores for a given variable $x$, which is 1:

$$\frac{1}{n} \sum z_i^2 = 1$$