Load packages
library(tidyverse)
The correlation coefficient cannot exceed an absolute value of 1
This is well-known. But why is that the case? How can we prove it? This post gives one explanation using the Cauchy-Schwarz inequality.
Here’s one version of the definition of correlation:
\[ r = \frac{\sum(\Delta x \Delta y)}{\sqrt{\sum \Delta x^2} \sqrt{\sum \Delta y^2}} \]
where \(\Delta x\) and \(\Delta y\) are the differences of \(x_i\) and \(\bar{x}\), that is: \(\Delta x_i = x_i - \bar{x}\), and similarly for \(\Delta y_i\).
For ease of notation, let’s proceed with the understanding that \(x\) stands for the differences, i.e. \(\Delta x\) (and similarly for \(y\)):
\[ r = \frac{\sum(xy)}{\sqrt{\sum x^2} \sqrt{\sum y^2}} \]
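As a quick numerical check of this definition, the following sketch computes \(r\) from the centered values and compares it with R’s built-in `cor()`. The data are simulated, and the names `x_raw`, `y_raw`, and `r_manual` are made up for illustration only.

```r
set.seed(42)  # arbitrary seed so the example is reproducible

# Simulated example data (an assumption, not taken from the post)
x_raw <- rnorm(100)
y_raw <- 0.5 * x_raw + rnorm(100)

# Center both variables: these are the differences from the mean
x <- x_raw - mean(x_raw)
y <- y_raw - mean(y_raw)

# Correlation computed from the definition above
r_manual <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Should agree with the built-in Pearson correlation up to floating-point error
all.equal(r_manual, cor(x_raw, y_raw))
```

Base R is enough here, so the sketch does not depend on any tidyverse functions.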
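Now, we claim that
\[ |r| = \frac{\big|\sum(xy)\big|}{\sqrt{\sum x^2} \sqrt{\sum y^2}} \le 1 \]
The denominator is positive as long as neither \(x\) nor \(y\) is constant, so we can multiply both sides of the inequality by it without changing its direction:
\[ \Big|\sum(xy)\Big| \le \sqrt{\sum x^2} \cdot \sqrt{\sum y^2} \]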
The Cauchy-Schwarz inequality states that
\[ \big| \langle x,y\rangle \big |\leq ||x|| \cdot ||y|| \]
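In words, the absolute value of the inner product \(\langle x,y\rangle\) is less than or equal to the product of the vector norms.
For real vectors with the usual inner product \(\langle x,y\rangle = \sum xy\) and norm \(||x|| = \sqrt{\sum x^2}\), this reads:
\[ \Big|\sum xy\Big| \le \sqrt{\sum x^2} \cdot \sqrt{\sum y^2} \]
This is exactly what we wanted to prove in the first place: \(r\) can neither exceed \(1\) nor fall below \(-1\).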
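A simulation is no substitute for the proof, but it can make the inequality more tangible. The sketch below draws many pairs of random vectors and checks that the absolute inner product never exceeds the product of the norms. The helper `check_cs()` and its argument `n` are hypothetical names introduced for this example.

```r
set.seed(1)  # arbitrary seed so the example is reproducible

# Returns TRUE if |sum(x * y)| <= ||x|| * ||y|| holds for one random pair of vectors
check_cs <- function(n = 20) {
  x <- rnorm(n)
  y <- rnorm(n)
  lhs <- abs(sum(x * y))                   # absolute inner product
  rhs <- sqrt(sum(x^2)) * sqrt(sum(y^2))   # product of the Euclidean norms
  lhs <= rhs
}

# Repeat the check for 1,000 random pairs of vectors
results <- replicate(1000, check_cs())

# We would expect the inequality to hold in every single case
all(results)
```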
Here’s a quite nice intuition on the Cauchy-Schwarz inequality.
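For readers who prefer an algebraic argument, here is a sketch of one standard proof for real vectors. It is not necessarily the argument the intuition above relies on. The auxiliary real number \(t\) is introduced only for this derivation. A sum of squares can never be negative, so for every \(t\):
\[ \sum (t x + y)^2 = t^2 \sum x^2 + 2t \sum xy + \sum y^2 \ge 0 \]
If \(\sum x^2 = 0\), then every \(x\) is zero and the inequality holds trivially. Otherwise, the left-hand side is a quadratic in \(t\) that never drops below zero, so its discriminant cannot be positive:
\[ 4 \Big(\sum xy\Big)^2 - 4 \sum x^2 \sum y^2 \le 0 \]
Rearranging and taking square roots gives
\[ \Big|\sum xy\Big| \le \sqrt{\sum x^2} \cdot \sqrt{\sum y^2} \]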