Load packages
library(tidyverse)
The K-Means optimization reduces the variance in each iteration. To illuminate on that Witten et al. in An Introduction to Statistical Learning (2013) present the following entity (p. 388, chap. 10):
\[\frac{1}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p (x_{ij} - x_{i^\prime j})^2 = 2 \sum\limits_{i \in C_k} \sum\limits_{j=1}^{p} (x_{ij} - \bar{x}_{kj})^2\]
A proof can be found here; I’ll add some explanations.
Note 1. Note that \(\sum\limits_{i,i^{\prime} \in C_k}(\dots)\) essentially amounts to \(\sum\limits_{i \in C_k}\sum\limits_{i^{\prime} \in C_k}(\dots)\), when the order of summation does not matter.
Define \(TSS\) (total sum of deviation squares) in the following way: \(TSS \equiv \frac{1}{|C_k|} \sum\limits_{i, \in C_k} \sum\limits_{j=1}^p (x_{ij} - x_{i^\prime j})^2\).
Define \(ISS\) (individual sum of deviation squares) in the following way: \(ISS \equiv (x_{ij} - x_{i^\prime j})^2\), for some \(i\).
Note 2. Note that the mean sum of deviation squares amounts to \(MSS \equiv \frac{1}{|C_k|} TSS\). Note further that \(TSS = |C_K| \cdot MSS\).
Add \(-\bar{x}_{kj}+ \bar{x}_{kj}\)
As \(-\bar{x}_{kj}+ \bar{x}_{kj} = 0\) we can safely add this term to the equation.
Expand the binomial formula
\[= \frac{1}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p ((x_{ij} - \bar{x}_{kj}) - (x_{i^\prime j} - \bar{x}_{kj}))^2\]
Expand the sum
We can make use of note 2 in order to get rid of one summation.
\[= \frac{1}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p ((x_{ij} - \bar{x}_{kj})^2 - 2 (x_{ij} - \bar{x}_{kj})(x_{i^\prime j} - \bar{x}_{kj}) + (x_{i^\prime j} - \bar{x}_{kj})^2)\]
Deviation from mean sum to to zero
Note that the third term sums up to zero for uncorrelated variables; this term is analogous to the covariance.
\[= \frac{|C_k|}{|C_k|} \sum\limits_{i \in C_k} \sum\limits_{j=1}^p (x_{ij} - \bar{x}_{kj})^2 + \frac{|C_k|}{|C_k|} \sum\limits_{i^{\prime} \in C_k} \sum\limits_{j=1}^p (x_{i^\prime j} - \bar{x}_{kj})^2 - \frac{2}{|C_k|} \sum\limits_{i,i^{\prime} \in C_k} \sum\limits_{j=1}^p (x_{ij} - \bar{x}_{kj})(x_{i^\prime j} - \bar{x}_{kj}) \\ = 2 \sum\limits_{i \in C_k} \sum\limits_{j=1}^p (x_{ij} - \bar{x}_{kj})^2 + 0 \]