Dimensionality Reduction Explained: Sample Mean & Sample Covariance Matrix [Whiteboard Derivation Series Notes]

This is day 6 of my participation in the 「掘金日新计划 · 10 月更文挑战」.

\begin{gathered} X=\begin{pmatrix} x_{1} & x_{2} & \cdots  & x_{N} \end{pmatrix}^{T}_{N \times p}=\begin{pmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots  \\ x_{N}^{T} \end{pmatrix}=\begin{pmatrix} x_{11} & x_{12} & \cdots &  x_{1p} \\ x_{21} & x_{22} & \cdots  & x_{2p} \\ \vdots  & \vdots  &  & \vdots  \\ x_{N1} & x_{N2} & \cdots  & x_{Np} \end{pmatrix}_{N \times p}\\ x_{i}\in \mathbb{R}^{p},\quad i=1,2,\cdots ,N\\ \text{Let } 1_{N}=\begin{pmatrix}1 \\ 1 \\ \vdots  \\ 1\end{pmatrix}_{N \times 1} \end{gathered}

For the sample mean:

\begin{aligned} \bar{x}&=\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}\\ &=\frac{1}{N}\begin{pmatrix} x_{1} & x_{2} & \cdots  & x_{N} \end{pmatrix}\begin{pmatrix}1 \\ 1 \\ \vdots  \\ 1\end{pmatrix}_{N \times 1}\\ &=\frac{1}{N}X^{T}1_{N} \end{aligned}
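The identity above can be checked numerically. The following is a minimal sketch in plain Python with exact fractions; the toy data matrix `X` (one sample per row, $N=3$, $p=2$) is an assumption made up for illustration, not data from the derivation.

```python
from fractions import Fraction

# Hypothetical toy data: N = 3 samples, p = 2 features, one sample per row of X.
X = [[Fraction(v) for v in row] for row in [[1, 2], [3, 4], [5, 9]]]
N, p = len(X), len(X[0])
ones = [Fraction(1)] * N  # the all-ones vector 1_N

# Matrix form: x_bar = (1/N) X^T 1_N, so entry j is (1/N) * sum_i X[i][j] * 1_N[i].
x_bar_matrix = [sum(X[i][j] * ones[i] for i in range(N)) / N for j in range(p)]

# Direct form: average the samples x_i feature by feature.
x_bar_direct = [sum(row[j] for row in X) / N for j in range(p)]

print(x_bar_matrix == x_bar_direct)
```

Exact fractions are used so that the comparison is an exact equality rather than a floating-point approximation.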

For the sample covariance matrix:

\begin{aligned} S&=\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\bar{x})(x_{i}-\bar{x})^{T} \end{aligned}

For the matrix whose columns are the centered samples $x_{i}-\bar{x}$, we have

\begin{aligned} \begin{pmatrix} x_{1}-\bar{x} & x_{2}-\bar{x} & \cdots  & x_{N}-\bar{x} \end{pmatrix}&=\begin{pmatrix} x_{1} & x_{2} & \cdots  & x_{N} \end{pmatrix}-\begin{pmatrix} \bar{x} & \bar{x} & \cdots  & \bar{x} \end{pmatrix}\\ &=X^{T}-\bar{x}\begin{pmatrix}1 & 1 & \cdots  & 1\end{pmatrix}\\ &=X^{T}-\bar{x}1_{N}^{T}\\ &=X^{T}- \frac{1}{N}X^{T}1_{N}1_{N}^{T}\\ &=X^{T}\left(\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\right) \end{aligned}

Substituting this back into the expression for $S$:

\begin{aligned} S&=\frac{1}{N}\begin{pmatrix} x_{1}-\bar{x} & x_{2}-\bar{x} & \cdots  & x_{N}-\bar{x} \end{pmatrix}\begin{pmatrix} (x_{1}-\bar{x})^{T} \\ (x_{2}-\bar{x})^{T} \\ \vdots  \\ (x_{N}-\bar{x})^{T} \end{pmatrix}\\ &=\frac{1}{N}X^{T}\left(\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\right)\cdot (\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T})^{T}X\\ \end{aligned}

Let $\begin{aligned} \mathbb{H}=\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\end{aligned}$ ( $\mathbb{H}$ is also called the centering matrix); the expression above becomes

\begin{aligned} S&=\frac{1}{N}X^{T}\left(\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\right)\cdot (\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T})^{T}X\\ &=\frac{1}{N}X^{T}\mathbb{H}\cdot \mathbb{H}X \end{aligned}

The last step uses the symmetry $\mathbb{H}^{T}=\mathbb{H}$, which holds because

\begin{aligned} \mathbb{H}^{T}&=(\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T})^{T}\\ &=\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\\ &=\mathbb{H} \end{aligned}

For $\mathbb{H}^{2}$, we have

\begin{aligned} \mathbb{H}^{2}&=\mathbb{H} \cdot \mathbb{H}\\ &=\left(\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\right)\left(\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\right)\\ &=\mathbb{I}_{N}- \frac{2}{N}1_{N}1_{N}^{T}+ \frac{1}{N^{2}}1_{N}1_{N}^{T}1_{N}1_{N}^{T} \end{aligned}

For $1_{N}1_{N}^{T}$:

\begin{aligned} 1_{N}1_{N}^{T}&=\begin{pmatrix} 1 \\ \vdots  \\ 1 \end{pmatrix}\begin{pmatrix} 1 & \cdots  & 1 \end{pmatrix}=\begin{pmatrix} 1 & \cdots  & 1 \\ \vdots  &  & \vdots  \\ 1 & \cdots  & 1 \end{pmatrix}\\ 1_{N}1_{N}^{T}1_{N}1_{N}^{T}&=\begin{pmatrix} 1 & \cdots  & 1 \\ \vdots  &  & \vdots  \\ 1 & \cdots  & 1 \end{pmatrix}\begin{pmatrix} 1 & \cdots  & 1 \\ \vdots  &  & \vdots  \\ 1 & \cdots  & 1 \end{pmatrix}\\ &=\begin{pmatrix} N & \cdots  & N \\ \vdots  &  & \vdots  \\ N & \cdots  & N \end{pmatrix} \end{aligned}
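The same product can be obtained without writing out the matrices, by regrouping the factors and using the scalar $1_{N}^{T}1_{N}=N$:

\begin{aligned} 1_{N}1_{N}^{T}1_{N}1_{N}^{T}&=1_{N}\left(1_{N}^{T}1_{N}\right)1_{N}^{T}\\ &=N\cdot 1_{N}1_{N}^{T} \end{aligned}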

Substituting back into $\mathbb{H}^{2}$ gives

\begin{aligned} \mathbb{H}^{2}&=\mathbb{I}_{N}- \frac{2}{N}1_{N}1_{N}^{T}+ \frac{1}{N^{2}}1_{N}1_{N}^{T}1_{N}1_{N}^{T}\\ &=\mathbb{I}_{N}- \frac{2}{N}\begin{pmatrix} 1 & \cdots  & 1 \\ \vdots  &  & \vdots  \\ 1 & \cdots  & 1 \end{pmatrix}+ \frac{1}{N^{2}}\begin{pmatrix} N & \cdots  & N \\ \vdots  &  & \vdots  \\ N & \cdots  & N \end{pmatrix}\\ &=\mathbb{I}_{N}- \frac{2}{N}\begin{pmatrix} 1 & \cdots  & 1 \\ \vdots  &  & \vdots  \\ 1 & \cdots  & 1 \end{pmatrix}+ \frac{1}{N}\begin{pmatrix} 1 & \cdots  & 1 \\ \vdots  &  & \vdots  \\ 1 & \cdots  & 1 \end{pmatrix}\\ &=\mathbb{I}_{N}- \frac{1}{N}\begin{pmatrix} 1 & \cdots  & 1 \\ \vdots  &  & \vdots  \\ 1 & \cdots  & 1 \end{pmatrix}\\ &=\mathbb{I}_{N}- \frac{1}{N}1_{N}1_{N}^{T}\\ &=\mathbb{H} \end{aligned}
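The symmetry and idempotence of $\mathbb{H}$ derived above can be sanity-checked for a concrete size. The sketch below assumes $N=4$ and uses two helper functions, `centering_matrix` and `matmul`, which are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return H = I_n - (1/n) 1_n 1_n^T as a list of rows of exact fractions."""
    return [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


H = centering_matrix(4)
H_T = [list(col) for col in zip(*H)]  # transpose of H

print(H_T == H)           # checks symmetry: H^T = H
print(matmul(H, H) == H)  # checks idempotence: H^2 = H
```

Because the entries are exact fractions, both comparisons test the identities exactly instead of up to rounding error.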

Therefore $\mathbb{H}^{n}=\mathbb{H}$ for every positive integer $n$. Substituting back into $S$:

\begin{aligned} S&=\frac{1}{N}X^{T}\mathbb{H}\cdot \mathbb{H}X\\ &=\frac{1}{N}X^{T}\mathbb{H}X \end{aligned}
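To see that the compact form agrees with the original definition, the sketch below computes $S$ both ways on a small made-up dataset. The data values and the helper names `matmul` and `transpose` are assumptions for illustration only.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


# Hypothetical toy data: N = 3 samples, p = 2 features, one sample per row of X.
X = [[Fraction(v) for v in row] for row in [[1, 2], [3, 4], [5, 9]]]
N, p = len(X), len(X[0])

# Centering matrix H = I_N - (1/N) 1_N 1_N^T.
H = [[Fraction(int(i == j)) - Fraction(1, N) for j in range(N)] for i in range(N)]

# Definition: S = (1/N) * sum_i (x_i - x_bar)(x_i - x_bar)^T.
x_bar = [sum(row[j] for row in X) / N for j in range(p)]
S_direct = [[sum((X[i][a] - x_bar[a]) * (X[i][b] - x_bar[b]) for i in range(N)) / N
             for b in range(p)]
            for a in range(p)]

# Compact matrix form: S = (1/N) X^T H X.
S_matrix = [[v / N for v in row] for row in matmul(matmul(transpose(X), H), X)]

print(S_matrix == S_direct)
```

With NumPy arrays the matrix form would reduce to a one-line expression using the `@` operator; plain Python with exact fractions is used here so the equality check is exact.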

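Geometrically, for a dataset $X$, the product $\mathbb{H}X$ can be viewed as translating the dataset so that its mean sits at the origin of the coordinate system; $\mathbb{H}$ is the matrix that performs this translation. Since $X$ is $N \times p$ and $\mathbb{H}$ is $N \times N$, $\mathbb{H}$ multiplies $X$ from the left (equivalently, $X^{T}\mathbb{H}$ centers the samples stored as columns of $X^{T}$).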
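As a small worked illustration (the numbers are made up, with one sample per row of $X$, $N=3$ and $p=2$), left-multiplying by $\mathbb{H}$ subtracts the sample mean $\bar{x}=\begin{pmatrix} 3 & 5 \end{pmatrix}^{T}$ from every sample:

\begin{aligned} \mathbb{H}X&=X- \frac{1}{N}1_{N}1_{N}^{T}X=X-1_{N}\bar{x}^{T}\\ &=\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 9 \end{pmatrix}-\begin{pmatrix} 3 & 5 \\ 3 & 5 \\ 3 & 5 \end{pmatrix}=\begin{pmatrix} -2 & -3 \\ 0 & -1 \\ 2 & 4 \end{pmatrix} \end{aligned}

Each column of the result sums to zero, so the centered dataset has mean $0$.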