Linear Classification in Detail: Linear Discriminant Analysis (Fisher), Model Definition [Whiteboard Derivation Series Notes]


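The idea of linear discriminant analysis is to find a direction $\omega$ and project the samples onto it, so that the projected data satisfy the following as far as possible:

  1. Projections of samples from the same class lie as close together as possible

  2. Projections of different classes lie as far apart as possible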

 

In short: small within-class spread, large between-class separation.

 

$$
\begin{gathered}
X=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{N} \end{pmatrix}^{T}=\begin{pmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots \\ x_{N}^{T} \end{pmatrix}_{N \times p},\quad Y=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix}_{N \times 1}\\
\left\{(x_{i},y_{i})\right\}_{i=1}^{N},\quad x_{i}\in \mathbb{R}^{p},\quad y_{i}\in \left\{+1,-1\right\}\\
x_{C_{1}}=\left\{x_{i}\mid y_{i}=+1\right\},\quad x_{C_{2}}=\left\{x_{i}\mid y_{i}=-1\right\}\\
|x_{C_{1}}|=N_{1},\quad |x_{C_{2}}|=N_{2},\quad N_{1}+N_{2}=N
\end{gathered}
$$
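The notation above can be mirrored in a few lines of NumPy. This is only a sketch: the toy values and the variable names `X_C1`, `X_C2` are made up for illustration and are not part of the original notes.

```python
import numpy as np

# Hypothetical toy data set: N = 5 samples, p = 2 features, one sample per row.
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [6.0, 5.0],
              [7.0, 8.0],
              [3.0, 3.0]])
# Labels y_i in {+1, -1}.
y = np.array([+1, +1, -1, -1, +1])

# x_{C_1} = {x_i | y_i = +1},  x_{C_2} = {x_i | y_i = -1}
X_C1 = X[y == +1]
X_C2 = X[y == -1]

# Class sizes N_1 and N_2, which must add up to N.
N1, N2 = X_C1.shape[0], X_C2.shape[0]
print(N1, N2, N1 + N2 == X.shape[0])
```

Boolean masks keep the split aligned with the labels, so the rows of `X_C1` and `X_C2` are exactly the two sets defined above.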

$$
z_{i}=\omega^{T}x_{i}
$$

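This is clearly a real number, and it can be viewed as the projection of $x_{i}$ onto $\omega$ (strictly speaking, it is the projection length when $\|\omega\|=1$).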
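As a small illustration of the projection $z_{i}=\omega^{T}x_{i}$, the sketch below projects a few hypothetical samples onto a hypothetical unit-length direction. The numbers are invented for the example.

```python
import numpy as np

# Hypothetical samples, one per row (N = 4, p = 2).
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [0.0, 1.0],
              [3.0, 3.0]])

# Hypothetical direction, normalized so that z_i is a true projection length.
w = np.array([3.0, 4.0])
w = w / np.linalg.norm(w)  # becomes [0.6, 0.8]

# z_i = w^T x_i for all samples at once: (N x p) @ (p,) -> (N,)
z = X @ w
print(z)
```

Each entry of `z` is a single real number per sample, which is what the later variance and mean computations operate on.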

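The model requires a small within-class spread. How scattered the samples of a class are can be measured by the variance of their projections, which leads to the class covariance matrix: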

$$
\begin{aligned}
\bar{z}&=\frac{1}{N}\sum\limits_{i=1}^{N}z_{i}=\frac{1}{N}\sum\limits_{i=1}^{N}\omega^{T}x_{i}\\
C_{1}:\ \bar{z_{1}}&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\omega^{T}x_{i}\\
S_{1}&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}(\omega^{T}x_{i}-\bar{z_{1}})(\omega^{T}x_{i}-\bar{z_{1}})^{T}\\
&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\left(\omega^{T}x_{i}-\frac{1}{N_{1}}\sum\limits_{j=1}^{N_{1}}\omega^{T}x_{j}\right)\left(\omega^{T}x_{i}-\frac{1}{N_{1}}\sum\limits_{j=1}^{N_{1}}\omega^{T}x_{j}\right)^{T}\\
&\quad\text{(define } \frac{1}{N_{1}}\sum\limits_{j=1}^{N_{1}}x_{j}=\overline{x_{C_{1}}}\text{)}\\
&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\omega^{T}(x_{i}-\overline{x_{C_{1}}})(x_{i}-\overline{x_{C_{1}}})^{T}\omega\\
&=\omega^{T}\left(\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}(x_{i}-\overline{x_{C_{1}}})(x_{i}-\overline{x_{C_{1}}})^{T}\right)\omega\\
&\quad\text{(define } \frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}(x_{i}-\overline{x_{C_{1}}})(x_{i}-\overline{x_{C_{1}}})^{T}=S_{C_{1}}\text{)}\\
&=\omega^{T}S_{C_{1}}\omega\\
C_{2}:\ \bar{z_{2}}&=\frac{1}{N_{2}}\sum\limits_{i=1}^{N_{2}}\omega^{T}x_{i}\\
S_{2}&=\omega^{T}S_{C_{2}}\omega
\end{aligned}
$$

In the $C_{1}$ and $C_{2}$ lines the sums run only over the samples belonging to that class.

The within-class spread can therefore be measured by the sum of the two variances, that is,

$$
S_{1}+S_{2}=\omega^{T}(S_{C_{1}}+S_{C_{2}})\omega
$$

 

Note that quantities with subscripts $1,2$ are statistics of the projections $z$, while quantities with subscripts $C_{1},C_{2}$ are statistics of $x$.
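The identity $S_{1}=\omega^{T}S_{C_{1}}\omega$ can be checked numerically. The following is a minimal sketch on randomly generated data; the data, the direction `w`, and the helper `class_cov` are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical classes in p = 3 dimensions, one sample per row.
X1 = rng.normal(loc=0.0, size=(20, 3))
X2 = rng.normal(loc=2.0, size=(30, 3))
# An arbitrary (hypothetical) direction.
w = rng.normal(size=3)

def class_cov(Xc):
    """Biased (1/N_c) covariance matrix S_C of the rows of Xc."""
    centered = Xc - Xc.mean(axis=0)
    return centered.T @ centered / Xc.shape[0]

S_C1, S_C2 = class_cov(X1), class_cov(X2)

# Variances of the projections z = X w; np.var divides by N by default.
S1 = np.var(X1 @ w)
S2 = np.var(X2 @ w)

# Within-class measure written in terms of x: w^T (S_C1 + S_C2) w
within = w @ (S_C1 + S_C2) @ w
print(np.isclose(S1, w @ S_C1 @ w), np.isclose(S1 + S2, within))
```

The $1/N_{c}$ normalization matters here: using the unbiased $1/(N_{c}-1)$ estimator on one side only would break the equality.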

 

The distance between the classes can be measured by the squared difference of the class means, that is,

$$
\begin{aligned}
(\bar{z_{1}}-\bar{z_{2}})^{2}&=\left(\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\omega^{T}x_{i}-\frac{1}{N_{2}}\sum\limits_{i=1}^{N_{2}}\omega^{T}x_{i}\right)^{2}\\
&=\left[\omega^{T}\left(\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}x_{i}-\frac{1}{N_{2}}\sum\limits_{i=1}^{N_{2}}x_{i}\right)\right]^{2}\\
&=[\omega^{T}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})]^{2}\\
&=\omega^{T}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})^{T}\omega
\end{aligned}
$$

The last step uses the fact that $\omega^{T}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})$ is a scalar and therefore equals its own transpose.
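The same kind of numerical check works for the between-class term. This is a sketch under the assumption of randomly generated data; the names `d` and `S_b` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical classes in p = 3 dimensions, one sample per row.
X1 = rng.normal(loc=0.0, size=(25, 3))
X2 = rng.normal(loc=1.5, size=(40, 3))
w = rng.normal(size=3)  # arbitrary direction

# Left-hand side: squared difference of the projected class means.
lhs = ((X1 @ w).mean() - (X2 @ w).mean()) ** 2

# Right-hand side: w^T (m1 - m2)(m1 - m2)^T w, built from the class means of x.
d = X1.mean(axis=0) - X2.mean(axis=0)
S_b = np.outer(d, d)
rhs = w @ S_b @ w

print(np.isclose(lhs, rhs))
```

Because `S_b` is the outer product of a single vector with itself, it has rank one, which is what later allows the optimal direction to be written in closed form.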

To obtain the optimal $\hat{\omega}$ we want $S_{1}+S_{2}$ to be small and $(\bar{z_{1}}-\bar{z_{2}})^{2}$ to be large, so we define

$$
\begin{aligned}
J(\omega)&=\frac{\omega^{T}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})^{T}\omega}{\omega^{T}(S_{C_{1}}+S_{C_{2}})\omega}
\end{aligned}
$$

Hence, for $\hat{\omega}$ we have

$$
\begin{aligned}
\hat{\omega}&=\mathop{\arg\max}\limits_{\omega}J(\omega)\\
&=\mathop{\arg\max}\limits_{\omega}\frac{\omega^{T}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})^{T}\omega}{\omega^{T}(S_{C_{1}}+S_{C_{2}})\omega}\\
&\quad\text{define } S_{b}=(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})^{T}\ \text{(between-class scatter)}\\
&\quad\text{define } S_{\omega}=S_{C_{1}}+S_{C_{2}}\ \text{(within-class scatter)}\\
J(\omega)&=\frac{\omega^{T}S_{b}\omega}{\omega^{T}S_{\omega}\omega}\\
&=\omega^{T}S_{b}\omega\,(\omega^{T}S_{\omega}\omega)^{-1}\\
\frac{\partial J(\omega)}{\partial \omega}&=2S_{b}\omega(\omega^{T}S_{\omega}\omega)^{-1}+\omega^{T}S_{b}\omega \cdot (-1)(\omega^{T}S_{\omega}\omega)^{-2}\cdot 2S_{\omega}\omega\\
0&=2S_{b}\omega(\omega^{T}S_{\omega}\omega)^{-1}+\omega^{T}S_{b}\omega \cdot (-1)(\omega^{T}S_{\omega}\omega)^{-2}\cdot 2S_{\omega}\omega\\
0&=S_{b}\omega(\omega^{T}S_{\omega}\omega)-(\omega^{T}S_{b}\omega)S_{\omega}\omega\\
(\omega^{T}S_{b}\omega)S_{\omega}\omega&=S_{b}\omega(\omega^{T}S_{\omega}\omega)\\
&\quad\text{here clearly } \omega^{T}S_{b}\omega,\ \omega^{T}S_{\omega}\omega \in \mathbb{R}\text{; assuming } S_{\omega} \text{ is invertible,}\\
\omega&=\frac{\omega^{T}S_{\omega}\omega}{\omega^{T}S_{b}\omega}S_{\omega}^{-1}S_{b}\omega\\
&\quad\text{if we only care about the direction of } \omega\text{, all real scalar factors can be dropped}\\
\omega &\propto S_{\omega}^{-1}S_{b}\omega\\
&\propto S_{\omega}^{-1}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})^{T}\omega\\
&\quad\text{here } (\bar{x_{C_{1}}}-\bar{x_{C_{2}}})^{T}\omega \text{ is clearly also a real number}\\
&\propto S_{\omega}^{-1}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})
\end{aligned}
$$

Our goal was $\mathop{\arg\max}\limits_{\omega}J(\omega)$, but in practice we only need the direction of $\omega$ and do not care about its magnitude. The vector $S_{\omega}^{-1}(\bar{x_{C_{1}}}-\bar{x_{C_{2}}})$ is therefore taken as the direction of $\omega$ we are looking for.
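The closed-form direction can be sanity-checked numerically. The sketch below is an illustration rather than a reference implementation: the helper names `fisher_direction` and `fisher_criterion` and the synthetic Gaussian data are assumptions, and $S_{\omega}$ is assumed to be invertible.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return a unit vector along S_w^{-1} (mean1 - mean2), plus S_b and S_w."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the biased (1/N_c) class covariance matrices.
    S_w = np.cov(X1, rowvar=False, bias=True) + np.cov(X2, rowvar=False, bias=True)
    d = m1 - m2
    S_b = np.outer(d, d)  # between-class scatter (rank one)
    # Solve S_w w = d instead of forming the inverse explicitly.
    w = np.linalg.solve(S_w, d)
    return w / np.linalg.norm(w), S_b, S_w

def fisher_criterion(w, S_b, S_w):
    """J(w) = (w^T S_b w) / (w^T S_w w)."""
    return (w @ S_b @ w) / (w @ S_w @ w)

rng = np.random.default_rng(42)
# Hypothetical classes in p = 3 dimensions, one sample per row.
X1 = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(50, 3))
X2 = rng.normal(loc=[2.0, 1.0, 0.0], scale=1.0, size=(60, 3))

w_hat, S_b, S_w = fisher_direction(X1, X2)
J_hat = fisher_criterion(w_hat, S_b, S_w)

# Compare against many random directions: none should score higher than w_hat.
random_dirs = rng.normal(size=(1000, 3))
J_random = np.array([fisher_criterion(v, S_b, S_w) for v in random_dirs])
print(J_hat >= J_random.max())

# Rescaling w leaves J unchanged, which is why only the direction matters.
print(np.isclose(fisher_criterion(5.0 * w_hat, S_b, S_w), J_hat))
```

Normalizing `w_hat` to unit length is a convention chosen here for convenience; any nonzero multiple gives the same value of $J$.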