[Whiteboard Derivation Series Notes] Math Basics - Probability - Gaussian Distribution - Marginal and Conditional Probabilities


$$
\begin{gathered}
X \sim N(\mu,\Sigma),\quad p(x)=\frac{1}{(2\pi)^{\frac{p}{2}}|\Sigma|^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)\\
x \in \mathbb{R}^{p}\ \text{(a random vector)}
\end{gathered}
$$

Given

$$
\begin{gathered}
x=\begin{pmatrix} x_{a} \\ x_{b} \end{pmatrix},\quad \mu=\begin{pmatrix} \mu_{a} \\ \mu_{b} \end{pmatrix},\quad \Sigma=\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}\\
x_{a}\ \text{is}\ m \times 1,\quad x_{b}\ \text{is}\ n \times 1,\quad m+n=p
\end{gathered}
$$

Find $P(x_{a})$ and $P(x_{b}|x_{a})$; once these are obtained, $P(x_{b})$ and $P(x_{a}|x_{b})$ follow by symmetry.

 

Theorem:

Given

$$X \sim N(\mu,\Sigma),\ x \in \mathbb{R}^{p},\qquad y=Ax+B,\ A\in\mathbb{R}^{q\times p},\ y \in \mathbb{R}^{q}$$

then

$$y \sim N(A\mu+B,\ A\Sigma A^{T})$$
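A minimal Monte Carlo sanity check of this theorem (not part of the original notes; the concrete values of $\mu$, $\Sigma$, $A$, $B$ below are made up for illustration): sample $x \sim N(\mu,\Sigma)$, form $y=Ax+B$, and compare the empirical mean and covariance of $y$ with $A\mu+B$ and $A\Sigma A^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])                       # p = 3 (illustrative values)
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])                   # symmetric positive definite
A = np.array([[1.0, 0.0, 2.0],
              [0.5, -1.0, 0.0]])                      # q x p with q = 2
B = np.array([0.3, -1.0])

# sample x ~ N(mu, Sigma) and apply the affine map y = A x + B (one sample per row)
x = rng.multivariate_normal(mu, Sigma, size=500_000)
y = x @ A.T + B

print("empirical mean of y:", y.mean(axis=0))
print("A mu + B           :", A @ mu + B)
print("empirical cov of y :\n", np.cov(y, rowvar=False))
print("A Sigma A^T        :\n", A @ Sigma @ A.T)
```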

 

First, find the distribution of $x_{a}$:

$$
\begin{aligned}
x_{a}&=\underbrace{\begin{pmatrix}I_{m} & O\end{pmatrix}}_{A}\underbrace{\begin{pmatrix} x_{a} \\ x_{b} \end{pmatrix}}_{x}\\
E(x_{a})&=\begin{pmatrix}I_{m} & O\end{pmatrix}\begin{pmatrix} \mu_{a} \\ \mu_{b} \end{pmatrix}=\mu_{a}\\
\text{Var}(x_{a})&=\begin{pmatrix} I_{m} & O \end{pmatrix}\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}\begin{pmatrix} I_{m} \\ O \end{pmatrix}=\Sigma_{aa}
\end{aligned}
$$

Therefore $x_{a}\sim N(\mu_{a},\Sigma_{aa})$.
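This marginal result can also be checked numerically (illustrative numbers, not from the notes): draw samples of the joint $x=(x_{a},x_{b})$, keep only the first $m$ coordinates, and compare their empirical mean and covariance with $\mu_{a}$ and $\Sigma_{aa}$.

```python
import numpy as np

rng = np.random.default_rng(1)

m = 2                                                 # x_a is the first m coordinates (assumption)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])

# sample the joint x = (x_a, x_b) and keep only x_a = (I_m  O) x
x = rng.multivariate_normal(mu, Sigma, size=500_000)
xa = x[:, :m]

print("empirical mean:", xa.mean(axis=0), "  mu_a:", mu[:m])
print("empirical cov :\n", np.cov(xa, rowvar=False))
print("Sigma_aa      :\n", Sigma[:m, :m])
```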

Next, find the distribution of $x_{b}|x_{a}$. Define

$$
\left\{\begin{aligned}
&x_{b \cdot a}=x_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}\\
&\mu_{b \cdot a}=\mu_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_{a}\\
&\Sigma_{bb \cdot a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}
\end{aligned}\right.
$$

$$
\begin{aligned}
x_{b \cdot a}&=\underbrace{\begin{pmatrix}- \Sigma_{ba}\Sigma_{aa}^{-1} & I_{n} \end{pmatrix}}_{A}\underbrace{\begin{pmatrix} x_{a} \\ x_{b} \end{pmatrix}}_{x}\\
E(x_{b \cdot a})&=\begin{pmatrix}- \Sigma_{ba}\Sigma_{aa}^{-1} & I_{n} \end{pmatrix}\begin{pmatrix} \mu_{a} \\ \mu_{b} \end{pmatrix}=\mu_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_{a}=\mu_{b \cdot a}\\
\text{Var}(x_{b \cdot a})&=\begin{pmatrix}- \Sigma_{ba}\Sigma_{aa}^{-1} & I_{n} \end{pmatrix}\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}\begin{pmatrix} -\Sigma_{aa}^{-1}\Sigma_{ba}^{T} \\ I_{n} \end{pmatrix}\\
&=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}=\Sigma_{bb \cdot a}
\end{aligned}
$$

Therefore $x_{b \cdot a}\sim N(\mu_{b \cdot a},\Sigma_{bb \cdot a})$.
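Under the same illustrative setup as above (block sizes $m=2$, $n=1$ are assumptions, not from the notes), the following sketch checks numerically that $x_{b\cdot a}=x_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$ indeed has mean $\mu_{b\cdot a}$ and covariance $\Sigma_{bb\cdot a}$.

```python
import numpy as np

rng = np.random.default_rng(2)

m = 2                                                 # dim of x_a; dim of x_b is n = 1
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])
mu_a, mu_b = mu[:m], mu[m:]
S_aa, S_ab = Sigma[:m, :m], Sigma[:m, m:]
S_ba, S_bb = Sigma[m:, :m], Sigma[m:, m:]

# x_{b.a} = x_b - Sigma_ba Sigma_aa^{-1} x_a, built from joint samples
# (row-vector form; Sigma_aa^{-1} Sigma_ab is the transpose of Sigma_ba Sigma_aa^{-1})
x = rng.multivariate_normal(mu, Sigma, size=500_000)
xa, xb = x[:, :m], x[:, m:]
xba = xb - xa @ np.linalg.solve(S_aa, S_ab)

mu_ba = mu_b - S_ba @ np.linalg.solve(S_aa, mu_a)     # mu_{b.a}
S_bba = S_bb - S_ba @ np.linalg.solve(S_aa, S_ab)     # Sigma_{bb.a}

print("empirical mean:", xba.mean(axis=0), "  mu_{b.a}:", mu_ba)
print("empirical var :", np.cov(xba, rowvar=False), "  Sigma_{bb.a}:", S_bba)
```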

What we want is $x_{b}|x_{a}$, that is,

$$
\begin{aligned}
x_{b \cdot a}&=x_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}\\
x_{b}&=x_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}\\
x_{b}|x_{a}&=(x_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a})|x_{a}\\
&=x_{b \cdot a}|x_{a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}|x_{a}\\
&=x_{b \cdot a}|x_{a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}
\end{aligned}
$$

If $x_{b \cdot a}|x_{a}=x_{b \cdot a}$ (i.e., conditioning on $x_{a}$ does not change the distribution of $x_{b \cdot a}$), then

$$x_{b}|x_{a}=x_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$$

 

Lemma: if $X \sim N(\mu,\Sigma)$, then $Mx \perp Nx \Leftrightarrow M \Sigma N^{T}=0$.

Proof:

Since $x \sim N(\mu,\Sigma)$, we have $Mx \sim N(M\mu,\ M\Sigma M^{T})$ and $Nx \sim N(N\mu,\ N\Sigma N^{T})$, and

$$
\begin{aligned}
\text{Cov}(Mx,Nx)&=E[(Mx-M\mu)(Nx-N\mu)^{T}]\\
&=M \cdot E[(x-\mu)(x-\mu)^{T}]\cdot N^{T}\\
&=M \Sigma N^{T}
\end{aligned}
$$

Since $Mx$ and $Nx$ are jointly Gaussian, they are independent if and only if they are uncorrelated; hence $Mx \perp Nx \Leftrightarrow \text{Cov}(Mx,Nx)=M\Sigma N^{T}=0$.
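The covariance identity used in this proof is easy to check numerically. The sketch below (with illustrative $M$, $N$, $\mu$, $\Sigma$ that are not from the notes) compares the empirical $\text{Cov}(Mx,Nx)$ with the closed form $M\Sigma N^{T}$.

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])
M = np.array([[1.0, 0.0, -1.0]])                      # arbitrary 1 x p matrices
N = np.array([[0.0, 2.0, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
Mx, Nx = x @ M.T, x @ N.T

# empirical Cov(Mx, Nx) versus the closed form M Sigma N^T
emp = ((Mx - Mx.mean(axis=0)) * (Nx - Nx.mean(axis=0))).mean(axis=0)
print("empirical Cov(Mx, Nx):", emp)
print("M Sigma N^T          :", (M @ Sigma @ N.T).ravel())
```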

 

Now apply the lemma to $x_{b \cdot a}$ and $x_{a}$:

$$
\begin{aligned}
\Sigma&=\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}\\
x_{b \cdot a}&=x_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}=\underbrace{\begin{pmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I \end{pmatrix}}_{M}\begin{pmatrix} x_{a} \\ x_{b} \end{pmatrix}\\
x_{a}&=\underbrace{\begin{pmatrix} I & O \end{pmatrix}}_{N}\begin{pmatrix} x_{a} \\ x_{b} \end{pmatrix}\\
M \Sigma N^{T}&=\begin{pmatrix} -\Sigma_{ba}\Sigma_{aa}^{-1} & I \end{pmatrix}\begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}\begin{pmatrix} I \\ O \end{pmatrix}\\
&=\begin{pmatrix} O & \Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} \end{pmatrix}\begin{pmatrix} I \\ O \end{pmatrix}=O
\end{aligned}
$$

Therefore $x_{b \cdot a}\perp x_{a}$, which gives $x_{b \cdot a}|x_{a}=x_{b \cdot a}$, and so

$$x_{b}|x_{a}=x_{b \cdot a}|x_{a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}=x_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$$

Therefore

$$
\begin{aligned}
E(x_{b}|x_{a})&=E(x_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a})\\
&=\mu_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}\\
&=\mu_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_{a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}\\
\text{Var}(x_{b}|x_{a})&=\text{Var}(x_{b \cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a})\\
&=\text{Var}(x_{b \cdot a})\\
&=\Sigma_{bb \cdot a}
\end{aligned}
$$

Therefore $x_{b}|x_{a} \sim N(\mu_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_{a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a},\ \Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab})$, i.e., $x_{b}|x_{a} \sim N(\mu_{b}+\Sigma_{ba}\Sigma_{aa}^{-1}(x_{a}-\mu_{a}),\ \Sigma_{bb \cdot a})$.
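As a final consistency check (again with made-up numbers, not from the notes), the joint density should factor as $p(x)=p(x_{a})\,p(x_{b}|x_{a})$ when the marginal and conditional use the formulas derived above. The sketch relies on `scipy.stats.multivariate_normal` to evaluate the Gaussian densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)

m = 2                                                 # dim of x_a; dim of x_b is 1 (assumption)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])
mu_a, mu_b = mu[:m], mu[m:]
S_aa, S_ab = Sigma[:m, :m], Sigma[:m, m:]
S_ba, S_bb = Sigma[m:, :m], Sigma[m:, m:]
S_bba = S_bb - S_ba @ np.linalg.solve(S_aa, S_ab)     # Sigma_{bb.a}

x = rng.normal(size=3)                                # an arbitrary test point
xa, xb = x[:m], x[m:]

p_joint = multivariate_normal(mu, Sigma).pdf(x)
p_xa = multivariate_normal(mu_a, S_aa).pdf(xa)                    # marginal N(mu_a, Sigma_aa)
cond_mean = mu_b + S_ba @ np.linalg.solve(S_aa, xa - mu_a)        # mu_b + S_ba S_aa^{-1}(x_a - mu_a)
p_xb_given_xa = multivariate_normal(cond_mean, S_bba).pdf(xb)     # conditional N(cond_mean, Sigma_{bb.a})

print(p_joint, "vs", p_xa * p_xb_given_xa)            # the two numbers should match
```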