Linear Classification in Detail: Gaussian Discriminant Analysis, Model Definition [Whiteboard Derivation Series Notes]


$$
\left\{(x_{i},y_{i})\right\}_{i=1}^{N},\quad x_{i}\in \mathbb{R}^{p},\quad y_{i} \in \left\{0,1\right\}
$$

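Logistic regression models $p(y|x)$ directly. Gaussian discriminant analysis, by contrast, is a probabilistic generative model: it introduces a prior over the classes and, via Bayes' theorem, works with the joint distribution $p(x,y)=p(x|y)p(y)$. The parameters are then estimated by maximizing the log-likelihood of that joint distribution.

Bayes' theorem states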

$$
p(y|x)=\frac{p(x|y)p(y)}{p(x)}
$$

However, we only need to compare $p(y=1|x)=\frac{p(x|y=1)p(y=1)}{p(x)}$ with $p(y=0|x)=\frac{p(x|y=0)p(y=0)}{p(x)}$. The denominator $p(x)$ is the same in both, so it can be ignored:

$$
\begin{aligned}
\hat{y}&=\mathop{\arg\max}\limits_{y \in \left\{0,1\right\}}p(y|x)\\
&\text{since } p(y|x)\propto p(x|y)p(y)\\
&=\mathop{\arg\max}\limits_{y \in \left\{0,1\right\}}p(y)\cdot p(x|y)
\end{aligned}
$$
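To make the argument concrete, here is a minimal sketch showing that dividing by $p(x)$ does not change the arg max. The prior and likelihood values are made-up numbers for a single observation, chosen only for illustration.

```python
import numpy as np

# Hypothetical values for one observation x (illustration only).
prior = np.array([0.7, 0.3])          # p(y=0), p(y=1)
likelihood = np.array([0.05, 0.20])   # p(x|y=0), p(x|y=1)

joint = prior * likelihood            # p(x, y) = p(x|y) p(y)
evidence = joint.sum()                # p(x) = sum over y of p(x, y)
posterior = joint / evidence          # p(y|x) by Bayes' theorem

# Dividing by the positive constant p(x) rescales both entries equally,
# so the index of the maximum is unchanged.
y_hat_posterior = int(np.argmax(posterior))
y_hat_joint = int(np.argmax(joint))
print(y_hat_posterior, y_hat_joint)
```

Both calls pick the same class, which is why the classifier only ever needs $p(y)\,p(x|y)$.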

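Gaussian discriminant analysis makes two assumptions about the data. The class prior is a Bernoulli distribution, written here as $B(1,\phi)$, a binomial with a single trial. The class-conditional likelihood of each class is Gaussian, with both classes sharing one covariance matrix $\Sigma$. That is,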

$$
\begin{aligned}
y & \sim B(1,\phi)\Rightarrow p(y)=\left\{\begin{aligned}&\phi^{y},&y=1\\&(1-\phi)^{1-y},&y=0\end{aligned}\right.\\
&\Rightarrow p(y)=\phi^{y}(1-\phi)^{1-y}\\
x|y=1 &\sim N(\mu_{1},\Sigma)\\
x|y=0 & \sim N(\mu_{2},\Sigma) \\
&\Rightarrow p(x|y)=N(\mu_{1},\Sigma)^{y}\cdot N(\mu_{2},\Sigma)^{1-y}
\end{aligned}
$$
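These assumptions describe a two-step generative process: first draw the label, then draw the features from that label's Gaussian. A rough sketch of that process in NumPy follows. The function name `sample_gda` and all parameter values are hypothetical and exist only for illustration.

```python
import numpy as np

def sample_gda(n, phi, mu1, mu2, sigma, seed=0):
    """Draw n pairs (x, y) with y ~ B(1, phi) and x|y ~ N(mu_y, sigma)."""
    rng = np.random.default_rng(seed)
    y = rng.binomial(1, phi, size=n)               # class labels in {0, 1}
    # Row i gets mu1 when y_i = 1 and mu2 when y_i = 0.
    means = np.where(y[:, None] == 1, mu1, mu2)
    # Zero-mean Gaussian noise with the shared covariance sigma.
    noise = rng.multivariate_normal(np.zeros(len(mu1)), sigma, size=n)
    return means + noise, y

# Hypothetical parameters (not taken from the notes).
phi = 0.3
mu1 = np.array([2.0, 2.0])
mu2 = np.array([-2.0, -2.0])
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

X, y = sample_gda(2000, phi, mu1, mu2, sigma)
print(X.shape, y.mean())
```

With enough samples, the fraction of labels equal to 1 should be close to $\phi$, and the mean of the rows with $y=1$ should be close to $\mu_1$.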

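Therefore, the log-likelihood of the joint distribution, which is the objective to maximize over the parameters, is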

$$
\begin{aligned}
L(\mu_{1},\mu_{2},\Sigma,\phi)&=\log \prod\limits_{i=1}^{N}[p(x_{i}|y_{i})p(y_{i})]\\
&=\sum\limits_{i=1}^{N}[\log p(x_{i}|y_{i})+\log p(y_{i})]\\
&=\sum\limits_{i=1}^{N}[\log N(\mu_{1},\Sigma)^{y_{i}}+\log N(\mu_{2},\Sigma)^{1-y_{i}}+\log \phi^{y_{i}}(1-\phi)^{1-y_{i}}]
\end{aligned}
$$
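The last line of this expression can be evaluated term by term. The sketch below is one possible way to do so in NumPy, with each Gaussian density evaluated at $x_i$. The helper names `log_gaussian` and `log_likelihood` and the toy data are assumptions made for illustration, not part of the original notes.

```python
import numpy as np

def log_gaussian(X, mu, sigma):
    """Row-wise log density of N(mu, sigma) for X of shape (N, p)."""
    p = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(sigma)           # log |sigma|
    # Squared Mahalanobis distance of each row, without inverting sigma.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def log_likelihood(X, y, phi, mu1, mu2, sigma):
    """Sum over i of y_i log N(mu1) + (1 - y_i) log N(mu2) + log p(y_i)."""
    per_sample = (
        y * log_gaussian(X, mu1, sigma)
        + (1 - y) * log_gaussian(X, mu2, sigma)
        + y * np.log(phi)
        + (1 - y) * np.log(1 - phi)
    )
    return per_sample.sum()

# Toy data: each point sits exactly on its own class mean.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([1, 0])
mu1 = np.array([0.0, 0.0])
mu2 = np.array([1.0, 1.0])
sigma = np.eye(2)

ll_good = log_likelihood(X, y, 0.5, mu1, mu2, sigma)
ll_swapped = log_likelihood(X, y, 0.5, mu2, mu1, sigma)  # means exchanged
print(ll_good, ll_swapped)
```

Swapping the two means moves every point away from its class mean, so the second value is lower than the first. Maximizing this function over the parameters selects the ones that best explain the labeled data.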

![[附件/Pasted image 20220928160422.png|400]]

Author: 张文翔 (Zhang Wenxiang). Link: Andrew Ng Stanford Machine Learning Open Course, Summary (5) - 张文翔的博客 | BY ZhangWenxiang (demmon-tju.github.io)