# MLAPP Reading Notes - 05 Bayesian Statistics

A Chinese Notes of MLAPP, the MLAPP Chinese notes project: zhuanlan.zhihu.com/python-kivy

2018-05-29 12:22:15

## 5.2 Summarizing posterior distributions

### 5.2.1 MAP estimation

#### 5.2.1.4 MAP estimation is not invariant to reparameterization *

$p_ \theta (\theta)= p_\mu (\mu)|\frac{d\mu}{d\theta}|$(5.3)

$\hat\theta_{MAP} = \arg\max_{\theta\in[0,1]}2\theta =1$(5.4)

$p_\phi(\phi)=p_\theta(\theta)|\frac{d\theta}{d\phi}|=2(1-\phi)$(5.5)

$\hat\phi_{MAP} = \arg\max_{\phi\in[0,1]}2-2\phi =0$(5.6)

$\hat\theta =\arg\max_\theta p(D|\theta) p(\theta)|I(\theta)|^{-\frac{1}{2}}$(5.7)
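A quick numeric sketch of the non-invariance (not the book's example; the Beta(2,2) posterior and the nonlinear map $\phi=\theta^2$ are hypothetical choices): applying the change-of-variables formula (eq 5.3) moves the mode of the density, so the MAP estimate depends on the parameterization.

```python
import numpy as np

# Posterior over theta: Beta(2,2), density proportional to theta*(1-theta), mode at 0.5.
theta = np.linspace(1e-6, 1 - 1e-6, 100001)
p_theta = theta * (1 - theta)              # unnormalized density
theta_map = theta[np.argmax(p_theta)]

# Reparameterize with phi = theta**2, so theta = sqrt(phi), dtheta/dphi = 1/(2 sqrt(phi)).
# Change of variables (eq 5.3): p_phi(phi) = p_theta(sqrt(phi)) * |dtheta/dphi|.
phi = np.linspace(1e-6, 1 - 1e-6, 100001)
p_phi = np.sqrt(phi) * (1 - np.sqrt(phi)) / (2 * np.sqrt(phi))
phi_map = phi[np.argmax(p_phi)]

print(theta_map)           # mode in the theta parameterization: ~0.5
print(np.sqrt(phi_map))    # phi-MAP mapped back to theta: ~0, not 0.5
```

Mapping the $\phi$-MAP back through $\theta=\sqrt{\phi}$ does not recover the $\theta$-MAP, which is exactly the failure of invariance the section describes.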

### 5.2.2 Credible intervals

$C_\alpha (D)=(l,u):P(l\le \theta\le u|D)=1-\alpha$(5.8)

#### 5.2.2.1 Highest posterior density regions *

$1-\alpha =\int_{\theta:p(\theta|D)>p^*}p(\theta|D)d\theta$(5.9)
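Eq 5.9 can be implemented by "water filling": lower the threshold $p^*$ until the mass of the region above it reaches $1-\alpha$. A minimal grid-based sketch for a Beta posterior (the Beta(3,9) example and variable names are my own, not the book's):

```python
import numpy as np
from math import lgamma

def beta_pdf(x, a, b):
    # Beta density evaluated on a grid, via the log Beta function for stability.
    log_b = lgamma(a) + lgamma(b) - lgamma(a + b)
    return np.exp((a - 1) * np.log(x) + (b - 1) * np.log(1 - x) - log_b)

a, b, alpha = 3.0, 9.0, 0.05
x = np.linspace(1e-6, 1 - 1e-6, 200001)
p = beta_pdf(x, a, b)
dx = x[1] - x[0]

# Visit grid points from highest density downward until the accumulated mass
# reaches 1 - alpha; the visited set is the HPD region of eq 5.9.
order = np.argsort(p)[::-1]
mass = np.cumsum(p[order]) * dx
inside = order[: np.searchsorted(mass, 1 - alpha) + 1]
hpd = (x[inside].min(), x[inside].max())
print(hpd)
```

For a unimodal posterior the visited set is contiguous, so reporting its min and max gives the HPD interval; for multimodal posteriors the same procedure returns a union of intervals.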

### 5.2.3 Inference for a difference in proportions

$p(\delta>0|D) =\int^1_0\int^1_0 I(\theta_1>\theta_2)Beta(\theta_1|y_1+1,N_1-y_1+1)Beta(\theta_2|y_2+1,N_2-y_2+1)d\theta_1 d\theta_2$(5.11)
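The double integral in eq 5.11 is easiest to evaluate by Monte Carlo: sample from each Beta posterior and count how often $\theta_1>\theta_2$. A sketch with hypothetical counts (the $y_i$, $N_i$ values below are illustrative, not the book's data):

```python
import numpy as np

rng = np.random.default_rng(0)
y1, N1 = 90, 100   # hypothetical: seller 1, 90 positive ratings out of 100
y2, N2 = 3, 10     # hypothetical: seller 2, 3 positive ratings out of 10

# Posteriors under uniform priors are Beta(y+1, N-y+1); estimate eq 5.11 by sampling.
S = 100_000
theta1 = rng.beta(y1 + 1, N1 - y1 + 1, S)
theta2 = rng.beta(y2 + 1, N2 - y2 + 1, S)
prob = (theta1 > theta2).mean()   # Monte Carlo estimate of p(delta > 0 | D)
print(prob)
```

Each pair of samples is one draw from the joint posterior, so the indicator average converges to the integral at the usual $O(1/\sqrt{S})$ rate.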

## 5.3 Bayesian model selection

$p(m|D)=\frac{p(D|m)p(m)}{\sum_{m\in M}p(m,D)}$(5.12)

$p(D|m)=\int p(D|\theta)p(\theta|m)d\theta$(5.13)

### 5.3.2 Computing the marginal likelihood (evidence)

#### 5.3.2.1 Beta-binomial model

$$
\begin{aligned}
p(\theta|D) &= \frac{p(D|\theta)p(\theta)}{p(D)} &\text{(5.19)}\\
&= \frac{1}{p(D)}\left[\frac{1}{B(a,b)}\theta^{a-1}(1-\theta)^{b-1}\right]\left[\binom{N}{N_1}\theta^{N_1}(1-\theta)^{N_0}\right] &\text{(5.20)}\\
&= \binom{N}{N_1}\frac{1}{p(D)}\frac{1}{B(a,b)}\left[\theta^{a+N_1-1}(1-\theta)^{b+N_0-1}\right] &\text{(5.21)}
\end{aligned}
$$

$$
\begin{aligned}
\frac{1}{B(a+N_1,b+N_0)} &= \binom{N}{N_1}\frac{1}{p(D)}\frac{1}{B(a,b)} &\text{(5.22)}\\
p(D) &= \binom{N}{N_1}\frac{B(a+N_1,b+N_0)}{B(a,b)} &\text{(5.23)}
\end{aligned}
$$

The marginal likelihood of the Beta-Bernoulli model is essentially the same as above; the only difference is that the $\binom{N}{N_1}$ term is dropped.
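Eq 5.23 is convenient to compute in log space with the log Beta function. A minimal sketch (function names are my own):

```python
from math import lgamma, comb, log, exp

def log_betafn(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marglik(N1, N0, a, b):
    # eq 5.23: p(D) = C(N, N1) * B(a + N1, b + N0) / B(a, b)
    N = N1 + N0
    return log(comb(N, N1)) + log_betafn(a + N1, b + N0) - log_betafn(a, b)

# Sanity check: under a uniform Beta(1,1) prior the binomial evidence is 1/(N+1),
# e.g. 3 heads out of N = 5 gives 1/6.
print(exp(log_marglik(3, 2, 1, 1)))
```

Working with `lgamma` avoids overflow for large counts, where the Beta and binomial terms individually become astronomically large or small.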

#### 5.3.2.3 Gaussian-Gaussian-Wishart model

$$
\begin{aligned}
p(D) &= \frac{Z_N}{Z_0Z_l} &\text{(5.27)}\\
&= \frac{1}{\pi^{ND/2}}\frac{1}{2^{ND/2}}\frac{(\frac{2\pi}{k_N})^{D/2}|S_N|^{-v_N/2}2^{(v_0+N)D/2}\Gamma_D(v_N/2)}{(\frac{2\pi}{k_0})^{D/2}|S_0|^{-v_0/2}2^{v_0D/2}\Gamma_D(v_0/2)} &\text{(5.28)}\\
&= \frac{1}{\pi^{ND/2}}\left(\frac{k_0}{k_N}\right)^{D/2}\frac{|S_N|^{-v_N/2}\Gamma_D(v_N/2)}{|S_0|^{-v_0/2}\Gamma_D(v_0/2)} &\text{(5.29)}
\end{aligned}
$$

#### 5.3.2.4 BIC approximation to log marginal likelihood

$BIC\triangleq \log p(D|\hat \theta) -\frac{dof(\hat \theta)}{2}\log N\approx \log p(D)$(5.30)

$\log p(D|\hat\theta)=-\frac{N}{2}\log(2\pi\hat\sigma^2)-\frac{N}{2}$(5.31)
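Eqs 5.30-5.31 combine into a one-liner for a univariate Gaussian fit by maximum likelihood, where $dof(\hat\theta)=2$ (mean and variance). A sketch with synthetic data (the function name and example data are my own):

```python
import numpy as np

def bic_gaussian(x):
    # MLE fit of a univariate Gaussian: mu_hat = mean, sigma2_hat = biased (1/N) variance.
    N = len(x)
    sigma2 = x.var()                                        # MLE divides by N
    loglik = -N / 2 * np.log(2 * np.pi * sigma2) - N / 2    # eq 5.31
    return loglik - 2 / 2 * np.log(N)                       # eq 5.30 with dof = 2

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)
score = bic_gaussian(x)
print(score)
```

The $-\frac{dof}{2}\log N$ term is the complexity penalty; for nested Gaussian-family models, comparing these scores approximates comparing log marginal likelihoods.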

#### 5.3.2.5 Effect of the prior

$p(D|m)=\int\int p(D|w)p(w|\alpha,m)p(\alpha|m)dwd\alpha$(5.36)

$p(D|m)\approx \int p(D|w)p(w|\hat\alpha,m)dw$(5.37)

$\hat\alpha=\arg\max_{\alpha} p(D|\alpha,m)=\arg\max_{\alpha}\int p(D|w)p(w|\alpha,m)dw$(5.38)

| Bayes factor $BF(1,0)$ | Interpretation |
| :--- | :--- |
| $BF < 1/100$ | Decisive evidence for $M_0$ |
| $BF < 1/10$ | Strong evidence for $M_0$ |
| $1/10 < BF < 1/3$ | Moderate evidence for $M_0$ |
| $1/3 < BF < 1$ | Weak evidence for $M_0$ |
| $1 < BF < 3$ | Weak evidence for $M_1$ |
| $3 < BF < 10$ | Moderate evidence for $M_1$ |
| $BF > 10$ | Strong evidence for $M_1$ |
| $BF > 100$ | Decisive evidence for $M_1$ |

### 5.3.3 Bayes factors

$BF_{1,0}\triangleq\frac{ p(D|M_1)}{p(D|M_0)}=\frac{p(M_1|D)}{p(M_0|D)}/\frac{p(M_1)}{p(M_0)}$(5.39)

(This is similar to a likelihood ratio, except that the parameters are integrated out, which allows models of different complexity to be compared.) If $BF_{1,0}>1$, we prefer model 1; otherwise we prefer model 0.

$p(M_0|D)=\frac{BF_{0,1}}{1+BF_{0,1}}=\frac{1}{BF_{1,0}+1}$(5.40)

#### 5.3.3.1 Example: testing whether a coin is fair

$p(D|M_0)=(\frac{1}{2})^N$(5.41)

$p(D|M_1)=\int p(D|\theta)p(\theta)d\theta=\frac{B(\alpha_1+N_1,\alpha_0+N_0)}{B(\alpha_1,\alpha_0)}$(5.42)
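Eqs 5.41-5.42 give both evidences in closed form, so the Bayes factor of eq 5.39 is a few lines of code. A sketch (function names and the example counts are my own):

```python
from math import lgamma, log, exp

def log_betafn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bf_unfair_vs_fair(N1, N0, a=1.0, b=1.0):
    # BF_{1,0} (eq 5.39) for M1 = Beta(a,b)-Bernoulli vs M0 = fair coin.
    N = N1 + N0
    log_m0 = -N * log(2)                                     # eq 5.41: (1/2)^N
    log_m1 = log_betafn(a + N1, b + N0) - log_betafn(a, b)   # eq 5.42
    return exp(log_m1 - log_m0)

bf_fairish = bf_unfair_vs_fair(5, 5)    # balanced data: BF < 1, favours M0
bf_biased = bf_unfair_vs_fair(45, 5)    # heavily biased data: BF >> 1, favours M1
print(bf_fairish, bf_biased)
```

Note that 5 heads in 10 flips actually yields evidence *for* the fair coin, since the unfair model pays an Occam penalty for its free parameter.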

### 5.3.4 Jeffreys-Lindley paradox *

$p(\theta)=p(\theta|M_0)p(M_0)+p(\theta|M_1)p(M_1)$(5.43)

$$
\begin{aligned}
p(M_0|D) &= \frac{p(M_0)p(D|M_0)}{p(M_0)p(D|M_0)+p(M_1)p(D|M_1)} &\text{(5.44)}\\
&= \frac{p(M_0)\int_{\Theta_0} p(D|\theta)p(\theta|M_0)d\theta}{p(M_0)\int _{\Theta_0}p(D|\theta)p(\theta|M_0)d\theta + p(M_1)\int _{\Theta_1}p(D|\theta)p(\theta|M_1)d\theta} &\text{(5.45)}
\end{aligned}
$$

$$
\begin{aligned}
p(M_0|D) &= \frac{p(M_0)c_0 \int_{\Theta_0} p(D|\theta)d\theta}{ p(M_0)c_0 \int_{\Theta_0} p(D|\theta)d\theta + p(M_1)c_1 \int_{\Theta_1} p(D|\theta)d\theta } &\text{(5.46)}\\
&= \frac{p(M_0)c_0 l_0}{p(M_0)c_0 l_0+p(M_1)c_1 l_1} &\text{(5.47)}
\end{aligned}
$$

$p(M_0|D)= \frac{c_0 l_0}{c_0 l_0+c_1 l_1} =\frac{l_0}{l_0+(c_1/c_0)l_1}$(5.48)

## 5.4 Priors

### 5.4.1 Uninformative priors

$\lim_{c\rightarrow 0}Beta(c,c)=Beta(0,0)$(5.49)

### 5.4.2 Jeffreys priors *

$p_\theta(\theta)= p_\phi(\phi)|\frac{d\phi}{d\theta}|$(5.50)

$p_\phi(\phi)\propto (I(\phi))^{\frac12}$(5.51)

$I(\phi) \triangleq E[(\frac{d\log p(X|\phi)}{d\phi} )^2]$(5.52)

$\frac{d\log p(x|\theta)}{d\theta}=\frac{d\log p(x|\phi)}{d\phi}\frac{d\phi}{d\theta}$(5.53)

$$
\begin{aligned}
I(\theta) &= E\left[\left(\frac{d\log p(X|\theta)}{d\theta}\right)^2\right]=I(\phi)\left(\frac{d\phi}{d\theta}\right)^2 &\text{(5.54)}\\
I(\theta)^{\frac{1}{2}} &= I(\phi)^{\frac12}\left|\frac{d\phi}{d\theta}\right| &\text{(5.55)}
\end{aligned}
$$

$p_\theta(\theta)=p_\phi(\phi)|\frac{d\phi}{d\theta}|\propto (I(\phi))^{\frac12}|\frac{d\phi}{d\theta}| =I(\theta)^{\frac12}$(5.56)

#### 5.4.2.1 Example: Jeffreys prior for the Bernoulli and multinoulli

$\log p(X|\theta) =X\log \theta+(1-X)\log(1-\theta)$(5.57)

$s(\theta)\triangleq\frac{d}{d\theta}\log p(X|\theta)=\frac{X}{\theta}-\frac{1-X}{1-\theta}$(5.58)

$J(\theta)=-\frac{d^2}{d\theta^2}\log p(X|\theta)=-s'(\theta|X)=\frac{X}{\theta^2}+\frac{1-X}{(1-\theta)^2}$(5.59)

$I(\theta)=E[J(\theta|X)|X\sim \theta]=\frac{\theta}{\theta^2}+\frac{1-\theta}{(1-\theta)^2}=\frac{1}{\theta(1-\theta)}$(5.60)

$p(\theta)\propto \theta^{-\frac12}(1-\theta)^{-\frac12}=\frac{1}{\sqrt{\theta(1-\theta)}}\propto Beta(\frac12,\frac12)$(5.61)
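The expectation in eq 5.60 runs over only two outcomes, $X\in\{0,1\}$, so it can be checked directly. A minimal sanity check (the helper name `fisher_info` is my own):

```python
# Verify I(theta) = E[J(theta|X)] = 1/(theta(1-theta)) for the Bernoulli model
# (eqs 5.59-5.60), by taking the expectation over X in {0, 1} explicitly.
def fisher_info(theta):
    # Observed information: J(theta|X) = X/theta^2 + (1-X)/(1-theta)^2  (eq 5.59)
    J = lambda x: x / theta**2 + (1 - x) / (1 - theta) ** 2
    # E over X ~ Bernoulli(theta): P(X=1) = theta, P(X=0) = 1 - theta.
    return theta * J(1) + (1 - theta) * J(0)

for theta in (0.1, 0.3, 0.5, 0.9):
    assert abs(fisher_info(theta) - 1 / (theta * (1 - theta))) < 1e-12
print("matches 1/(theta(1-theta))")
```

Taking the square root of this Fisher information recovers the $\theta^{-1/2}(1-\theta)^{-1/2}$ shape of the Beta(1/2, 1/2) Jeffreys prior in eq 5.61.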

$p(\theta)\propto Dir(\frac12,...,\frac12)$(5.62)

#### 5.4.2.2 Example: Jeffreys prior for location and scale parameters

$\int^{B-c}_{A-c}p(\mu)d\mu=(B-c)-(A-c)=(B-A)=\int^B_Ap(\mu)d\mu$(5.63)

$p(s)\propto 1/s$(5.64)

$$
\begin{aligned}
\int^{B/c}_{A/c}p(s)ds &= [\log s]^{B/c}_{A/c} = \log(B/c)-\log(A/c) &\text{(5.65)}\\
&= \log(B)-\log(A)=\int^B_Ap(s)ds &\text{(5.66)}
\end{aligned}
$$

### 5.4.4 Mixtures of conjugate priors

$p(\theta)=0.5Beta(\theta|20,20)+0.5Beta(\theta|30,10)$(5.67)

$p(\theta)=\sum_k p(z=k)p(\theta|z=k)$(5.68)

$p(\theta|D)=\sum_k p(Z=k|D)p(\theta|D,z=k)$(5.69)

$p(Z=k|D)=\frac{p(Z=k)p(D|Z=k)}{\sum_{k'}p(Z=k')p(D|Z=k')}$(5.70)

#### 5.4.4.1 Example

$p(\theta|D)=p(Z=1|D)Beta(\theta|a_1+N_1,b_1+N_0)+p(Z=2|D)Beta(\theta|a_2+N_1,b_2+N_0)$(5.72)

$p(\theta|D)=0.346Beta(\theta|40,30)+0.654Beta(\theta|50,20)$(5.73)
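The posterior mixing weights in eq 5.73 follow from eq 5.70, with each component evidence given by the Beta-Bernoulli marginal likelihood $B(a_k+N_1,b_k+N_0)/B(a_k,b_k)$; the data implied by eqs 5.67 and 5.73 are $N_1=20$ heads and $N_0=10$ tails. A sketch reproducing the numbers:

```python
from math import lgamma, log, exp

def log_betafn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# Prior: 0.5 Beta(20,20) + 0.5 Beta(30,10) (eq 5.67); data: N1 = 20 heads, N0 = 10 tails.
components = [(0.5, 20.0, 20.0), (0.5, 30.0, 10.0)]
N1, N0 = 20, 10

# eq 5.70: p(Z=k|D) is proportional to p(Z=k) * B(a_k+N1, b_k+N0) / B(a_k, b_k).
logw = [log(pi) + log_betafn(a + N1, b + N0) - log_betafn(a, b)
        for pi, a, b in components]
m = max(logw)                       # subtract the max before exp, for stability
unnorm = [exp(v - m) for v in logw]
total = sum(unnorm)
w = [u / total for u in unnorm]
print(w)   # ~ [0.346, 0.654], the mixing weights of eq 5.73
```

The second component wins because a Beta(30,10) prior, with mean 0.75, predicts 20 heads out of 30 better than the symmetric Beta(20,20) does.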

#### 5.4.4.2 Application: finding conserved regions in DNA and protein sequences

$p(N_t|Z_t)=\int p(N_t|\theta_t)p(\theta_t|Z_t)d\theta_t$(5.74)

$p(\theta|Z_t=1)=\frac14Dir(\theta|(10,1,1,1))+...+\frac14Dir(\theta|(1,1,1,10))$(5.75)

## 5.5 Hierarchical Bayes

### 5.5.1 Example: modeling related cancer rates

$p(D,\theta,\eta|N)=p(\eta)\prod^N_{i=1}Bin(x_i|N_i,\theta_i)Beta(\theta_i|\eta)$(5.77)

## 5.6 Empirical Bayes

$\hat\eta =\arg\max_\eta p(D|\eta)=\arg\max_\eta[\int p(D|\theta)p(\theta|\eta)d\theta]$(5.79)
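For the binomial-rates model of eq 5.77 with $\eta=(a,b)$, the integral in eq 5.79 is the Beta-binomial evidence of eq 5.23 for each group, so type-II maximum likelihood can be done by brute-force search. A sketch with hypothetical per-group counts (the data and the grid search are my own; a proper implementation would use a gradient-based optimizer):

```python
import numpy as np
from math import lgamma, log, comb

# Hypothetical counts for several groups: x[i] cases out of N[i] people.
x = [0, 1, 2, 5, 1, 0, 3]
N = [50, 60, 80, 100, 40, 30, 90]

def log_betafn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(a, b):
    # eq 5.79: sum_i log of the Beta-binomial evidence (cf. eq 5.23) for group i.
    return sum(log(comb(n, k)) + log_betafn(a + k, b + n - k) - log_betafn(a, b)
               for k, n in zip(x, N))

# Type-II maximum likelihood: maximize the marginal likelihood over eta = (a, b)
# by exhaustive grid search.
grid = np.linspace(0.1, 20, 100)
a_hat, b_hat = max(((a, b) for a in grid for b in grid),
                   key=lambda ab: log_evidence(*ab))
print(a_hat, b_hat)
```

With rates this low, the fitted prior mean $\hat a/(\hat a+\hat b)$ ends up near the pooled rate, which is what lets the hierarchical model shrink the noisy small-group estimates toward each other.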