[Advanced Probability Fundamentals] Parameter Estimation: Finding Estimators and Interval Estimation



Method of Moments

Estimate the corresponding population moments by the sample moments, estimate functions of the population moments by the same functions of the sample moments, and then solve for the parameters to be estimated. This estimation method is called the method of moments.

 

Steps

Let $X$ be a continuous random variable with probability density $f(x;\theta_{1},\theta_{2},\cdots,\theta_{k})$, or a discrete random variable with distribution law $P\{X=x\}=p(x;\theta_{1},\theta_{2},\cdots,\theta_{k})$, where $\theta_{1},\theta_{2},\cdots,\theta_{k}$ are the parameters to be estimated and $X_{1},X_{2},\cdots,X_{n}$ is a sample from $X$. Assume that the first $k$ moments of the population $X$,

$$
\begin{aligned}
\mu_{l}=E(X^{l})&=\int_{-\infty}^{+\infty}x^{l}f(x;\theta_{1},\theta_{2},\cdots,\theta_{k})\,dx \quad (X\ \text{continuous})\\
\mu_{l}=E(X^{l})&=\sum_{x \in R_{X}}x^{l}p(x;\theta_{1},\theta_{2},\cdots,\theta_{k}) \quad (X\ \text{discrete})
\end{aligned}
$$

(where $R_{X}$ is the set of possible values of $X$) exist. In general, they are functions of $\theta_{1},\theta_{2},\cdots,\theta_{k}$. Since the sample moments

$$
A_{l}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{l}
$$

converge in probability to the corresponding population moments $\mu_{l}\ (l=1,2,\cdots,k)$, and continuous functions of the sample moments converge in probability to the corresponding continuous functions of the population moments, we use the sample moments as estimators of the corresponding population moments, and continuous functions of the sample moments as estimators of the corresponding continuous functions of the population moments. This method of estimation is called the method of moments. The concrete procedure is as follows. Set

$$
\left\{\begin{aligned}
&\mu_{1}=\mu_{1}(\theta_{1},\theta_{2},\cdots,\theta_{k})\\
&\mu_{2}=\mu_{2}(\theta_{1},\theta_{2},\cdots,\theta_{k})\\
&\qquad\vdots \\
&\mu_{k}=\mu_{k}(\theta_{1},\theta_{2},\cdots,\theta_{k})
\end{aligned}\right.
$$

This is a system of $k$ equations in the $k$ unknown parameters $\theta_{1},\theta_{2},\cdots,\theta_{k}$. In general, it can be solved for $\theta_{1},\theta_{2},\cdots,\theta_{k}$, giving

$$
\left\{\begin{aligned}
&\theta_{1}=\theta_{1}(\mu_{1},\mu_{2},\cdots,\mu_{k})\\
&\theta_{2}=\theta_{2}(\mu_{1},\mu_{2},\cdots,\mu_{k})\\
&\qquad\vdots \\
&\theta_{k}=\theta_{k}(\mu_{1},\mu_{2},\cdots,\mu_{k})
\end{aligned}\right.
$$

AiA_{i}分别代替上式中的μi,i=1,2,,k\mu_{i},i=1,2,\cdots ,k,就以

$$
\hat{\theta}_{i}=\theta_{i}(A_{1},A_{2},\cdots,A_{k}),\quad i=1,2,\cdots,k
$$

as the estimators of $\theta_{i}$, $i=1,2,\cdots,k$. Estimators obtained in this way are called moment estimators, and their observed values are called moment estimates.
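
As a minimal sketch of this plug-in recipe (assuming numpy; the exponential population used here is only an illustration, not taken from the text), the code below computes sample moments $A_{l}$ and applies the substitution step to a one-parameter case where $\mu_{1}=1/\lambda$, so $\hat{\lambda}=1/A_{1}$:

```python
# Sketch of the method of moments, assuming numpy.
import numpy as np

def sample_moment(x, l):
    """l-th sample moment A_l = (1/n) * sum(x_i ** l)."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** l)

# Illustrative one-parameter case (assumed, not from the text):
# for X ~ Exp(lambda), mu_1 = E(X) = 1/lambda, so solving for lambda
# and replacing mu_1 by A_1 gives the moment estimator lambda_hat = 1/A_1.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)   # true lambda = 1/scale = 0.5

A1 = sample_moment(x, 1)
lambda_hat = 1.0 / A1
print(A1, lambda_hat)                       # A1 near 2.0, lambda_hat near 0.5
```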

 

Example 1: Let the population $X$ be uniformly distributed on $[a,b]$, with $a,b$ unknown, and let $X_{1},X_{2},\cdots,X_{n}$ be a sample from $X$. Find the moment estimators of $a$ and $b$.

 

$$
\begin{aligned}
\mu_{1}&=E(X)=\frac{a+b}{2}\\
\mu_{2}&=E(X^{2})=D(X)+[E(X)]^{2}=\frac{(b-a)^{2}}{12}+\frac{(a+b)^{2}}{4}\\
&\Rightarrow \left\{\begin{aligned}&a+b=2\mu_{1}\\&b-a=\sqrt{12(\mu_{2}-\mu_{1}^{2})}\end{aligned}\right.\\
&\Rightarrow \left\{\begin{aligned}&a=\mu_{1}-\sqrt{3(\mu_{2}-\mu_{1}^{2})}\\&b=\mu_{1}+\sqrt{3(\mu_{2}-\mu_{1}^{2})}\end{aligned}\right.
\end{aligned}
$$

Noting that $\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}^{2}=\frac{1}{n}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}$, and replacing $\mu_{1},\mu_{2}$ by $A_{1},A_{2}$ respectively, the moment estimators of $a$ and $b$ are

$$
\begin{aligned}
\hat{a}&=A_{1}-\sqrt{3(A_{2}-A_{1}^{2})}=\bar{X}-\sqrt{\frac{3}{n}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}\\
\hat{b}&=A_{1}+\sqrt{3(A_{2}-A_{1}^{2})}=\bar{X}+\sqrt{\frac{3}{n}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}
\end{aligned}
$$
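
A quick numerical check of these estimators, assuming numpy (the true values $a=2$, $b=5$ and the sample size are chosen only for the demonstration):

```python
# Moment estimators for U[a, b] from Example 1, assuming numpy.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(2.0, 5.0, size=2000)      # sample from U[2, 5]

x_bar = x.mean()                          # A_1
s2 = np.mean((x - x_bar) ** 2)            # A_2 - A_1^2 = (1/n) * sum (x_i - x_bar)^2

a_hat = x_bar - np.sqrt(3 * s2)
b_hat = x_bar + np.sqrt(3 * s2)
print(a_hat, b_hat)                       # close to 2 and 5
```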

 

Example 2: Suppose the mean $\mu$ and variance $\sigma^{2}$ of the population $X$ both exist, with $\sigma^{2}>0$, but $\mu$ and $\sigma^{2}$ are unknown. Let $X_{1},X_{2},\cdots,X_{n}$ be a sample from $X$. Find the moment estimators of $\mu$ and $\sigma^{2}$.

 

$$
\begin{aligned}
\mu_{1}&=E(X)=\mu\\
\mu_{2}&=E(X^{2})=D(X)+[E(X)]^{2}=\sigma^{2}+\mu^{2}\\
&\Rightarrow \left\{\begin{aligned}&\mu=\mu_{1}\\&\sigma^{2}=\mu_{2}-\mu_{1}^{2}\end{aligned}\right.
\end{aligned}
$$

Replacing $\mu_{1},\mu_{2}$ by $A_{1},A_{2}$ respectively, the moment estimators of $\mu$ and $\sigma^{2}$ are

$$
\begin{aligned}
\hat{\mu}&=A_{1}=\bar{X}\\
\hat{\sigma}^{2}&=A_{2}-A_{1}^{2}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}^{2}=\frac{1}{n}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}
\end{aligned}
$$

 

This result shows that the expressions for the moment estimators of the population mean and variance do not depend on the form of the population distribution.
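
A small sketch of this distribution-free property, assuming numpy; note that $\hat{\sigma}^{2}$ uses denominator $n$ (it is biased), which corresponds to `np.var` with its default `ddof=0`, not to the sample variance $S^{2}$ with denominator $n-1$:

```python
# Moment estimators of mean and variance for an arbitrary sample (Example 2),
# assuming numpy.
import numpy as np

def moment_estimates(x):
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()                        # A_1
    sigma2_hat = np.mean((x - mu_hat) ** 2)  # A_2 - A_1^2
    return mu_hat, sigma2_hat

x = np.array([1.2, 0.7, 2.3, 1.9, 0.4, 1.5])
mu_hat, sigma2_hat = moment_estimates(x)
print(mu_hat, sigma2_hat)
print(np.isclose(sigma2_hat, np.var(x)))     # np.var defaults to ddof=0 -> True
```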

 

Maximum Likelihood Estimation

Suppose the population $X$ is discrete, and the form of its distribution law $P\{X=x\}=p(x;\theta)$, $\theta \in \Theta$, is known, where $\theta$ is the parameter to be estimated and $\Theta$ is the range of possible values of $\theta$. Let $X_{1},X_{2},\cdots,X_{n}$ be a sample from $X$; then the joint distribution law of $X_{1},X_{2},\cdots,X_{n}$ is

$$
\prod_{i=1}^{n}p(x_{i};\theta)
$$

Let $x_{1},x_{2},\cdots,x_{n}$ be a sample value corresponding to the sample $X_{1},X_{2},\cdots,X_{n}$. The probability that the sample $X_{1},X_{2},\cdots,X_{n}$ takes the observed values $x_{1},x_{2},\cdots,x_{n}$, i.e., the probability of the event $\{X_{1}=x_{1},X_{2}=x_{2},\cdots,X_{n}=x_{n}\}$, is

$$
L(\theta)=L(x_{1},x_{2},\cdots,x_{n};\theta)=\prod_{i=1}^{n}p(x_{i};\theta),\quad \theta \in \Theta
$$

This probability changes with the value of $\theta$; it is a function of $\theta$, and $L(\theta)$ is called the likelihood function of the sample (note that here $x_{1},x_{2},\cdots,x_{n}$ are known sample values, so they are all constants).

θ\theta取值的可能范围Θ\Theta内挑选使似然函数L(x1,x2,,xn;θ)L(x_{1},x_{2},\cdots,x_{n};\theta)达到最大的参数值θ^\hat{\theta},作为参数θ\theta的估计值。即使θ^\hat{\theta}使

$$
L(x_{1},x_{2},\cdots,x_{n};\hat{\theta})=\max_{\theta \in \Theta}L(x_{1},x_{2},\cdots,x_{n};\theta)
$$

The $\hat{\theta}$ obtained this way depends on the sample values $x_{1},x_{2},\cdots,x_{n}$ and is often written $\hat{\theta}(x_{1},x_{2},\cdots,x_{n})$; it is called the maximum likelihood estimate of the parameter $\theta$, and the corresponding statistic $\hat{\theta}(X_{1},X_{2},\cdots,X_{n})$ is called the maximum likelihood estimator of $\theta$.
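
The definition can be applied directly by searching $\Theta$ for the maximizer. Below is a minimal sketch, assuming numpy and a Bernoulli$(p)$ population as an illustrative discrete model (not taken from the text), with $\Theta=(0,1)$ approximated by a grid:

```python
# "Pick theta maximizing L(theta)" for a discrete model, assuming numpy.
# Illustration: Bernoulli(p) with p(x; p) = p**x * (1 - p)**(1 - x), x in {0, 1}.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])    # observed sample values

def log_likelihood(p):
    # ln L(p) = sum_i ln p(x_i; p)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

grid = np.linspace(0.01, 0.99, 981)              # Theta approximated by a grid
p_hat = grid[np.argmax([log_likelihood(p) for p in grid])]
print(p_hat)                                     # close to x.mean() = 0.7
```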

 

Suppose the population $X$ is continuous, and the form of its probability density $f(x;\theta)$, $\theta \in \Theta$, is known, where $\theta$ is the parameter to be estimated and $\Theta$ is the range of possible values of $\theta$. Let $X_{1},X_{2},\cdots,X_{n}$ be a sample from $X$; then the joint density of $X_{1},X_{2},\cdots,X_{n}$ is

$$
\prod_{i=1}^{n}f(x_{i};\theta)
$$

x1,x2,,xnx_{1},x_{2},\cdots,x_{n}是相应于样本X1,X2,,XnX_{1},X_{2},\cdots,X_{n}的一个样本值,则随机点(X1,X2,,Xn)(X_{1},X_{2},\cdots,X_{n})落在点(x1,x2,,xn)(x_{1},x_{2},\cdots,x_{n})的邻域(边长分别为dx1,dx2,,dxndx_{1},d x_{2} ,\cdots ,d x_{n}nn维立方体)内的概率近似地为

$$
\prod_{i=1}^{n}f(x_{i};\theta)\,dx_{i}
$$

Its value changes with $\theta$. As in the discrete case, we take as the estimate $\hat{\theta}$ the value of $\theta$ that maximizes this probability; but since the factor $\prod_{i=1}^{n}dx_{i}$ does not depend on $\theta$, it suffices to consider the maximum of the function

$$
L(\theta)=L(x_{1},x_{2},\cdots,x_{n};\theta)=\prod_{i=1}^{n}f(x_{i};\theta)
$$

Here $L(\theta)$ is called the likelihood function of the sample. If

$$
L(x_{1},x_{2},\cdots,x_{n};\hat{\theta})=\max_{\theta \in \Theta}L(x_{1},x_{2},\cdots,x_{n};\theta)
$$

then $\hat{\theta}(x_{1},x_{2},\cdots,x_{n})$ is called the maximum likelihood estimate of $\theta$, and $\hat{\theta}(X_{1},X_{2},\cdots,X_{n})$ the maximum likelihood estimator of $\theta$.
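
For a continuous model the same maximization can be carried out numerically. A minimal sketch, assuming numpy/scipy and an exponential population with density $f(x;\lambda)=\lambda e^{-\lambda x}$, $x>0$, as an illustration (not from the text); for this model the maximizer has the closed form $\hat{\lambda}=1/\bar{x}$, which the numerical result should reproduce:

```python
# Numerical maximization of a continuous likelihood, assuming numpy and scipy.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=500)        # true lambda = 0.5

def neg_log_likelihood(lam):
    # -ln L(lambda) = -(n * ln(lambda) - lambda * sum(x_i))
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1.0 / x.mean())                    # both close to lambda_hat = 1/x_bar
```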

 

In this way, the problem of determining the maximum likelihood estimator reduces to the problem of finding a maximum in differential calculus.

In many cases $p(x;\theta)$ and $f(x;\theta)$ are differentiable with respect to $\theta$; then $\hat{\theta}$ can often be obtained from the equation

$$
\frac{d}{d\theta}L(\theta)=0
$$

Moreover, since $L(\theta)$ and $\ln L(\theta)$ attain their extrema at the same value of $\theta$, the maximum likelihood estimate $\hat{\theta}$ of $\theta$ can also be obtained from the equation

$$
\frac{d}{d\theta}\ln L(\theta)=0
$$

This equation is called the log-likelihood equation.
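
As a worked illustration of the log-likelihood equation (this example is assumed here, not taken from the text), let $X \sim \mathrm{Exp}(\lambda)$ with density $f(x;\lambda)=\lambda e^{-\lambda x}$, $x>0$. Then

$$
L(\lambda)=\prod_{i=1}^{n}\lambda e^{-\lambda x_{i}}=\lambda^{n}e^{-\lambda\sum\limits_{i=1}^{n}x_{i}},\qquad
\ln L(\lambda)=n\ln\lambda-\lambda\sum_{i=1}^{n}x_{i}
$$

$$
\frac{d}{d\lambda}\ln L(\lambda)=\frac{n}{\lambda}-\sum_{i=1}^{n}x_{i}=0
\;\Rightarrow\;
\hat{\lambda}=\frac{n}{\sum\limits_{i=1}^{n}x_{i}}=\frac{1}{\bar{x}}
$$

which agrees with the numerical maximization sketched above.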

 

Example 3: Let $X \sim N(\mu,\sigma^{2})$, where $\mu$ and $\sigma^{2}$ ($\sigma>0$) are unknown parameters, and let $X_{1},X_{2},\cdots,X_{n}$ be a sample drawn from the population. Find

  • the moment estimates of $\mu$ and $\sigma^{2}$

 

$$
\begin{aligned}
E(X)&=\mu\\
E(X^{2})&=D(X)+[E(X)]^{2}=\sigma^{2}+\mu^{2}\\
&\Rightarrow \left\{\begin{aligned}&\mu=\bar{X}\\&\sigma^{2}+\mu^{2}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\end{aligned}\right.\\
&\Rightarrow \left\{\begin{aligned}&\hat{\mu}=\bar{X}\\&\hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}^{2}=\frac{1}{n}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}\end{aligned}\right.
\end{aligned}
$$

 

The second-moment relation can be written directly in terms of the central moment: $\hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}^{2}=\frac{1}{n}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}$.

 

  • the maximum likelihood estimates of $\mu$ and $\sigma^{2}$

 

Since the density of $X$ is $\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$, $-\infty<x<+\infty$, we have

$$
\begin{aligned}
L(\mu,\sigma^{2})&=\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}e^{-\frac{1}{2\sigma^{2}}\sum\limits_{i=1}^{n}(X_{i}-\mu)^{2}}\\
\ln L(\mu,\sigma^{2})&=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln(\sigma^{2})-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(X_{i}-\mu)^{2}\\
&\Rightarrow \left\{\begin{aligned}&\frac{\partial \ln L}{\partial \mu}=\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(X_{i}-\mu)=0\\&\frac{\partial \ln L}{\partial \sigma^{2}}=-\frac{n}{2}\cdot\frac{1}{\sigma^{2}}+\frac{1}{2(\sigma^{2})^{2}}\sum_{i=1}^{n}(X_{i}-\mu)^{2}=0\end{aligned}\right.\\
&\Rightarrow \left\{\begin{aligned}&\hat{\mu}=\bar{X}\\&\hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}\end{aligned}\right.
\end{aligned}
$$
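
A quick numerical sanity check of this result, assuming numpy (the true parameters and the perturbation grid are chosen only for the demonstration): the closed-form pair $(\hat{\mu},\hat{\sigma}^{2})$ should give a log-likelihood at least as large as nearby parameter values.

```python
# Check that (mu_hat, sigma2_hat) maximizes the normal log-likelihood, assuming numpy.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=2.0, size=400)    # true mu = 10, sigma^2 = 4

def log_likelihood(mu, sigma2):
    n = len(x)
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - np.sum((x - mu) ** 2) / (2 * sigma2))

mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)

best = log_likelihood(mu_hat, sigma2_hat)
for dmu in (-0.1, 0.0, 0.1):
    for ds in (-0.1, 0.0, 0.1):
        assert log_likelihood(mu_hat + dmu, sigma2_hat + ds) <= best + 1e-9
print(mu_hat, sigma2_hat)                        # close to 10 and 4
```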

 

Example 4: Let the population $X \sim U[-\theta,\theta]$, and let $X_{1},X_{2},\cdots,X_{n}$ be a simple random sample from $X$. Find the maximum likelihood estimate of the parameter $\theta$.

 

The likelihood function is

$$
L(\theta)=\prod_{i=1}^{n}f(x_{i})=\left\{\begin{aligned}&\frac{1}{(2\theta)^{n}}, && -\theta \leq x_{1},x_{2},\cdots,x_{n}\leq \theta\\&0, && \text{otherwise}\end{aligned}\right.
$$

 

Note that differentiating here does not give the maximum, so it must be found by direct reasoning.

 

Clearly, the smaller $\theta$ is, the larger $\frac{1}{(2\theta)^{n}}$ becomes; but $\theta$ must satisfy $-\theta \leq x_{1},x_{2},\cdots,x_{n}\leq \theta$, which means

$$
\begin{aligned}
&\left\{\begin{aligned}&\theta \geq \max(x_{1},x_{2},\cdots,x_{n})\\&-\theta \leq \min(x_{1},x_{2},\cdots,x_{n})\end{aligned}\right.\\
&\Rightarrow \theta \geq \max(|x_{1}|,|x_{2}|,\cdots,|x_{n}|)
\end{aligned}
$$

Hence the maximum likelihood estimator of $\theta$ is $\hat{\theta}=\max(|X_{1}|,|X_{2}|,\cdots,|X_{n}|)$.
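
A short numerical illustration of Example 4, assuming numpy (true $\theta=3$ is chosen only for the demonstration); the estimate sits just below the true value because $\max|X_{i}|\leq\theta$ always:

```python
# MLE for U[-theta, theta] from Example 4, assuming numpy.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-3.0, 3.0, size=1000)

theta_hat = np.max(np.abs(x))    # smallest theta with -theta <= x_i <= theta for all i
print(theta_hat)                 # slightly below 3
```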