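[Probability Theory Fundamentals, Advanced] Numerical Characteristics of Random Variables: Moments, Covariance, and Correlation Coefficient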


Definitions:

Let $X$ be a random variable. If

$$E(X^{k}),\quad k=1,2,\cdots$$

exists, it is called the $k$-th raw moment (moment about the origin) of $X$.

 

Let $X$ be a random variable. If

$$E \left\{[X-E(X)]^{k}\right\},\quad k=1,2,3,\cdots$$

exists, it is called the $k$-th central moment of $X$.

 

Let $X$ and $Y$ be two random variables. If

$$E(X^{k}Y^{l}),\quad k,l=1,2,\cdots$$

exists, it is called the mixed moment of order $k+l$ of $X$ and $Y$.

 

Let $X$ and $Y$ be two random variables. If

$$E \left\{[X-E(X)]^{k}[Y-E(Y)]^{l}\right\},\quad k,l=1,2,\cdots$$

exists, it is called the mixed central moment of order $k+l$ of $X$ and $Y$.
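As a small illustration of the raw and central moment definitions above, here is a minimal Python sketch for a discrete distribution. The helper names `raw_moment` and `central_moment` and the fair-die example are assumptions made for this illustration, not part of the original text.

```python
from fractions import Fraction


def raw_moment(pmf, k):
    """k-th raw moment E(X^k) of a discrete distribution {value: probability}."""
    return sum(p * x**k for x, p in pmf.items())


def central_moment(pmf, k):
    """k-th central moment E{[X - E(X)]^k} of the same kind of distribution."""
    mean = raw_moment(pmf, 1)
    return sum(p * (x - mean) ** k for x, p in pmf.items())


# Hypothetical example: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = raw_moment(die, 1)          # first raw moment, E(X)
second_raw = raw_moment(die, 2)    # second raw moment, E(X^2)
variance = central_moment(die, 2)  # second central moment, D(X)

print(mean, second_raw, variance)
```

The second central moment is the variance, and it agrees with $E(X^{2})-[E(X)]^{2}$ computed from the first two raw moments.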

 

Covariance

Definition: For random variables $X$ and $Y$, if $E \left\{[X-E(X)][Y-E(Y)]\right\}$ exists, it is called the covariance of $X$ and $Y$, denoted $\text{cov}(X,Y)$, that is,

$$\text{cov}(X,Y)=E \left\{[X-E(X)][Y-E(Y)]\right\}$$

 

Computation formulas

  • $\text{cov}(X,Y)=E(XY)-E(X)E(Y)$

  • $D(X \pm Y)=D(X)+D(Y)\pm 2\,\text{cov}(X,Y)$
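Both formulas follow from expanding the definition and using linearity of expectation. A short derivation sketch:

$$
\begin{aligned}
\text{cov}(X,Y)&=E\left[XY-X\,E(Y)-Y\,E(X)+E(X)E(Y)\right]\\
&=E(XY)-E(X)E(Y)-E(X)E(Y)+E(X)E(Y)\\
&=E(XY)-E(X)E(Y)\\
D(X\pm Y)&=E\left\{\left[(X-E(X))\pm (Y-E(Y))\right]^{2}\right\}\\
&=E\left\{[X-E(X)]^{2}\right\}+E\left\{[Y-E(Y)]^{2}\right\}\pm 2E\left\{[X-E(X)][Y-E(Y)]\right\}\\
&=D(X)+D(Y)\pm 2\,\text{cov}(X,Y)
\end{aligned}
$$

Note the factor $2$ on the covariance term, which comes from the cross term of the square.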

 

Properties

  • $\text{cov}(X,Y)=\text{cov}(Y,X)$

  • $\text{cov}(aX,bY)=ab\,\text{cov}(X,Y)$, where $a,b$ are constants

  • $\text{cov}(X_{1}+X_{2},Y)=\text{cov}(X_{1},Y)+\text{cov}(X_{2},Y)$

 

Example 1: Let the random variables $X_{1},X_{2},\cdots ,X_{n}\ (n>1)$ be mutually independent, each following the normal distribution $N(0,\sigma^{2})$. Then $\text{cov}\left(X_{1}, \frac{1}{n}\sum\limits_{i=1}^{n}X_{i}\right)=(\quad)$

Note that here $\frac{1}{n}\sum\limits_{i=1}^{n}X_{i}\ne E(X)$: it is the sample mean, which is itself a random variable, not the expectation.

Since $EX_{i}=0$, $E(X_{1}^{2})=D(X_{1})=\sigma^{2}$, and by independence $E(X_{1}X_{i})=EX_{1}\cdot EX_{i}=0$ for $i\geq 2$, we get

$$
\begin{aligned}
\text{cov}\left(X_{1},\frac{1}{n}\sum\limits_{i=1}^{n}X_{i}\right)&=E\left[(X_{1}-EX_{1})\left(\frac{1}{n}\sum\limits_{i=1}^{n}X_{i}- \frac{1}{n}\sum\limits_{i=1}^{n}EX_{i}\right)\right]\\
&=E\left[X_{1} \left(\frac{1}{n}\sum\limits_{i=1}^{n}X_{i}\right) \right]\\
&=\frac{1}{n}E\left(X_{1}\sum\limits_{i=1}^{n}X_{i}\right)\\
&=\frac{1}{n}E\left(X_{1}^{2}+\sum\limits_{i=2}^{n}X_{1}X_{i}\right)\\
&=\frac{1}{n}\left(\sigma^{2}+\sum\limits_{i=2}^{n}0\right)\\
&=\frac{\sigma^{2}}{n}
\end{aligned}
$$
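The result $\sigma^{2}/n$ can be sanity-checked numerically. The following is a rough Monte Carlo sketch, not a proof; the function name `simulate_cov_with_mean`, the choice $n=4$, $\sigma=2$, the number of trials, and the seed are all assumptions made for illustration.

```python
import random


def simulate_cov_with_mean(n, sigma, trials, seed=0):
    """Estimate cov(X1, mean(X1..Xn)) for i.i.d. N(0, sigma^2) samples."""
    rng = random.Random(seed)
    sum_x1 = sum_mean = sum_prod = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x1 = xs[0]
        xbar = sum(xs) / n          # the sample mean, a random variable
        sum_x1 += x1
        sum_mean += xbar
        sum_prod += x1 * xbar
    # cov = E(X1 * Xbar) - E(X1) * E(Xbar), estimated by sample averages
    return sum_prod / trials - (sum_x1 / trials) * (sum_mean / trials)


n, sigma = 4, 2.0                    # illustrative values only
estimate = simulate_cov_with_mean(n, sigma, trials=100_000)
theory = sigma**2 / n                # sigma^2 / n from the derivation

print(f"estimate = {estimate:.3f}, theory = {theory:.3f}")
```

With enough trials the estimate should land close to the theoretical value; the exact digits depend on the seed.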

 

Example 2: A box contains $6$ balls: $1$ red, $2$ white, and $3$ black. Two balls are drawn at random from the box. Let $X$ be the number of red balls drawn and $Y$ the number of white balls drawn. Find $\text{cov}(X,Y)$.

The joint distribution of $(X,Y)$, together with the marginal distributions, is:

| $X \backslash Y$ | $0$ | $1$ | $2$ | $p_{i \cdot}$ |
| --- | --- | --- | --- | --- |
| $0$ | $\frac{1}{5}$ | $\frac{2}{5}$ | $\frac{1}{15}$ | $\frac{2}{3}$ |
| $1$ | $\frac{1}{5}$ | $\frac{2}{15}$ | $0$ | $\frac{1}{3}$ |
| $p_{\cdot j}$ | $\frac{2}{5}$ | $\frac{8}{15}$ | $\frac{1}{15}$ | |

$$
\begin{aligned}
\text{cov}(X,Y)&=E(XY)-E(X)\cdot E(Y)\\
E(X)&=0 \times \frac{2}{3}+1\times \frac{1}{3}=\frac{1}{3}\\
E(Y)&=0\times \frac{2}{5}+1\times \frac{8}{15}+2\times \frac{1}{15}=\frac{2}{3}\\
E(XY)&=0\times \left(\frac{1}{5}+ \frac{2}{5}+ \frac{1}{15}+ \frac{1}{5}\right)+ 1 \times 1\times \frac{2}{15}+1\times 2\times 0=\frac{2}{15}\\
\text{cov}(X,Y)&=\frac{2}{15}- \frac{1}{3}\times \frac{2}{3}=- \frac{4}{45}
\end{aligned}
$$
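The joint table and the covariance in Example 2 can be reproduced by brute-force enumeration of all equally likely pairs of balls. This is a minimal sketch using exact fractions; the variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# 1 red, 2 white, 3 black balls; draw 2 without replacement.
balls = ["red"] + ["white"] * 2 + ["black"] * 3
draws = list(combinations(range(len(balls)), 2))  # 15 equally likely pairs
p = Fraction(1, len(draws))

# Build the joint distribution of (X, Y) = (#red, #white).
joint = {}
for pair in draws:
    x = sum(balls[i] == "red" for i in pair)
    y = sum(balls[i] == "white" for i in pair)
    joint[(x, y)] = joint.get((x, y), 0) + p

e_x = sum(prob * x for (x, y), prob in joint.items())
e_y = sum(prob * y for (x, y), prob in joint.items())
e_xy = sum(prob * x * y for (x, y), prob in joint.items())
cov_xy = e_xy - e_x * e_y   # cov(X, Y) = E(XY) - E(X)E(Y)

print(e_x, e_y, e_xy, cov_xy)
```

The negative covariance matches intuition: drawing the red ball leaves only one slot for a white ball.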

 

Correlation coefficient

Definition: For random variables $X$ and $Y$, if $D(X)D(Y)\ne 0$, then $\frac{\text{cov}(X,Y)}{\sqrt{D(X)}\sqrt{D(Y)}}$ is called the correlation coefficient of $X$ and $Y$, denoted $\rho_{XY}$, that is,

$$\rho_{XY}=\frac{\text{cov}(X,Y)}{\sqrt{D(X)}\sqrt{D(Y)}}$$

If $D(X)D(Y)=0$, we define $\rho_{XY}=0$.

If the correlation coefficient of $X$ and $Y$ satisfies $\rho_{XY}=0$, then $X$ and $Y$ are said to be uncorrelated.

 

Properties

  • $| \rho_{XY}| \leq 1$

  • $| \rho_{XY}|=1$ if and only if there exist constants $a$ and $b$, with $a \ne 0$, such that

         $P \left\{Y=aX+b\right\}=1$

 

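My own understanding of what the correlation coefficient means: it can be written as

$$\rho=\frac{E[(X-EX)(Y-EY)]}{\sqrt{DX}\sqrt{DY}}=E\left[\frac{X-EX}{\sqrt{DX}}\cdot \frac{Y-EY}{\sqrt{DY}}\right]$$

That is, it is the covariance of the two variables after standardization. When the covariance of two variables is large, we cannot tell directly whether this is because the variables themselves have a large scale (large $D(X)$, $D(Y)$) or because they are strongly linearly related, so we standardize.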
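To illustrate why standardizing helps, the sketch below changes the unit of one variable: the covariance scales with the unit, while the correlation coefficient does not. The data are made up, each sample point is treated as equally likely, and the helper names `covariance` and `correlation` are assumptions for this illustration.

```python
import math


def covariance(xs, ys):
    """Covariance of paired values, each pair treated as equally likely."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Correlation coefficient: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up data: heights in metres and weights in kilograms.
heights_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weights_kg = [52.0, 58.0, 63.0, 70.0, 74.0]
heights_cm = [100 * h for h in heights_m]   # same heights, different unit

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```

Switching from metres to centimetres multiplies the covariance by 100, in line with $\text{cov}(aX,bY)=ab\,\text{cov}(X,Y)$, but leaves the correlation unchanged.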

 

Example 3: A random experiment $E$ has three pairwise mutually exclusive outcomes $A_{1},A_{2},A_{3}$, each occurring with probability $\frac{1}{3}$. The experiment $E$ is repeated independently $2$ times. Let $X$ be the number of times $A_{1}$ occurs in the $2$ trials and $Y$ the number of times $A_{2}$ occurs in the $2$ trials. Then the correlation coefficient of $X$ and $Y$ is $(\quad)$

Let $Z$ denote the number of times $A_{3}$ occurs in the $2$ trials. The trials are independent repetitions; regarding the occurrence of $A_{3}$ as a success, with $P(A_{3})=\frac{1}{3}$, the random variable $Z$ follows the binomial distribution $B(2, \frac{1}{3})$. Likewise, $X$ and $Y$ both follow $B(2, \frac{1}{3})$, so $DX=DY$.

Clearly

$$X+Y+Z=2 \Rightarrow Y=2-X-Z$$

Therefore

$$
\begin{aligned}
\text{cov}(X,Y)&=\text{cov}(X,2-X-Z)\\
&=\text{cov}(X,2)-\text{cov}(X,X)-\text{cov}(X,Z)\\
&=0-DX-\text{cov}(X,Z)\\
&=-DX-\text{cov}(X,Y)\quad \left(\text{by symmetry, } \text{cov}(X,Z)=\text{cov}(X,Y)\right)
\end{aligned}
$$

Solving for $\text{cov}(X,Y)$ gives

$$\text{cov}(X,Y)=- \frac{DX}{2}$$

The correlation coefficient of $X$ and $Y$ is then

$$\rho_{XY}=\frac{\text{cov}(X,Y)}{\sqrt{DX}\sqrt{DY}}=\frac{-\frac{DX}{2}}{DX}=- \frac{1}{2}$$
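The symmetry argument in Example 3 can be cross-checked by enumerating the $9$ equally likely outcomes of the two trials. This is a minimal sketch with exact fractions; the helper `expect` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = ("A1", "A2", "A3")
pairs = list(product(outcomes, repeat=2))   # 9 equally likely results of 2 trials
p = Fraction(1, len(pairs))


def expect(f):
    """Expectation of f(X, Y), with X = #A1 and Y = #A2 in the two trials."""
    return sum(p * f(pair.count("A1"), pair.count("A2")) for pair in pairs)


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - e_x**2
var_y = expect(lambda x, y: y * y) - e_y**2
cov_xy = expect(lambda x, y: x * y) - e_x * e_y

# Because DX == DY, sqrt(DX) * sqrt(DY) = DX, so rho stays an exact fraction.
rho = cov_xy / var_x

print(var_x, var_y, cov_xy, rho)
```

The enumeration gives the same $DX=DY$ and the same $\rho_{XY}$ as the derivation, without using the symmetry shortcut.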

 

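Independence and uncorrelatedness

  1. If random variables $X$ and $Y$ are independent, then $X$ and $Y$ are necessarily uncorrelated; conversely, if $X$ and $Y$ are uncorrelated, they are not necessarily independent

  2. For a bivariate normal random vector $(X,Y)$, $X$ and $Y$ are independent if and only if $\rho=0$

  3. For a bivariate normal random vector $(X,Y)$, independence of $X$ and $Y$ is equivalent to $X$ and $Y$ being uncorrelated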

 

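Example 4: Let the random variables $X$ and $Y$ have the following probability distributions

| $X$ | $0$ | $1$ |
| --- | --- | --- |
| $P$ | $\frac{1}{3}$ | $\frac{2}{3}$ |

| $Y$ | $-1$ | $0$ | $1$ |
| --- | --- | --- | --- |
| $P$ | $\frac{1}{3}$ | $\frac{1}{3}$ | $\frac{1}{3}$ |

and suppose $P \left\{X^{2}=Y^{2}\right\}=1$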

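  • Find the probability distribution of the two-dimensional random variable $(X,Y)$

From $P \left\{X^{2}=Y^{2}\right\}=1$ we get $P \left\{X^{2}\ne Y^{2}\right\}=0$, so every cell with $x^{2}\ne y^{2}$ has probability $0$:

| $X \backslash Y$ | $-1$ | $0$ | $1$ | $p_{i \cdot}$ |
| --- | --- | --- | --- | --- |
| $0$ | $0$ | | $0$ | $\frac{1}{3}$ |
| $1$ | | $0$ | | $\frac{2}{3}$ |
| $p_{\cdot j}$ | $\frac{1}{3}$ | $\frac{1}{3}$ | $\frac{1}{3}$ | |

Filling in the remaining cells from the marginal distributions gives

| $X \backslash Y$ | $-1$ | $0$ | $1$ | $p_{i \cdot}$ |
| --- | --- | --- | --- | --- |
| $0$ | $0$ | $\frac{1}{3}$ | $0$ | $\frac{1}{3}$ |
| $1$ | $\frac{1}{3}$ | $0$ | $\frac{1}{3}$ | $\frac{2}{3}$ |
| $p_{\cdot j}$ | $\frac{1}{3}$ | $\frac{1}{3}$ | $\frac{1}{3}$ | |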

 

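  • Are $X$ and $Y$ independent? Are they uncorrelated?

Clearly there are cells with $p_{ij}\ne p_{i \cdot }p_{\cdot j}$ (for instance $P\{X=0,Y=0\}=\frac{1}{3}\ne \frac{1}{3}\times \frac{1}{3}$), so $X$ and $Y$ are not independent.

Moreover, from the joint table $E(XY)=1\times(-1)\times \frac{1}{3}+1\times 1\times \frac{1}{3}=0$, $EX=\frac{2}{3}$ and $EY=0$, so

$$\rho_{XY}=\frac{\text{cov}(X,Y)}{\sqrt{DX}\sqrt{DY}}=\frac{E(XY)-EX \cdot EY}{\sqrt{DX}\sqrt{DY}}=\frac{0- \frac{2}{3}\times 0}{\sqrt{DX }\sqrt{DY }}=0$$

Therefore $X$ and $Y$ are uncorrelated but not independent.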
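The conclusion of Example 4 (zero covariance without independence) can be checked directly from the joint table. A minimal sketch with exact fractions follows; the variable names are chosen for this illustration.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Joint distribution of (X, Y) from the completed table; omitted cells are 0.
joint = {(0, 0): third, (1, -1): third, (1, 1): third}

xs, ys = (0, 1), (-1, 0, 1)
px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}   # marginal of X
py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}   # marginal of Y

e_x = sum(x * p for x, p in px.items())
e_y = sum(y * p for y, p in py.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov_xy = e_xy - e_x * e_y

# Independence requires p_ij == p_i. * p_.j for every cell.
independent = all(joint.get((x, y), 0) == px[x] * py[y] for x in xs for y in ys)

print(cov_xy, independent)
```

Here $Y^{2}=X^{2}$ ties the two variables together completely, yet the dependence is not linear, so the covariance vanishes.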

 

Example 5: Suppose the random vector $(X,Y)$ follows the bivariate normal distribution $N(\mu_{1},\mu_{2};\sigma_{1}^{2},\sigma_{2}^{2};0)$, and let $Z_{1}=X+Y$, $Z_{2}=X-Y$

  • Find the distribution of $(Z_{1},Z_{2})$

Since

$$\begin{vmatrix}1 & 1 \\ 1 & -1\end{vmatrix}\ne 0$$

$(Z_{1},Z_{2})=(X+Y,X-Y)$ follows a bivariate normal distribution. Because $\rho=0$, we have $\text{cov}(X,Y)=0$, so

$$
\begin{aligned}
E(Z_{1})&=\mu_{1}+\mu_{2},\quad E(Z_{2})=\mu_{1}-\mu_{2}\\
D(Z_{1})&=D(X+Y)=DX+DY=\sigma_{1}^{2}+\sigma_{2}^{2}\\
D(Z_{2})&=D(X-Y)=DX+DY=\sigma_{1}^{2}+\sigma_{2}^{2}\\
\text{cov}(Z_{1},Z_{2})&=\text{cov}(X+Y,X-Y)\\
&=\text{cov}(X,X)-\text{cov}(X,Y)+\text{cov}(Y,X)-\text{cov}(Y,Y)\\
&=\sigma_{1}^{2}-\sigma_{2}^{2}\\
\rho_{Z_{1}Z_{2}}&=\frac{\text{cov}(Z_{1},Z_{2})}{\sqrt{DZ_{1}}\sqrt{DZ_{2}}}=\frac{\sigma_{1}^{2}-\sigma_{2}^{2}}{\sigma_{1}^{2}+\sigma_{2}^{2}}
\end{aligned}
$$

Therefore $(Z_{1},Z_{2})\sim N\left(\mu_{1}+\mu_{2},\mu_{1}-\mu_{2};\sigma_{1}^{2}+\sigma_{2}^{2},\sigma_{1}^{2}+\sigma_{2}^{2};\frac{\sigma_{1}^{2}-\sigma_{2}^{2}}{\sigma_{1}^{2}+\sigma_{2}^{2}}\right)$

 

  • Are $Z_{1}$ and $Z_{2}$ independent?

When $\sigma_{1}=\sigma_{2}$, $\rho_{Z_{1}Z_{2}}=0 \Rightarrow Z_{1},Z_{2}$ are uncorrelated and, since $(Z_{1},Z_{2})$ is bivariate normal, independent

When $\sigma_{1}\ne \sigma_{2}$, $\rho_{Z_{1}Z_{2}}\ne 0 \Rightarrow Z_{1},Z_{2}$ are correlated and therefore not independent
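The variances and covariance of $(Z_{1},Z_{2})$ in Example 5 can also be obtained in one step from the matrix identity $\text{Cov}(AV)=A\,\text{Cov}(V)\,A^{T}$ for a linear map $A$. This identity is a standard fact that the text above does not state, and the helpers `transform_cov` and `rho_z` as well as the numeric values of $\sigma_{1},\sigma_{2}$ are assumptions made for this sketch.

```python
import math


def transform_cov(a, cov):
    """Return A * Cov * A^T for 2x2 matrices given as nested lists."""
    n = 2
    a_cov = [[sum(a[i][k] * cov[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return [[sum(a_cov[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rho_z(sigma1, sigma2):
    """Correlation of Z1 = X + Y and Z2 = X - Y when cov(X, Y) = 0."""
    cov_xy = [[sigma1**2, 0], [0, sigma2**2]]   # rho = 0, as in Example 5
    a = [[1, 1], [1, -1]]                       # rows define Z1 and Z2
    cov_z = transform_cov(a, cov_xy)
    return cov_z, cov_z[0][1] / math.sqrt(cov_z[0][0] * cov_z[1][1])


cov_z, rho = rho_z(2, 1)        # illustrative values sigma1 = 2, sigma2 = 1
_, rho_equal = rho_z(3, 3)      # equal variances give zero correlation

print(cov_z, rho, rho_equal)
```

Both diagonal entries of the resulting matrix equal $\sigma_{1}^{2}+\sigma_{2}^{2}$ and the off-diagonal entry equals $\sigma_{1}^{2}-\sigma_{2}^{2}$, in agreement with the derivation.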

 

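When $(X,Y)$ is bivariate normal, uncorrelatedness and independence are equivalent

When $(X,Y)$ is bivariate normal, $X$ and $Y$ are each normal; the converse does not necessarily hold

If $X$ and $Y$ are both normal and independent, then $(X,Y)$ is bivariate normal

When $\begin{vmatrix}a & b\\c & d\end{vmatrix}\ne 0$, $(X,Y)$ is bivariate normal if and only if $(aX+bY,cX+dY)$ is bivariate normal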

 

Example 6: Let the random vector $(X,Y)$ follow the bivariate normal distribution $N\left(0,0;1,4; - \frac{1}{2}\right)$. Show that $\frac{\sqrt{3}}{3}(X+Y)$ follows the standard normal distribution and is independent of $X$.

Clearly $X \sim N(0,1)$, $Y \sim N(0,4)$

$$
\begin{aligned}
\text{cov}(X,Y)&=\rho \sqrt{DX}\sqrt{DY}=-\frac{1}{2}\times 1\times 2=-1\\
D(X+Y)&=DX+DY+2\,\text{cov}(X,Y)=1+4-2=3
\end{aligned}
$$

Note that here $D(X+Y)\ne DX+DY$, because $X$ and $Y$ are not given to be independent (indeed $\rho=-\frac{1}{2}\ne 0$)

$$
\begin{aligned}
E\left[ \frac{\sqrt{3}}{3}(X+Y)\right]&=\frac{\sqrt{3}}{3}(EX+EY)=0\\
D\left[ \frac{\sqrt{3}}{3}(X+Y)\right]&=\left(\frac{\sqrt{3}}{3}\right)^{2}D(X+Y)=\frac{1}{3}\times 3=1\\
\text{cov}\left(\frac{\sqrt{3}}{3}(X+Y),X\right)&=\frac{\sqrt{3}}{3}[\text{cov}(X,X)+\text{cov}(Y,X)]=\frac{\sqrt{3}}{3}(1-1)=0\\
\rho_{\frac{\sqrt{3}}{3}(X+Y),X}&=0
\end{aligned}
$$

Since $(X,Y)$ is bivariate normal, the linear combination $\frac{\sqrt{3}}{3}(X+Y)$ is normal; with mean $0$ and variance $1$, it is standard normal.

Since

$$\begin{vmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ 1 & 0\end{vmatrix}=-\frac{\sqrt{3}}{3}\ne 0$$

$\left(\frac{\sqrt{3}}{3}(X+Y),X\right)$ also follows a bivariate normal distribution, with $\rho_{\frac{\sqrt{3}}{3}(X+Y),X}=0$, so the two are independent
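One way to see the result of Example 6 concretely is to build $(X,Y)$ from two independent standard normals. The construction $Y=-X+\sqrt{3}\,W$, with $W\sim N(0,1)$ independent of $X$, is an assumption introduced for this sketch (it gives $D(Y)=4$ and $\text{cov}(X,Y)=-1$, hence $\rho=-\frac{1}{2}$); the function names, number of trials, and seed are likewise illustrative. It is a rough Monte Carlo check, not a proof.

```python
import math
import random


def sample_example6(trials=100_000, seed=0):
    """Sample (X, Y) ~ N(0, 0; 1, 4; -1/2) and U = sqrt(3)/3 * (X + Y)."""
    rng = random.Random(seed)
    xs, ys, us = [], [], []
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        w = rng.gauss(0.0, 1.0)            # independent of x
        y = -x + math.sqrt(3.0) * w        # D(Y) = 1 + 3 = 4, cov(X, Y) = -1
        xs.append(x)
        ys.append(y)
        us.append(math.sqrt(3.0) / 3.0 * (x + y))
    return xs, ys, us


def cov(a, b):
    """Sample covariance of two equally long lists (divides by the sample size)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


xs, ys, us = sample_example6()
var_y = cov(ys, ys)     # should be near 4
cov_xy = cov(xs, ys)    # should be near -1
var_u = cov(us, us)     # should be near 1
cov_ux = cov(us, xs)    # should be near 0

print(f"{var_y:.3f} {cov_xy:.3f} {var_u:.3f} {cov_ux:.3f}")
```

Under this construction $\frac{\sqrt{3}}{3}(X+Y)$ is exactly $W$, which makes both claims visible at once: it is standard normal and independent of $X$.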