Linear Estimation, Chapter 3: Stochastic Least-Squares Problems


CHAPTER 3 Stochastic Least-Squares Problems

3.1 states the problem

3.2 gives a simple result

3.3 gives a geometric interpretation

3.4 introduces linear models

3.5 gives the equivalence and duality relations between stochastic and deterministic least-squares

3.1 The Problem of Stochastic Estimation

Consider two random variables \mathbf{x} and \mathbf{y}, where \mathbf{y} can be observed and \mathbf{x} is unknown.

Problem 1: given an observed value y of \mathbf{y}, produce an estimate \hat{x} of the value taken by \mathbf{x}: (3.1.1)

\hat{x}=h(y)

More generally, we estimate the random variable \mathbf{x} from the random variable \mathbf{y}: (3.1.2)

\hat{\mathbf{x}}=h(\mathbf{y})

This raises Problem 2: how should the function h(\cdot) be chosen?

We want an estimator that satisfies an optimality criterion: the least-mean-squares criterion (analogous to the least-squares criterion of Chap. 2).

The solution is the conditional expectation (see Theorem 3.A.1 for the derivation):

\hat{\mathbf{x}}=E[\mathbf{x} | \mathbf{y}]

Computing this conditional expectation requires the joint probability distribution of \mathbf{x} and \mathbf{y}, which is usually hard to obtain. So we simplify the problem:

we restrict h(\cdot) to be a linear function. This restriction is often reasonable; in particular, when \{\mathbf{x}, \mathbf{y}\} are jointly Gaussian, the unconstrained least-mean-squares estimator turns out to be linear anyway.

3.2 Linear Least-Mean-Squares Estimators

For complex-valued, zero-mean random variables \mathbf{x} and \mathbf{y}, we define the covariance matrix as \operatorname{cov}(\mathbf{x}, \mathbf{y})=E \mathbf{x} \mathbf{y}^{*} (the usual product-of-means term vanishes because both means are zero). Here {}^{*} denotes complex conjugation for scalar random variables and complex-conjugate transposition (the so-called Hermitian transpose) for vector-valued random variables.

The main reason for this definition is to ensure that \operatorname{var}(\mathbf{x})=\operatorname{cov}(\mathbf{x}, \mathbf{x})=E \mathbf{x} \mathbf{x}^{*} is a nonnegative scalar when \mathbf{x} is a scalar, and a nonnegative-definite matrix when \mathbf{x} is a vector random variable.

3.2.1 The Fundamental Equations

Our goal is to estimate the value assumed by the random variable \mathbf{x}, given that the random variables \{\mathbf{y}_i\} assume certain values \{y_i\}. We are interested in linear estimators of \mathbf{x}, i.e., estimators obtained by linear operations on the random variables \{\mathbf{y}_i\}.

We therefore posit an estimator determined by a (yet to be found) coefficient matrix K_{o}:

\hat{\mathbf{x}}=K_{o} \mathbf{y}

The optimal choice of K_{o} is the one that minimizes the error covariance matrix:

P\left(K_{o}\right) \triangleq E[\mathbf{x}-\hat{\mathbf{x}}][\mathbf{x}-\hat{\mathbf{x}}]^{*}=\text { minimum. }

Note: the expectation of a matrix is the matrix whose entries are the expectations of the corresponding entries.

Theorem 3.2.1 (Optimal Linear L.M.S. Estimators) Given two complex zero-mean random variables \mathbf{x} and \mathbf{y}, the l.l.m.s. estimator of \mathbf{x} given \mathbf{y}, defined by (3.2.1)-(3.2.2), is given by any solution K_{o} of the so-called normal equations

\boldsymbol{K}_{\boldsymbol{o}} \boldsymbol{R}_{\boldsymbol{y}}=\boldsymbol{R}_{\boldsymbol{x y}}

where R_{y}=E \mathbf{y} \mathbf{y}^{*} and R_{x y}=E \mathbf{x} \mathbf{y}^{*}=R_{y x}^{*}. The corresponding minimum mean-square-error matrix (or error covariance matrix) is

P\left(K_{o}\right)=R_{x}-K_{o} R_{y x}=R_{x}-R_{x y} K_{o}^{*}

Proof: K_{o} is a solution of the optimization problem (3.2.1)-(3.2.2) if, and only if, for every row vector a, a K_{o} minimizes a P(K) a^{*}, where

a P(K) a^{*}=a E\left[(\mathbf{x}-K y)(\mathbf{x}-K y)^{*}\right] a^{*}=a\left[R_{x}-R_{x y} K^{*}-K R_{y x}+K R_{y} K^{*}\right] a^{*}

Note that a P(K) a^{*} is a scalar function of a complex-valued (row) vector quantity aK. Then (see App. A.6) differentiating a P(K) a^{*} with respect to a K and setting the derivative equal to zero at K=K_{o} leads to the equations R_{x y}=K_{o} R_{y} . The corresponding minimum-mean-square-error (or, m.m.s.e. for short) matrix is

\begin{aligned}
\text{m.m.s.e.} \triangleq P\left(K_{o}\right) &=E(\mathbf{x}-\hat{\mathbf{x}})(\mathbf{x}-\hat{\mathbf{x}})^{*}=E(\mathbf{x}-\hat{\mathbf{x}}) \mathbf{x}^{*}-E(\mathbf{x}-\hat{\mathbf{x}}) \hat{\mathbf{x}}^{*} \\
&=E\left(\mathbf{x}-K_{o} \mathbf{y}\right) \mathbf{x}^{*}-E\left(\mathbf{x}-K_{o} \mathbf{y}\right) \mathbf{y}^{*} K_{o}^{*} \\
&=R_{x}-K_{o} R_{y x}-\left(R_{x y}-K_{o} R_{y}\right) K_{o}^{*}=R_{x}-K_{o} R_{y x}
\end{aligned}

That is, a P(K) a^{*} \geq a P\left(K_{o}\right) a^{*} for every K and for every row vector a.

In other words, K_{o} also minimizes the mean-square error in the estimate of each component of the vector \mathbf{x}. When R_{y} is invertible, the solution of the normal equations is unique, as the following theorem states.

Theorem 3.2.2 (Unique Solutions) Assume that R_{y}>0 (R_{y} positive definite, hence invertible). Then the optimum choice \boldsymbol{K}_{\boldsymbol{o}} that minimizes P(\boldsymbol{K})=\boldsymbol{E}[\mathbf{x}-\boldsymbol{K} \mathbf{y}][\mathbf{x}-\boldsymbol{K} \mathbf{y}]^{*} is given by

K_{o}=R_{x y} R_{y}^{-1}

The m.m.s.e. (see (3.2.5)) can be written as

P\left(K_{o}\right)=R_{x}-R_{x y} R_{y}^{-1} R_{y x} \triangleq R_{\tilde{x}}

The error covariance R_{\tilde{x}}=P(K_{o}) is always nonnegative definite (it is, after all, an error covariance matrix; it is also the Schur complement of R_{y}>0 in the joint covariance matrix, as discussed next).
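As a quick sanity check of Theorem 3.2.2, here is a NumPy sketch (my own illustration with an arbitrarily chosen real-valued joint covariance; none of the names come from the book). It solves the normal equations for K_o and compares the predicted error covariance P(K_o) with a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random positive-definite joint covariance for [x; y], with x in R^2, y in R^3.
n_x, n_y = 2, 3
A = rng.standard_normal((n_x + n_y, n_x + n_y))
R = A @ A.T + 0.1 * np.eye(n_x + n_y)
R_x, R_xy = R[:n_x, :n_x], R[:n_x, n_x:]
R_yx, R_y = R[n_x:, :n_x], R[n_x:, n_x:]

# Normal equations K_o R_y = R_xy  =>  K_o = R_xy R_y^{-1}  (R_y is symmetric here).
K_o = np.linalg.solve(R_y, R_xy.T).T
P = R_x - K_o @ R_yx                      # predicted error covariance R_xtilde

# Monte Carlo check: sample zero-mean [x; y] with covariance R and compare
# the empirical covariance of the error x - K_o y with P.
N = 200_000
xy = rng.multivariate_normal(np.zeros(n_x + n_y), R, size=N)
x, y = xy[:, :n_x], xy[:, n_x:]
err = x - y @ K_o.T
print(np.allclose(err.T @ err / N, P, atol=0.1))     # ~True (up to sampling error)
print(np.all(np.linalg.eigvalsh(P) >= -1e-12))       # P is nonnegative definite
```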

3.2.2 Stochastic Interpretation of Triangular Factorization (LDL*, UDU*, Schur complements)

We can also view the result of the previous section through block matrices, triangular factorizations, and Schur complements; this gives another convenient way of obtaining P(K_{o}).

Assuming R_{y}>0, the error covariance P(K_{o})=R_{\tilde{x}}=R_{x}-R_{x y} R_{y}^{-1} R_{y x} is the Schur complement of R_{y} in the joint covariance matrix:

E\left[\begin{array}{l} \mathbf{x} \\ \mathbf{y} \end{array}\right]\left[\begin{array}{ll} \mathbf{x}^{*} & \mathbf{y}^{*} \end{array}\right]=\left[\begin{array}{ll} R_{x} & R_{x y} \\ R_{y x} & R_{y} \end{array}\right]

This joint covariance matrix admits convenient LDL^{*} and UDU^{*} factorizations.

When R_{y}>0, the UDU^{*} factorization is

\left[\begin{array}{cc}
R_{x} & R_{x y} \\
R_{y x} & R_{y}
\end{array}\right]=\left[\begin{array}{cc}
I & R_{x y} R_{y}^{-1} \\
0 & I
\end{array}\right]\left[\begin{array}{cc}
R_{x}-R_{x y} R_{y}^{-1} R_{y x} & 0 \\
0 & R_{y}
\end{array}\right]\left[\begin{array}{cc}
I & 0 \\
R_{y}^{-1} R_{y x} & I
\end{array}\right]

This factorization follows from the representation of the pair \{\mathbf{x}, \mathbf{y}\} of correlated random variables in terms of the obviously uncorrelated pair \left\{\tilde{\mathbf{x}}_{|\mathbf{y}}, \mathbf{y}\right\}, where \tilde{\mathbf{x}}_{|\mathbf{y}}=\mathbf{x}-R_{x y} R_{y}^{-1} \mathbf{y} is the estimation error:

\left[\begin{array}{l}
\mathbf{x} \\
\mathbf{y}
\end{array}\right]=\left[\begin{array}{lc}
I & R_{x y} R_{y}^{-1} \\
0 & I
\end{array}\right]\left[\begin{array}{l}
\tilde{\mathbf{x}}_{|\mathbf{y}} \\
y
\end{array}\right]
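A quick numerical check of this UDU^{*} identity (a sketch with arbitrary symmetric blocks of my own choosing, not code from the book):

```python
import numpy as np

# Arbitrary symmetric blocks standing in for R_x, R_xy, R_y.
R_x  = np.array([[4.0, 1.0], [1.0, 3.0]])
R_xy = np.array([[1.0, 0.5], [0.2, 1.5]])
R_y  = np.array([[2.0, 0.3], [0.3, 1.0]])
R_yx = R_xy.T

R_joint = np.block([[R_x, R_xy], [R_yx, R_y]])

K_o   = R_xy @ np.linalg.inv(R_y)
schur = R_x - K_o @ R_yx                  # Schur complement of R_y, i.e. P(K_o)

I2, Z2 = np.eye(2), np.zeros((2, 2))
U = np.block([[I2, K_o], [Z2, I2]])       # upper block-triangular factor
D = np.block([[schur, Z2], [Z2, R_y]])    # block-diagonal middle factor

print(np.allclose(U @ D @ U.T, R_joint))  # True: R_joint = U D U*
```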

This extends to estimating one random variable from several others. For example, suppose we want to estimate \mathbf{y} from the pair \{\mathbf{x}, \mathbf{z}\}, where \mathbf{x} and \mathbf{z} are uncorrelated (R_{xz}=0):

E\left(\left[\begin{array}{l}\mathbf{x} \\ \mathbf{y} \\ \mathbf{z}\end{array}\right]\left[\begin{array}{l}\mathbf{x} \\ \mathbf{y} \\ \mathbf{z}\end{array}\right]^{*}\right)=\left[\begin{array}{ccc}R_{x} & R_{x y} & 0 \\ R_{y x} & R_{y} & R_{y z} \\ 0 & R_{z y} & R_{z}\end{array}\right]

and verify that the covariance matrix of the error in estimating y given both x and z is given by

E \tilde{\mathbf{y}}_{|\mathbf{x}, \mathbf{z}} \tilde{\mathbf{y}}_{|\mathbf{x}, \mathbf{z}}^{*} \triangleq R_{\tilde{y}_{|x, z}}=R_{y}-R_{y x} R_{x}^{-1} R_{x y}-R_{y z} R_{z}^{-1} R_{z y}=R_{\tilde{y}_{|x}}+R_{\tilde{y}_{|z}}-R_{y}

This result is useful for the problem of combining estimators based on different observations, which we will take up in Sec. 3.4.3 and Prob. 3.23.
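A small numerical check of this identity (my own arbitrary matrices, assuming R_{xz}=0):

```python
import numpy as np

R_x  = np.array([[2.0, 0.4], [0.4, 1.5]])
R_z  = np.array([[1.0, 0.2], [0.2, 2.0]])
R_y  = np.array([[3.0, 0.5], [0.5, 2.5]])
R_yx = np.array([[0.6, 0.1], [0.2, 0.3]])
R_yz = np.array([[0.4, 0.0], [0.1, 0.5]])

# Stack the observations w = [x; z]; since R_xz = 0, R_w is block diagonal.
Z2   = np.zeros((2, 2))
R_w  = np.block([[R_x, Z2], [Z2, R_z]])
R_yw = np.hstack([R_yx, R_yz])

# General Schur-complement formula vs. "sum of the two error covariances minus R_y".
P_general = R_y - R_yw @ np.linalg.inv(R_w) @ R_yw.T
P_split   = (R_y - R_yx @ np.linalg.inv(R_x) @ R_yx.T) \
          + (R_y - R_yz @ np.linalg.inv(R_z) @ R_yz.T) - R_y

print(np.allclose(P_general, P_split))   # True
```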

3.2.3 Singular Data Covariance Matrices

So far we have assumed R_{y}>0: R_{y} positive definite, which is equivalent to R_{y} being invertible and to no component of \mathbf{y} being (with probability one) a linear combination of the other components. Now suppose R_{y} is singular; then there is a nonzero row vector a with a R_{y} a^{*}=E|a \mathbf{y}|^{2}=0, i.e., some of the observations are redundant.

What happens to the solutions of the normal equations in this case? It turns out that the assumption R_{y}>0 is not really necessary: when R_{y} is singular the normal equations have many solutions, but the optimal estimate and the minimum cost are still unique, as the following theorem states.

Theorem 3.2.3 (Non-unique Solutions) Even if R_{y}=E \mathbf{y} \mathbf{y}^{*} is singular, the normal equations K_{o} R_{y}=R_{x y} will be consistent, and there will be many solutions. No matter which solution K_{o} is used, the corresponding l.l.m.s. estimator \hat{\mathbf{x}}=K_{o} \mathbf{y} will, however, be unique, and so of course will P\left(K_{o}\right).

3.2.4 Nonzero-Mean Values and Centering

The discussion so far, and the solution of the normal equations, assumed that \mathbf{x} and \mathbf{y} have zero mean.

If the means are nonzero, we center the variables:

\mathbf{x}^{o} \triangleq \mathbf{x}-m_{x}, \quad \mathbf{y}^{o} \triangleq \mathbf{y}-m_{y}

Their covariance and cross-covariance matrices are

E \mathbf{x}^{o} \mathbf{x}^{o *}=E\left(\mathbf{x}-m_{x}\right)\left(\mathbf{x}-m_{x}\right)^{*}=E \mathbf{x} \mathbf{x}^{*}-m_{x} m_{x}^{*} \triangleq R_{x}, the covariance matrix of x

E \mathbf{x}^{o} \mathbf{y}^{o *}=E\left(\mathbf{x}-m_{x}\right)\left(\mathbf{y}-m_{y}\right)^{*}=E \mathbf{x} \mathbf{y}^{*}-m_{x} m_{y}^{*} \triangleq R_{x y}, \quad the cross-covariance matrix of \mathbf{x} and \mathbf{y}

Applying the zero-mean result to the centered variables gives the optimal estimator

\hat{\mathbf{x}}^{o}=\left(E \mathbf{x}^{o} \mathbf{y}^{o *}\right)\left(E \mathbf{y}^{o} \mathbf{y}^{o *}\right)^{-1} \mathbf{y}^{o}

or, equivalently,

\begin{aligned}
\hat{\mathbf{x}} &=m_{x}+R_{x y} R_{y}^{-1}\left(\mathbf{y}-m_{y}\right) \\
&=R_{x y} R_{y}^{-1} \mathbf{y}+\left(m_{x}-R_{x y} R_{y}^{-1} m_{y}\right)
\end{aligned}

Strictly speaking, then, the linear least-mean-squares estimator of \mathbf{x} given \mathbf{y} is an affine function of \mathbf{y} rather than a linear one. Nevertheless, it is easy to justify continuing to call \hat{\mathbf{x}} a linear function of \mathbf{y}.
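A minimal sketch of why centering matters (scalar jointly Gaussian variables with values I picked arbitrarily): the affine estimator attains the predicted m.m.s.e., while a purely linear estimator that ignores the means does worse.

```python
import numpy as np

rng = np.random.default_rng(1)
m_x, m_y = 2.0, -1.0
R_x, R_y, R_xy = 1.0, 2.0, 0.8         # (co)variances of the centered variables

# Sample jointly Gaussian scalars with the given means and covariances.
N = 100_000
cov = np.array([[R_x, R_xy], [R_xy, R_y]])
xy = rng.multivariate_normal([m_x, m_y], cov, size=N)
x, y = xy[:, 0], xy[:, 1]

x_hat_affine = m_x + (R_xy / R_y) * (y - m_y)    # the centered (affine) estimator
x_hat_linear = (R_xy / R_y) * y                  # "forgot to center"

print(np.mean((x - x_hat_affine) ** 2))          # ~ R_x - R_xy^2 / R_y = 0.68
print(np.mean((x - x_hat_linear) ** 2))          # noticeably larger
```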

3.2.5 Estimators for Complex-Valued Random Variables

Complex-valued random variables can be handled in two ways: either reduce the problem to one with real-valued random variables, or work directly with the complex variables. The normal equations take the same form in both cases; the only difference is whether we work with the given vectors \{\mathbf{x}, \mathbf{y}\} or with their extended (real and imaginary) versions \{\mathbf{x}_{R}, \mathbf{x}_{I}, \mathbf{y}_{R}, \mathbf{y}_{I}\}.

3.3 A Geometric Formulation

3.3.1 The Orthogonality Condition

\hat{\mathbf{x}}=K_{o} \mathbf{y}, \quad K_{o} R_{y}=R_{x y} \quad \Longleftrightarrow \quad E\left(\mathbf{x}-K_{o} \mathbf{y}\right) \mathbf{y}^{*}=0 \quad (3.3.2)

This last equation recalls the notion of orthogonality from Chap. 2: **two vectors are orthogonal when their inner product is zero**.

An inner product must satisfy the following conditions:

1. Linearity: $\left\langle a_{1} \mathbf{x}_{1}+a_{2} \mathbf{x}_{2}, \mathbf{y}\right\rangle= a_{1}\left\langle\mathbf{x}_{1}, \mathbf{y}\right\rangle+ a_{2}\left\langle\mathbf{x}_{2}, \mathbf{y}\right\rangle,$ for any $a_{1}, a_{2} \in \mathbf{C}$
2. Reflexivity: $\langle\mathbf{x}, \mathbf{y}\rangle=\langle\mathbf{y}, \mathbf{x}\rangle^{*}$
3. Nondegeneracy: $\|\mathbf{x}\|^{2} \triangleq\langle\mathbf{x}, \mathbf{x}\rangle$ is zero only when $\mathbf{x}=0$

Guided by the form of (3.3.2) and by these conditions, we define the following inner product on random variables (viewing the random variables \mathbf{x} and \mathbf{y} as vectors):

\langle\mathbf{x}, \mathbf{y}\rangle \triangleq E \mathbf{x} \mathbf{y}^{*}

With this inner product, (3.3.2) has a geometric interpretation: the estimation error is orthogonal to the observations.

Compare this with the geometric picture for deterministic least-squares in Chap. 2.

Note that the projection spaces are different: in Fig. 3.1 the coefficients are the entries of K_{o}, while in Fig. 2.1 the coefficients are the entries of \hat{x}.

The geometric viewpoint is summarized in the following lemma.

Lemma 3.3.1 (The Orthogonality Condition) The linear least-mean-squares estimator (l.l.m.s.e.) of a random variable \mathbf{x} given a set of other random variables \mathbf{y} is characterized by the fact that the error \tilde{\mathbf{x}} in the estimator is orthogonal to (i.e., uncorrelated with) each of the random variables used to form the estimator. Equivalently, the l.l.m.s.e. is the projection of \mathbf{x} onto \mathcal{L}(\mathbf{y}).

Projection onto the linear space \mathcal{L}(\mathbf{y}) (which we denote here by \hat{\mathbf{x}}_{| \mathbf{y}} ) has the important properties

\widehat{\left(\mathbf{x}_{1}+\mathbf{x}_{2}\right)}_{|\mathbf{y}}=\hat{\mathbf{x}}_{1|\mathbf{y}}+\hat{\mathbf{x}}_{2|\mathbf{y}}

and

\hat{\mathbf{x}}_{| \mathbf{y}_{1}, \mathbf{y}_{2}}=\hat{\mathbf{x}}_{| \mathbf{y}_{1}}+\hat{\mathbf{x}}_{| \mathbf{y}_{2}} \text { if, and only if } \mathbf{y}_{1} \perp \mathbf{y}_{2}

These geometrically intuitive properties can be formally verified by using the explicit formula \hat{\mathbf{x}}_{| \mathbf{y}}=\langle\mathbf{x}, \mathbf{y}\rangle\|\mathbf{y}\|^{-2} \mathbf{y}
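As a small worked example of my own (scalar zero-mean variables with arbitrary numbers): suppose E|\mathbf{y}_{1}|^{2}=2, E|\mathbf{y}_{2}|^{2}=1, E \mathbf{y}_{1} \mathbf{y}_{2}^{*}=0, E \mathbf{x} \mathbf{y}_{1}^{*}=1, and E \mathbf{x} \mathbf{y}_{2}^{*}=3. The projection formula then gives

\hat{\mathbf{x}}_{|\mathbf{y}_{1}}=\frac{1}{2} \mathbf{y}_{1}, \quad \hat{\mathbf{x}}_{|\mathbf{y}_{2}}=3 \mathbf{y}_{2}, \quad \hat{\mathbf{x}}_{|\mathbf{y}_{1}, \mathbf{y}_{2}}=\frac{1}{2} \mathbf{y}_{1}+3 \mathbf{y}_{2}=\hat{\mathbf{x}}_{|\mathbf{y}_{1}}+\hat{\mathbf{x}}_{|\mathbf{y}_{2}}

consistent with the additivity property, since \mathbf{y}_{1} \perp \mathbf{y}_{2}.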

3.3.2 Examples

3.4 Linear Models

An extremely important special case that will often arise in our analysis occurs when y and \mathbf{x} are linearly related, say as

\mathbf{y}=H \mathbf{x}+\mathbf{v}

where H \in \mathbb{C}^{p \times n} is a known matrix and \mathbf{v} is a zero-mean random-noise vector uncorrelated with \mathbf{x}. Assume that R_{x}=\langle\mathbf{x}, \mathbf{x}\rangle and R_{v}=\langle\mathbf{v}, \mathbf{v}\rangle are known and also that R_{y}=H R_{x} H^{*}+R_{v}>0. Then the l.l.m.s.e. and the corresponding m.m.s.e. can be written as

\hat{\mathbf{x}}=K_{o} \mathbf{y}, \quad K_{o}=R_{x} H^{*}\left[H R_{x} H^{*}+R_{v}\right]^{-1}

and

P_{x} \triangleq R_{\tilde{x}}=R_{x}-R_{x} H^{*}\left[R_{v}+H R_{x} H^{*}\right]^{-1} H R_{x}

These formulas will be encountered in many different contexts in later chapters.
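The following NumPy sketch (with H, R_x, R_v chosen arbitrarily by me, real-valued for simplicity) evaluates these formulas and checks the predicted error covariance against a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 2, 4
H   = rng.standard_normal((p, n))     # known model matrix
R_x = np.diag([2.0, 0.5])             # covariance of x
R_v = 0.1 * np.eye(p)                 # covariance of the noise v

R_y = H @ R_x @ H.T + R_v
K_o = R_x @ H.T @ np.linalg.inv(R_y)
P_x = R_x - R_x @ H.T @ np.linalg.inv(R_y) @ H @ R_x

# Monte Carlo check that x_hat = K_o y has the predicted error covariance.
N = 200_000
x = rng.multivariate_normal(np.zeros(n), R_x, size=N)
v = rng.multivariate_normal(np.zeros(p), R_v, size=N)
y = x @ H.T + v
err = x - y @ K_o.T
print(np.allclose(err.T @ err / N, P_x, atol=1e-2))   # ~True (up to sampling error)
```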

3.4.1 Information Forms When R_x > 0 and R_v > 0

We may remark that formulas using inverses of covariance matrices are sometimes called Information Form results, because loosely speaking the amount of information obtained by observing a random variable varies inversely as its variance.

Using the matrix inversion lemma,

R_{y}^{-1}=\left(R_{v}+H R_{x} H^{*}\right)^{-1}=R_{v}^{-1}-R_{v}^{-1} H\left(R_{x}^{-1}+H^{*} R_{v}^{-1} H\right)^{-1} H^{*} R_{v}^{-1}

we can rewrite K_{o} as

\begin{aligned} K_{o} &=R_{x} H^{*} R_{v}^{-1}-R_{x} H^{*} R_{v}^{-1} H\left(R_{x}^{-1}+H^{*} R_{v}^{-1} H\right)^{-1} H^{*} R_{v}^{-1} \\ &=R_{x}\left[\left(R_{x}^{-1}+H^{*} R_{v}^{-1} H\right)-H^{*} R_{v}^{-1} H\right]\left(R_{x}^{-1}+H^{*} R_{v}^{-1} H\right)^{-1} H^{*} R_{v}^{-1} \\ &=\left(R_{x}^{-1}+H^{*} R_{v}^{-1} H\right)^{-1} H^{*} R_{v}^{-1} \end{aligned}

P_{x}=R_{x}-R_{x} H^{*}\left[R_{v}+H R_{x} H^{*}\right]^{-1} H R_{x}=\left(R_{x}^{-1}+H^{*} R_{v}^{-1} H\right)^{-1}

We also obtain the following nice formula:

P_{x}^{-1} \hat{\mathbf{x}}=H^{*} R_{v}^{-1} \mathbf{y}
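Here is a sketch (arbitrary matrices of my own) verifying numerically that the covariance form and the information form of K_o and P_x agree, and that P_x^{-1} x̂ = H^* R_v^{-1} y.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 3, 5
H   = rng.standard_normal((p, n))
A   = rng.standard_normal((n, n))
R_x = A @ A.T + np.eye(n)                        # positive-definite R_x
R_v = np.diag(rng.uniform(0.5, 2.0, size=p))     # positive-definite R_v

# Covariance form.
R_y   = H @ R_x @ H.T + R_v
K_cov = R_x @ H.T @ np.linalg.inv(R_y)
P_cov = R_x - K_cov @ H @ R_x

# Information form.
info   = np.linalg.inv(R_x) + H.T @ np.linalg.inv(R_v) @ H
P_info = np.linalg.inv(info)
K_info = P_info @ H.T @ np.linalg.inv(R_v)

print(np.allclose(K_cov, K_info))   # True
print(np.allclose(P_cov, P_info))   # True

# The "nice formula" P_x^{-1} x_hat = H^* R_v^{-1} y, for an arbitrary y.
y = rng.standard_normal(p)
print(np.allclose(info @ (K_cov @ y), H.T @ np.linalg.inv(R_v) @ y))   # True
```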

3.4.2 The Gauss-Markov Theorem

3.4.3 Combining Estimators

Estimating an unknown random variable from several separate observations:

Lemma 3.4.1 (Combining Estimators) Let \mathbf{y}_{a} and \mathbf{y}_{b} be two separate observations of a zero-mean random variable \mathbf{x}, such that \mathbf{y}_{a}=H_{a} \mathbf{x}+\mathbf{v}_{a} and \mathbf{y}_{b}=H_{b} \mathbf{x}+\mathbf{v}_{b}, where \left\{\mathbf{v}_{a}, \mathbf{v}_{b}, \mathbf{x}\right\} are mutually uncorrelated zero-mean random variables with covariance matrices R_{a}, R_{b}, and R_{x}, respectively. Denote by \hat{\mathbf{x}}_{a} and \hat{\mathbf{x}}_{b} the l.l.m.s. estimators of \mathbf{x} given \mathbf{y}_{a} and \mathbf{y}_{b}, respectively, and likewise define the error covariance matrices P_{a}=\left\langle\mathbf{x}-\hat{\mathbf{x}}_{a}, \mathbf{x}-\hat{\mathbf{x}}_{a}\right\rangle and P_{b}=\left\langle\mathbf{x}-\hat{\mathbf{x}}_{b}, \mathbf{x}-\hat{\mathbf{x}}_{b}\right\rangle. Then \hat{\mathbf{x}}, the l.l.m.s. estimator of \mathbf{x} given both \mathbf{y}_{a} and \mathbf{y}_{b}, can be found as

P^{-1} \hat{\mathbf{x}}=P_{a}^{-1} \hat{\mathbf{x}}_{a}+P_{b}^{-1} \hat{\mathbf{x}}_{b} \quad (3.4.12)

where P, the corresponding error covariance matrix, is given by

P^{-1}=P_{a}^{-1}+P_{b}^{-1}-R_{x}^{-1}
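The sketch below (small arbitrary matrices of my own; the information-form expressions from Sec. 3.4.1 are reused as a helper) checks both (3.4.12) and the formula for P^{-1} on one simulated realization.

```python
import numpy as np

rng = np.random.default_rng(4)
n, pa, pb = 2, 3, 2
H_a = rng.standard_normal((pa, n))
H_b = rng.standard_normal((pb, n))
R_x = np.array([[1.5, 0.3], [0.3, 1.0]])
R_a = 0.4 * np.eye(pa)
R_b = 0.7 * np.eye(pb)

def info_estimator(H, R_v):
    """Information-form l.l.m.s. quantities for y = H x + v."""
    P = np.linalg.inv(np.linalg.inv(R_x) + H.T @ np.linalg.inv(R_v) @ H)
    K = P @ H.T @ np.linalg.inv(R_v)
    return P, K

P_a, K_a = info_estimator(H_a, R_a)
P_b, K_b = info_estimator(H_b, R_b)

# Joint estimator using the stacked observation y = [y_a; y_b].
H = np.vstack([H_a, H_b])
R_v = np.block([[R_a, np.zeros((pa, pb))], [np.zeros((pb, pa)), R_b]])
P, K = info_estimator(H, R_v)

# Draw one realization and check (3.4.12) and the P^{-1} formula.
x  = rng.multivariate_normal(np.zeros(n), R_x)
ya = H_a @ x + rng.multivariate_normal(np.zeros(pa), R_a)
yb = H_b @ x + rng.multivariate_normal(np.zeros(pb), R_b)
x_hat, x_a, x_b = K @ np.concatenate([ya, yb]), K_a @ ya, K_b @ yb

print(np.allclose(np.linalg.inv(P),
                  np.linalg.inv(P_a) + np.linalg.inv(P_b) - np.linalg.inv(R_x)))   # True
print(np.allclose(np.linalg.inv(P) @ x_hat,
                  np.linalg.inv(P_a) @ x_a + np.linalg.inv(P_b) @ x_b))            # True
```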

♥ 3.5 Equivalence with Deterministic Least-Squares

Appendix for Chapter 3

3.A Least-Mean-Squares Estimation

In this appendix we consider the more general problem of determining a possibly nonlinear function h(\cdot) that best estimates, in the least-mean-squares sense, a random variable \mathbf{x} from observations of another random variable \mathbf{y}.

Define the error random variable

\tilde{\mathbf{x}}=\mathbf{x}-\hat{\mathbf{x}}=\mathbf{x}-h(\mathbf{y})

The least-mean-squares criterion minimizes the "variance" of the error variable:

The cost function (whose value is the error covariance matrix) is

\min _{h(\cdot)} E\left(\tilde{\mathbf{x}} \tilde{\mathbf{x}}^{*}\right)

Theorem 3.A.1 (The Optimal Least-Mean-Squares Estimator) The optimal least-mean-squares (l.m.s.) estimator (cf. (3.4.1)) of a random variable \mathbf{x} given the value of another random variable \mathbf{y} is given by the conditional expectation

\hat{\mathbf{x}}=E(\mathbf{x} | \mathbf{y})

In particular, if \mathbf{x} and \mathbf{y} are independent random variables, then the optimal estimator of \mathbf{x} is \hat{\mathbf{x}}=E(\mathbf{x} | \mathbf{y})=E(\mathbf{x})
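To illustrate Theorem 3.A.1 numerically, here is a Monte Carlo sketch of my own using a jointly Gaussian pair (for which the conditional mean happens to be linear): the conditional-mean estimator attains a smaller empirical MSE than two arbitrary alternative choices of h(y).

```python
import numpy as np

rng = np.random.default_rng(5)
R_x, R_y, R_xy = 1.0, 2.0, 1.1
cov = np.array([[R_x, R_xy], [R_xy, R_y]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
x, y = xy[:, 0], xy[:, 1]

x_cm = (R_xy / R_y) * y                  # E[x|y] for this jointly Gaussian pair
candidates = [("conditional mean", x_cm),
              ("half-gain linear", 0.5 * (R_xy / R_y) * y),
              ("cubic h(y)", 0.1 * y ** 3)]
for name, est in candidates:
    print(name, np.mean((x - est) ** 2))
# The conditional mean gives roughly R_x - R_xy^2 / R_y = 0.395, smaller than the others.
```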

3.B Gaussian Random Variables

3.C Optimal Estimation for Gaussian Variables