A Detailed Derivation of the Orthogonal Least Squares Line-Fitting Formula




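Nearly every treatment of orthogonal least squares that I found online solves the problem with matrix operations, and I could not find a direct derivation, so I tried to work one out myself.

The error of a point $(x_i, y_i)$ is its distance to the line $ax+by+c=0$:

$$d_i = {|ax_i+by_i+c| \over \sqrt {a^2+b^2}}$$

The error function is:

$$f(a,b,c) = \sum_{i=1}^{n}d^2_i$$

To find the extremum of the error function, set its first-order partial derivatives to zero. For a single term $d_i^2$ (the index $i$ on $x$ and $y$ is dropped from here on for brevity):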
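As a minimal sketch of the error function just defined, the Python below evaluates $f(a,b,c)$ for a list of points. The function name `orthogonal_error` and the `(x, y)` tuple layout are assumptions made for illustration; they are not part of the derivation.

```python
def orthogonal_error(points, a, b, c):
    """Sum of squared point-to-line distances for the line a*x + b*y + c = 0."""
    norm_sq = a * a + b * b
    if norm_sq == 0:
        raise ValueError("a and b must not both be zero")
    # d_i^2 = (a*x + b*y + c)^2 / (a^2 + b^2); the absolute value vanishes on squaring
    return sum((a * x + b * y + c) ** 2 for x, y in points) / norm_sq
```

For example, `orthogonal_error([(0, 1), (0, -1)], 0, 1, 0)` measures two points at distance 1 from the line $y=0$ and returns `2.0`.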

$$d_i^2={(ax+by+c)^2 \over a^2+b^2}$$
$$d_i^2={a^2x^2+2abxy+2acx+2bcy+b^2y^2+c^2 \over a^2+b^2}$$

Taking the partial derivative of the summed error with respect to $c$ and setting it to zero:

$${\partial \sum d_i^2 \over \partial c } = {1 \over a^2+b^2}(2a\sum x+2b\sum y+2nc)$$
$$2a\sum x+2b\sum y+2nc = 0$$
$$c=-{a\sum x+b\sum y \over n}$$
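One way to read this result, added here as a short worked step: substituting $c$ back into the line equation suggests that the best-fit line passes through the centroid of the points. The symbols $\bar x$ and $\bar y$ for the means are notation introduced only for this remark.

$$\bar x = {\sum x \over n},\quad \bar y = {\sum y \over n},\quad c=-(a\bar x+b\bar y) \;\Rightarrow\; ax+by+c = a(x-\bar x)+b(y-\bar y) = 0$$

The point $(\bar x, \bar y)$ satisfies this equation for any $a$ and $b$, so only the direction of the line remains to be determined.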

For the partial derivative with respect to $a$, written for a single term with the sum over $i$ left implicit:

$${\partial d_i^2 \over \partial a } = {-2a \over (a^2+b^2)^2}(ax+by+c)^2+ {2\over a^2+b^2}(ax+by+c)x = 0$$
$${-2a}(ax+by+c)^2+ 2(a^2+b^2)(ax+by+c)x = 0$$
$${-2a}(a^2x^2+2abxy+2acx+2bcy+b^2y^2+c^2)+ 2(a^3x+a^2by+a^2c+ab^2x+b^3y+b^2c)x = 0$$
$$-a^3x^2-2a^2bxy-2a^2cx-2abcy-ab^2y^2-ac^2+ a^3x^2+a^2bxy+a^2cx+ab^2x^2+b^3xy+b^2cx = 0$$
$$ab^2x^2+(-a^2b+b^3)xy+(-2a^2c+a^2c+b^2c)x-2abcy-ab^2y^2-ac^2=0$$

Differentiating directly becomes too complicated, so simplify the line equation $ax+by+c=0$ by splitting it into two cases:

  1. If the slope does not exist, the line is $x = d$, i.e. $x-d=0$, so $b=0,a=1,c=-d$.

$$d_i^2={(x-d)^2}$$
$${\partial \sum d_i^2 \over \partial d }=-2(\sum x-nd)=0$$
$${\sum x \over n}=d$$

The line equation in this case is: $x-{\sum x \over n}=0$

  2. If the slope exists, the line is $y =kx+d$, i.e. $-kx+y-d=0$, so $a=-k,b=1,c=-d$.

$$d_i^2={(-kx+y-d)^2 \over k^2+1}$$
$${\partial d_i^2 \over \partial k} = {-2k \over ( k^2+1)^2}(-kx+y-d)^2+ {-2\over k^2+1}(-kx+y-d)x$$
$${\partial d_i^2 \over \partial k} = {-2 \over ( k^2+1)^2}\left(k (-kx+y-d)^2+ ( k^2+1)(-kx+y-d)x\right)$$
$${\partial d_i^2 \over \partial k} = {-2 \over ( k^2+1)^2} \left(( k^3x^2+y^2k+d^2k-2k^2xy+2k^2dx-2dyk)+ ( k^2+1)(-kx+y-d)x \right)$$
$$= {-2 \over ( k^2+1)^2} \left(k^3x^2+y^2k+d^2k-2k^2xy+2k^2dx-2dyk-kx^2+xy-dx-k^3x^2+k^2xy-k^2dx \right)$$
$$= {-2 \over ( k^2+1)^2} \left(y^2k+d^2k-k^2xy+k^2dx-2dyk-kx^2+xy-dx \right)$$
$$= {-2 \over ( k^2+1)^2} \left((dx-xy)k^2 +(y^2+d^2-2dy-x^2)k +xy-dx\right)$$
$${\partial \sum d_i^2 \over \partial k} = {-2 \over ( k^2+1)^2} \left((d\sum x-\sum xy)k^2 +(\sum y^2+nd^2-2d\sum y-\sum x^2)k +\sum xy-d\sum x\right)$$
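A long expansion like this is easy to get wrong, so a numerical cross-check can help. The sketch below implements the closed-form partial derivative with respect to $k$ from the last equation and compares it with a central finite difference of the error function. All function names and the step size `h` are assumptions chosen for illustration.

```python
def error_slope_form(points, k, d):
    """f(k, d) = sum((-k*x + y - d)^2) / (k^2 + 1) for the line y = k*x + d."""
    return sum((-k * x + y - d) ** 2 for x, y in points) / (k * k + 1)


def d_error_dk(points, k, d):
    """Closed-form partial derivative of f with respect to k, as derived above."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    poly = ((d * sx - sxy) * k * k
            + (syy + n * d * d - 2 * d * sy - sxx) * k
            + sxy - d * sx)
    return -2 * poly / (k * k + 1) ** 2


def central_difference(points, k, d, h=1e-6):
    """Numerical estimate of the same derivative, used only as a cross-check."""
    upper = error_slope_form(points, k + h, d)
    lower = error_slope_form(points, k - h, d)
    return (upper - lower) / (2 * h)
```

If the two values agree to within a small tolerance for a few arbitrary `(k, d)` pairs, the algebra above is very likely correct.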

Taking the partial derivative with respect to $d$ and setting it to zero:

$${\partial \sum d_i^2 \over \partial d} = {-2 \over k^2+1}(-k\sum x+\sum y-nd) = 0$$
$${-k\sum x+\sum y\over n} = d$$

Setting the partial derivative with respect to $k$ to zero, substituting this $d$, and multiplying through by $n$ (the $k^3$ terms cancel):

$${-2 \over ( k^2+1)^2} \left((d\sum x-\sum xy)k^2 +(\sum y^2+nd^2-2d\sum y-\sum x^2)k +\sum xy-d\sum x\right) = 0$$
$$\left({-k\sum x+\sum y\over n}\sum x-\sum xy\right)k^2 +\left(\sum y^2+n\left({-k\sum x+\sum y\over n}\right)^2-2\left({-k\sum x+\sum y\over n}\right)\sum y-\sum x^2\right)k +\sum xy-\left({-k\sum x+\sum y\over n}\right)\sum x = 0$$
$$(\sum y\sum x-n\sum xy)k^2 +(n\sum y^2-\sum y\sum y-n\sum x^2+\sum x\sum x)k +n\sum xy-\sum x\sum y= 0$$

Let:

$$\sum y\sum x-n\sum xy = M$$
$$n\sum y^2-\sum y\sum y-n\sum x^2+\sum x\sum x= N$$
$$n\sum xy-\sum x\sum y= Q$$

Then the equation becomes:

$$Mk^2+Nk+Q = 0$$
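The three coefficients depend only on five running sums over the data. The sketch below computes them in one pass over the point list; the function name `quadratic_coefficients` and the `(x, y)` tuple input are assumptions for illustration. Note that, by the definitions above, $Q=-M$, so the discriminant $N^2-4MQ$ equals $N^2+4M^2$ and is never negative.

```python
def quadratic_coefficients(points):
    """Return (M, N, Q) of M*k^2 + N*k + Q = 0 built from the data sums."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    m = sy * sx - n * sxy
    big_n = n * syy - sy * sy - n * sxx + sx * sx
    q = n * sxy - sx * sy
    return m, big_n, q


# Three points on the line y = x
print(quadratic_coefficients([(0, 0), (1, 1), (2, 2)]))  # → (-6, 0, 6)
```

For these points the quadratic is $-6k^2+6=0$, whose roots are $k=\pm 1$; only $k=1$ is the slope of the line through the data.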

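By the quadratic formula, $k = {-N\pm\sqrt{N^2-4MQ} \over 2M}$. Since $Q=-M$, the discriminant $N^2-4MQ=N^2+4M^2$ is never negative, and the two roots satisfy $k_1k_2=Q/M=-1$: they are the slopes of two perpendicular lines, one minimizing the error and the other maximizing it. The minimizing slope has the same sign as $Q$ (the sign of the covariance of $x$ and $y$), which is the root with the minus sign:

$$k = {-N-\sqrt{N^2-4MQ} \over 2M}$$

This requires $M \neq 0$. When $M=0$ (and hence $Q=0$) the equation reduces to $Nk=0$: if $N<0$ the best line is the horizontal line $y={\sum y \over n}$ with $k=0$; if $N>0$ the slope does not exist and the first case applies; if $N=0$ as well, every line through the centroid gives the same error.

In summary:

$$\begin{cases} -y+kx+{-k\sum x+\sum y\over n}=0,\quad k={-N-\sqrt{N^2-4MQ} \over 2M}, & M \neq 0 \\ \\ y-{\sum y \over n}=0, & M=0,\ N<0 \\ \\ x-{\sum x \over n}=0, & M=0,\ N>0 \end{cases}$$

Writing this up took some effort, so please credit the source when reposting. Corrections are welcome!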
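The case analysis above can be sketched in Python as follows. Rather than committing to one sign of the square root, this sketch evaluates both roots of the quadratic, adds the vertical line through the centroid as a candidate, and keeps whichever line has the smallest orthogonal error. The function name `fit_line_orthogonal`, the `(a, b, c)` return convention, and the exact `m != 0` comparison are assumptions for illustration; real data would likely want a tolerance on $M$.

```python
import math


def fit_line_orthogonal(points):
    """Fit a*x + b*y + c = 0 by orthogonal least squares; returns (a, b, c)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    m = sy * sx - n * sxy
    big_n = n * syy - sy * sy - n * sxx + sx * sx
    q = n * sxy - sx * sy

    def error(a, b, c):
        # Sum of squared point-to-line distances
        return sum((a * x + b * y + c) ** 2 for x, y in points) / (a * a + b * b)

    # Case 1 candidate: vertical line x = mean(x)
    candidates = [(1.0, 0.0, -sx / n)]
    # Case 2 candidates: y = k*x + d for each root of M*k^2 + N*k + Q = 0
    if m != 0:
        root = math.sqrt(big_n * big_n - 4 * m * q)
        slopes = [(-big_n + root) / (2 * m), (-big_n - root) / (2 * m)]
    else:
        slopes = [0.0]  # M = 0 (so Q = 0) reduces the quadratic to N*k = 0
    for k in slopes:
        d = (-k * sx + sy) / n
        candidates.append((-k, 1.0, -d))  # -k*x + y - d = 0
    return min(candidates, key=lambda line: error(*line))
```

Comparing the candidates by their actual error costs one extra pass over the data per candidate, but it avoids having to reason about which root is the minimum. For points on $y=x$ the result is the line $-x+y=0$, and for points sharing one $x$ value it is the vertical line through them.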