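Support Vector Machines in Detail: Hard-Margin SVM, Solving the Model, Introducing the Dual Problem and the KKT Conditions [Whiteboard Derivation Series Notes]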



$$ \left\{\begin{aligned}&\mathop{\text{min }}\limits_{\omega,b} \frac{1}{2}\omega^{T}\omega\\&\text{s.t. } y_{i}(\omega^{T}x_{i}+b)\geq 1\Leftrightarrow 1-y_{i}(\omega^{T}x_{i}+b)\leq 0,\quad \underbrace{i=1,2,\cdots,N}_{N\text{ constraints}}\end{aligned}\right. $$

Construct the Lagrangian:

$$ L(\omega,b,\lambda)=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}[1-y_{i}(\omega^{T}x_{i}+b)] $$

Note that the $\lambda$ inside the parentheses of $L$ is an $N \times 1$ vector, while each $\lambda_{i}$ on the right-hand side is a $1 \times 1$ scalar.
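As a concrete illustration, the Lagrangian can be evaluated directly from its definition. The following is a minimal sketch in plain Python; the function names `dot` and `lagrangian` and the two-point toy data are assumptions made for illustration, not part of the original derivation.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def lagrangian(w, b, lam, X, y):
    """L(w, b, lam) = 1/2 w^T w + sum_i lam_i * [1 - y_i (w^T x_i + b)]."""
    penalty = sum(l_i * (1 - y_i * (dot(w, x_i) + b))
                  for l_i, x_i, y_i in zip(lam, X, y))
    return 0.5 * dot(w, w) + penalty


# Hypothetical toy data: one point per class.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]

# Both constraints are tight here, so only the 1/2 w^T w term contributes.
value = lagrangian(w=(0.5, 0.5), b=0.0, lam=(0.25, 0.25), X=X, y=y)
print(value)
```

Here `lam` is the whole $N \times 1$ vector, while each `l_i` inside the sum is the scalar $\lambda_{i}$.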

 

The method of Lagrange multipliers itself will be explained in a later article.

 

For example, in this problem, if $1-y_{i}(\omega^{T}x_{i}+b)>0$ for some $i$, then letting $\lambda_{i}\to\infty$ gives

$$ \mathop{\text{max }}\limits_{\lambda}L(\lambda,\omega,b)=\frac{1}{2}\omega^{T}\omega+ \infty=\infty $$

If $1-y_{i}(\omega^{T}x_{i}+b)\leq 0$ for every $i$, then with $\lambda_{i}\geq 0$ each term $\lambda_{i}[1-y_{i}(\omega^{T}x_{i}+b)]$ is at most $0$, so

$$ \mathop{\text{max }}\limits_{\lambda}L(\lambda,\omega,b)=\frac{1}{2}\omega^{T}\omega+0=\frac{1}{2}\omega^{T}\omega $$

Therefore

$$ \mathop{\text{min }}\limits_{\omega,b}\mathop{\text{max }}\limits_{\lambda}L(\lambda,\omega,b)=\mathop{\text{min }}\limits_{\omega,b}\left(\infty, \frac{1}{2}\omega^{T}\omega\right)=\mathop{\text{min }}\limits_{\omega,b} \frac{1}{2}\omega^{T}\omega $$

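Therefore the unconstrained form of the problem is

$$ \mathop{\text{min }}\limits_{\omega,b}\mathop{\text{max }}\limits_{\lambda}L(\omega,b,\lambda),\quad \text{s.t. }\lambda_{i}\geq 0 $$

 

Here "constrained" and "unconstrained" refer to constraints on $\omega$ (here $\omega$ plays the role of $x$ in the general template). Originally $y_{i}(\omega^{T}x_{i}+b)\geq 1\Leftrightarrow 1-y_{i}(\omega^{T}x_{i}+b)\leq 0$ was a constraint on $\omega$. The Lagrangian turns it into $\lambda_{i}\geq 0$, which is a constraint on $\lambda_{i}$ and no longer on $\omega$, and this is why the result is called the unconstrained form.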
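The two cases of the inner maximization over $\lambda$ can be checked on a small example. This is a sketch under assumptions: the helper names `constraint_values` and `inner_max` and the toy data are hypothetical, and the unbounded case is represented by `float("inf")` rather than computed by an actual optimizer.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def constraint_values(w, b, X, y):
    """g_i = 1 - y_i (w^T x_i + b); (w, b) is feasible iff every g_i <= 0."""
    return [1 - y_i * (dot(w, x_i) + b) for x_i, y_i in zip(X, y)]


def inner_max(w, b, X, y):
    """Value of max over lam >= 0 of L(w, b, lam).

    If some g_i > 0, sending lam_i to infinity makes L unbounded.
    Otherwise every term lam_i * g_i is <= 0, so the best achievable sum is 0.
    """
    g = constraint_values(w, b, X, y)
    if any(g_i > 0 for g_i in g):
        return float("inf")
    return 0.5 * dot(w, w)


# Hypothetical toy data: one point per class.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]

feasible = inner_max((1.0, 1.0), 0.0, X, y)    # both margins equal 2, so g_i = -1
infeasible = inner_max((0.1, 0.1), 0.0, X, y)  # both margins equal 0.2, so g_i = 0.8
print(feasible, infeasible)
```

Minimizing `inner_max` over $(\omega,b)$ can never pick an infeasible point, which is how the min-max form encodes the original constraints.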

 

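Because the objective is a convex quadratic and the inequality constraints are affine functions, strong duality holds and the dual problem is equivalent to the primal problem. The dual form of the problem is therefore

$$ \mathop{\text{max }}\limits_{\lambda}\mathop{\text{min }}\limits_{\omega,b}L(\omega,b,\lambda),\quad \text{s.t. }\lambda_{i}\geq 0 $$

First consider $\mathop{\text{min }}\limits_{\omega,b}L(\omega,b,\lambda)$. For $b$: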

$$ \begin{aligned} \frac{\partial L}{\partial b}&=\frac{\partial }{\partial b}\left[\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}(\omega^{T}x_{i}+b)\right]\\ &=\frac{\partial }{\partial b}\left(-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}b\right)\\ &=-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}=0 \end{aligned} $$

Substituting this into $L(\omega,b,\lambda)$:

$$ \begin{aligned} L(\omega,b,\lambda)&=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}(\omega^{T}x_{i}+b)\\ &=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}\omega^{T}x_{i}-b\underbrace{\sum\limits_{i=1}^{N}\lambda_{i}y_{i}}_{=0}\\ &=\frac{1}{2}\omega^{T}\omega+\sum\limits_{i=1}^{N}\lambda_{i}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}\omega^{T}x_{i} \end{aligned} $$

For $\omega$:

$$ \begin{aligned} \frac{\partial L}{\partial \omega}&=\frac{1}{2} \cdot 2\omega- \sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}=0\\ \omega&=\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i} \end{aligned} $$
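The two partial derivatives, $\frac{\partial L}{\partial b}=-\sum_{i}\lambda_{i}y_{i}$ and $\frac{\partial L}{\partial \omega}=\omega-\sum_{i}\lambda_{i}y_{i}x_{i}$, can be sanity-checked against central finite differences. The sketch below assumes arbitrary, hypothetical values for `X`, `y`, `lam`, `w`, and `b`; all function names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def lagrangian(w, b, lam, X, y):
    """L(w, b, lam) = 1/2 w^T w + sum_i lam_i * [1 - y_i (w^T x_i + b)]."""
    return 0.5 * dot(w, w) + sum(
        l_i * (1 - y_i * (dot(w, x_i) + b)) for l_i, x_i, y_i in zip(lam, X, y))


def grad_b(lam, y):
    """Closed form: dL/db = -sum_i lam_i y_i."""
    return -sum(l_i * y_i for l_i, y_i in zip(lam, y))


def grad_w(w, lam, X, y):
    """Closed form: dL/dw = w - sum_i lam_i y_i x_i."""
    return [w_d - sum(l_i * y_i * x_i[d] for l_i, x_i, y_i in zip(lam, X, y))
            for d, w_d in enumerate(w)]


def numeric_grad_w(w, b, lam, X, y, eps=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    grads = []
    for d in range(len(w)):
        up, down = list(w), list(w)
        up[d] += eps
        down[d] -= eps
        grads.append((lagrangian(up, b, lam, X, y)
                      - lagrangian(down, b, lam, X, y)) / (2 * eps))
    return grads


def numeric_grad_b(w, b, lam, X, y, eps=1e-6):
    """Central finite difference with respect to b."""
    return (lagrangian(w, b + eps, lam, X, y)
            - lagrangian(w, b - eps, lam, X, y)) / (2 * eps)


# Hypothetical values chosen only to exercise the formulas.
X = [(1.0, 2.0), (-1.0, 0.5), (0.0, -1.0)]
y = [1, -1, -1]
lam = [0.3, 0.7, 0.2]
w, b = [0.4, -0.2], 0.1

print(grad_w(w, lam, X, y), numeric_grad_w(w, b, lam, X, y))
print(grad_b(lam, y), numeric_grad_b(w, b, lam, X, y))
```

Setting both closed-form gradients to zero gives exactly the two conditions used in the derivation.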

Substituting this into $L(\omega,b,\lambda)$:

$$ \begin{aligned} L(\omega,b,\lambda)&=\frac{1}{2}\underbrace{\left(\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}\right)^{T}\left(\sum\limits_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)}_{\in \mathbb{R}}-\underbrace{\sum\limits_{i=1}^{N}\lambda_{i}y_{i}\left(\sum\limits_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)^{T}x_{i}}_{\in \mathbb{R}}+\sum\limits_{i=1}^{N}\lambda_{i}\\ &=- \frac{1}{2}\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}+\sum\limits_{i=1}^{N}\lambda_{i} \end{aligned} $$
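The step between the two lines can be spelled out. Writing $S$ for the double sum (a symbol introduced here only for brevity), both scalar terms equal $S$, because $x_{j}^{T}x_{i}=x_{i}^{T}x_{j}$:

$$
\begin{aligned}
S&:=\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}
=\left(\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}\right)^{T}\left(\sum\limits_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)
=\sum\limits_{i=1}^{N}\lambda_{i}y_{i}\left(\sum\limits_{j=1}^{N}\lambda_{j}y_{j}x_{j}\right)^{T}x_{i}\\
L&=\frac{1}{2}S-S+\sum\limits_{i=1}^{N}\lambda_{i}=-\frac{1}{2}S+\sum\limits_{i=1}^{N}\lambda_{i}
\end{aligned}
$$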

The original problem is therefore transformed into

$$ \mathop{\text{max }}\limits_{\lambda}- \frac{1}{2}\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}+\sum\limits_{i=1}^{N}\lambda_{i},\quad \text{s.t. }\lambda_{i}\geq 0,\ \sum\limits_{i=1}^{N}\lambda_{i}y_{i}=0 $$
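To make the dual problem concrete, here is one possible way to solve it numerically on a tiny data set. This is not a method taken from the notes: it is a naive pairwise coordinate-ascent sketch in the spirit of SMO, written under the assumptions that the data are linearly separable (otherwise the hard-margin dual is unbounded and the loop will not settle) and small enough for an $O(N^{2})$ sweep. The name `solve_dual`, its parameters, and the four-point data set are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def solve_dual(X, y, sweeps=100, tol=1e-12):
    """Maximize sum_i lam_i - 1/2 sum_ij lam_i lam_j y_i y_j x_i^T x_j
    subject to lam_i >= 0 and sum_i lam_i y_i = 0.

    Two multipliers are optimized at a time; the paired update leaves
    sum_i lam_i y_i unchanged, so the equality constraint holds throughout.
    """
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    lam = [0.0] * n
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2 * K[i][j]  # curvature along the pair direction
                if eta <= tol:
                    continue  # coincident points give no usable direction
                # E_k = sum_m lam_m y_m K_mk - y_k; the offset b cancels in E_i - E_j.
                e_i = sum(lam[m] * y[m] * K[m][i] for m in range(n)) - y[i]
                e_j = sum(lam[m] * y[m] * K[m][j] for m in range(n)) - y[j]
                new_j = lam[j] + y[j] * (e_i - e_j) / eta
                # Clip so that both lam_i and lam_j stay non-negative.
                if y[i] != y[j]:
                    low, high = max(0.0, lam[j] - lam[i]), float("inf")
                else:
                    low, high = 0.0, lam[i] + lam[j]
                new_j = min(max(new_j, low), high)
                if abs(new_j - lam[j]) > tol:
                    lam[i] += y[i] * y[j] * (lam[j] - new_j)
                    lam[j] = new_j
                    changed = True
        if not changed:
            break  # a full sweep made no progress
    return lam


# Hypothetical separable data: two points per class on the line x1 = x2.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]
lam = solve_dual(X, y)
print(lam)
```

On this data only the two points closest to the boundary, $(1,1)$ and $(-1,-1)$, end up with nonzero multipliers. In practice a general-purpose QP solver or a full SMO implementation would be the more robust choice.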

Write down the KKT conditions of this optimization problem (for this problem, strong duality between the primal and dual problems $\Leftrightarrow$ the KKT conditions are satisfied):

$$ \left\{\begin{aligned}&\frac{\partial L}{\partial \omega}=0,\frac{\partial L}{\partial b}=0\\&\lambda_{i}[1-y_{i}(\omega^{T}x_{i}+b)]=0\\&\lambda_{i}\geq 0\\&1-y_{i}(\omega^{T}x_{i}+b)\leq 0\end{aligned}\right. $$

Here $\lambda_{i}[1-y_{i}(\omega^{T}x_{i}+b)]=0$ is called the complementary slackness condition: for support vectors $y_{i}(\omega^{T}x_{i}+b)= 1$, and for all other data points $\lambda_{i}=0$ (by the construction of the Lagrangian), so at least one of the two factors must be $0$.
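The four groups of KKT conditions (stationarity, complementary slackness, dual feasibility $\lambda_{i}\geq 0$, and primal feasibility $1-y_{i}(\omega^{T}x_{i}+b)\leq 0$) can be verified mechanically for a candidate solution. The sketch below is illustrative only: `kkt_report`, the tolerance, and the toy data with its candidate $(\omega,b,\lambda)$ are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def kkt_report(w, b, lam, X, y, tol=1e-9):
    """Check each KKT condition of the hard-margin SVM up to a tolerance."""
    g = [1 - y_i * (dot(w, x_i) + b) for x_i, y_i in zip(X, y)]
    grad_w = [w[d] - sum(l_i * y_i * x_i[d] for l_i, x_i, y_i in zip(lam, X, y))
              for d in range(len(w))]
    grad_b = -sum(l_i * y_i for l_i, y_i in zip(lam, y))
    return {
        "stationarity": all(abs(v) <= tol for v in grad_w) and abs(grad_b) <= tol,
        "complementary_slackness": all(abs(l_i * g_i) <= tol for l_i, g_i in zip(lam, g)),
        "dual_feasibility": all(l_i >= -tol for l_i in lam),
        "primal_feasibility": all(g_i <= tol for g_i in g),
    }


# Hypothetical data and candidate solution.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]
report = kkt_report(w=(0.5, 0.5), b=0.0, lam=[0.25, 0.0, 0.25, 0.0], X=X, y=y)
print(report)
```

For this candidate the points $(1,1)$ and $(-1,-1)$ lie exactly on the margin with $\lambda_{i}>0$, while the other two have slack in the constraint and $\lambda_{i}=0$, which is the complementary slackness pattern described above.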

From the earlier condition $\frac{\partial L}{\partial \omega}=0$ we obtain

$$ \omega ^{*}=\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i} $$

 

where the $\lambda_{i}$ are obtained by solving $\mathop{\text{max }}\limits_{\lambda}- \frac{1}{2}\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}x_{i}^{T}x_{j}+\sum\limits_{i=1}^{N}\lambda_{i}$, s.t. $\lambda_{i}\geq 0$, $\sum\limits_{i=1}^{N}\lambda_{i}y_{i}=0$.

I have seen how this is solved but still do not understand it. For now it is enough to know that the $\lambda_{i}$ can be obtained (the problem is a convex quadratic program in $\lambda$).

 

For $b ^{*}$, suppose

$$ \exists (x_{k},y_{k}),\ \text{s.t. }1-y_{k}(\omega^{T}x_{k}+b)=0 $$

Clearly $(x_{k},y_{k})$ is a so-called support vector (this is what the rescaling $\omega^{T}=\frac{\hat{\omega}^{T}}{a},\ b=\frac{\hat{b}}{a}$ at the very beginning set up). Therefore

$$ \begin{aligned} y_{k}(\omega^{T}x_{k}+b)&=1\\ y_{k}^{2}(\omega^{T}x_{k}+b)&=y_{k}\\ \omega^{T}x_{k}+b&=y_{k}\qquad \left(y_{k}\in \left\{+1,-1\right\},\ y_{k}^{2}=1\right)\\ b ^{*}&=y_{k}-\omega^{T}x_{k}=y_{k}-\sum\limits_{i=1}^{N}\lambda_{i}y_{i}x_{i}^{T}x_{k} \end{aligned} $$
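Given multipliers $\lambda_{i}$, the two formulas $\omega^{*}=\sum_{i}\lambda_{i}y_{i}x_{i}$ and $b^{*}=y_{k}-\omega^{T}x_{k}$ translate into a few lines of code. This is a minimal sketch: `recover_w_b` is a hypothetical helper, the multipliers are assumed to be already known for the toy data, and $k$ is taken to be the first index with $\lambda_{k}>0$.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def recover_w_b(lam, X, y, tol=1e-9):
    """w* = sum_i lam_i y_i x_i, and b* = y_k - w*^T x_k for a support vector k."""
    dim = len(X[0])
    w = [sum(l_i * y_i * x_i[d] for l_i, x_i, y_i in zip(lam, X, y))
         for d in range(dim)]
    # Any sample with a strictly positive multiplier lies on the margin.
    k = next(i for i, l_i in enumerate(lam) if l_i > tol)
    b = y[k] - dot(w, X[k])
    return w, b


# Hypothetical data with multipliers assumed to be already known.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]
lam = [0.25, 0.0, 0.25, 0.0]

w_star, b_star = recover_w_b(lam, X, y)
print(w_star, b_star)
```

With noisy numerical multipliers, averaging $y_{k}-\omega^{T}x_{k}$ over all indices with $\lambda_{k}>0$ is a common way to get a more stable $b^{*}$; the single-index version above simply mirrors the derivation.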

 

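Here $(x_{k},y_{k})$ is a sample whose multiplier is nonzero ($\lambda_{k}>0$). As for how to compute the $\lambda_{i}$, as I said before, I do not know how, and I apologize for that.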