[Machine Learning] Formula Derivations and Proofs: Support Vector Machines (SVM)


Conditions for determining the hyperplane

Show that, irrespective of the dimensionality of the data space, a data set consisting of just two data points (call them $x^{(1)}$ and $x^{(2)}$, one from each class) is sufficient to determine the maximum-margin hyperplane. Fully explain your answer, including giving an explicit formula for the solution to the hard-margin SVM (i.e., $w$) as a function of $x^{(1)}$ and $x^{(2)}$.

Answer:

It suffices to show that, given the two samples $x^{(1)}$ and $x^{(2)}$, the parameters of the linear SVM classifier can be solved for in closed form. Start from the dual of the hard-margin SVM:

$$
\begin{aligned}
\max_\alpha \quad & \sum_{i=1}^n\alpha_i-\frac{1}{2}\sum_{i,j=1}^n y^{(i)}y^{(j)}\alpha_i\alpha_j(x^{(i)})^Tx^{(j)}\\
\text{s.t.} \quad & \alpha_i \geq 0,\quad i=1,\dots,n,\\
& \sum_{i=1}^n \alpha_iy^{(i)}=0
\end{aligned}
$$
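Before specializing to two points, here is a minimal numerical sketch (the toy data and variable names are assumptions of mine, not part of the original derivation) that solves this dual with a generic constrained optimizer, so the closed-form answer derived below can be cross-checked:

```python
# Solve the hard-margin SVM dual numerically on a toy two-point dataset.
# The data, variable names, and solver choice are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 2.0],    # x^(1), label +1
              [3.0, 0.0]])   # x^(2), label -1
y = np.array([1.0, -1.0])

# Gram matrix with labels folded in: G_ij = y_i y_j (x^(i))^T x^(j)
G = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # scipy minimizes, so negate the dual objective.
    return -(alpha.sum() - 0.5 * alpha @ G @ alpha)

res = minimize(
    neg_dual,
    x0=np.zeros(2),
    method="SLSQP",
    bounds=[(0.0, None)] * 2,                              # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
)
alpha = res.x
w = (alpha * y) @ X      # w = sum_i alpha_i y_i x^(i)
print("alpha:", alpha)   # ~[0.25, 0.25] for this toy data
print("w:", w)           # ~[-0.5, 0.5]
```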

Substituting the two samples, with labels $y^{(1)}=+1$ and $y^{(2)}=-1$ (one from each class), gives:

$$
\begin{aligned}
\max_\alpha \quad & \alpha_1+\alpha_2-\frac{1}{2}\alpha_1^2\|x^{(1)}\|_2^2+\alpha_1\alpha_2(x^{(1)})^Tx^{(2)}-\frac{1}{2}\alpha_2^2\|x^{(2)}\|_2^2\\
\text{s.t.} \quad & \alpha_1 \geq 0,\quad \alpha_2 \geq 0,\quad \alpha_1-\alpha_2=0
\end{aligned}
$$

Substituting $\alpha_1=\alpha_2$ into the objective, write

$$
L(\alpha_1)=2\alpha_1-\frac{1}{2}\alpha_1^2\|x^{(1)}\|_2^2+\alpha_1^2(x^{(1)})^Tx^{(2)}-\frac{1}{2}\alpha_1^2\|x^{(2)}\|_2^2
$$

Setting the derivative to zero,

$$
\frac{\partial L(\alpha_1)}{\partial\alpha_1}=2-\alpha_1\|x^{(1)}\|_2^2+2\alpha_1(x^{(1)})^Tx^{(2)}-\alpha_1\|x^{(2)}\|_2^2=0,
$$

we obtain:

$$
\begin{aligned}
\alpha_1&=\frac{2}{\|x^{(1)}\|_2^2-2(x^{(1)})^Tx^{(2)}+\|x^{(2)}\|_2^2}=\frac{2}{\|x^{(1)}-x^{(2)}\|_2^2}\\
\alpha_2&=\alpha_1=\frac{2}{\|x^{(1)}-x^{(2)}\|_2^2}\\
w^*&=\alpha_1x^{(1)}-\alpha_2x^{(2)}=\frac{2(x^{(1)}-x^{(2)})}{\|x^{(1)}-x^{(2)}\|_2^2}\\
b^*&=1-\left(\alpha_1(x^{(1)})^Tx^{(1)}-\alpha_2(x^{(2)})^Tx^{(1)}\right)\\
&=1-\frac{2}{\|x^{(1)}-x^{(2)}\|_2^2}\left((x^{(1)})^Tx^{(1)}-(x^{(2)})^Tx^{(1)}\right)
\end{aligned}
$$

Since $w^*$ and $b^*$ depend only on $x^{(1)}$ and $x^{(2)}$, the two points alone determine the maximum-margin hyperplane, whatever the dimensionality of the data space: both points are support vectors, and the hyperplane is the perpendicular bisector of the segment joining them.
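A small sketch (toy numbers of my own) that evaluates these closed-form expressions and verifies that both points lie exactly on the margin, i.e. $y^{(i)}\left((w^*)^Tx^{(i)}+b^*\right)=1$:

```python
# Evaluate the closed-form hard-margin solution for two points and check
# the margins. The specific numbers are illustrative assumptions.
import numpy as np

x1 = np.array([1.0, 2.0])   # x^(1), class +1
x2 = np.array([3.0, 0.0])   # x^(2), class -1

d2 = np.sum((x1 - x2) ** 2)   # ||x^(1) - x^(2)||_2^2
alpha = 2.0 / d2              # alpha_1 = alpha_2
w = 2.0 * (x1 - x2) / d2      # w* = alpha_1 x^(1) - alpha_2 x^(2)
b = 1.0 - w @ x1              # b* from y^(1) (w^T x^(1) + b) = 1

print("margins:", w @ x1 + b, w @ x2 + b)            # +1.0 and -1.0
print("geometric margin:", 1.0 / np.linalg.norm(w))  # = ||x^(1) - x^(2)|| / 2
```

The last line makes the geometry explicit: the margin equals half the distance between the two points, as expected for a perpendicular bisector.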

The Gaussian kernel as an inner product of infinite-dimensional feature vectors

The Gaussian kernel takes the form:

$$
k(x,x')=\exp\left(-\frac{\|x-x'\|^2}{2\sigma^2}\right)
$$

Try to show that the Gaussian kernel can be expressed as an inner product of infinite-dimensional feature vectors.

Hint: make use of the following factorization, and then expand the middle factor as a power series.

$$
k(x,z)=\exp\left(-\frac{x^Tx}{2\sigma^2}\right)\exp\left(\frac{x^Tz}{\sigma^2}\right)\exp\left(-\frac{z^Tz}{2\sigma^2}\right)
$$

Answer:

Expand the middle factor $\exp\left(\frac{x^Tz}{\sigma^2}\right)$ as a Taylor series:

$$
\begin{aligned}
\exp\left(\frac{x^Tz}{\sigma^2}\right)&=\exp\left(\frac{\sum_{i=1}^d x_iz_i}{\sigma^2}\right)=\sum_{n=0}^{\infty}\frac{1}{n!}\left(\frac{x^Tz}{\sigma^2}\right)^n\\
&=\left(1,\frac{1}{1!},\frac{1}{2!},\cdots\right)\left(\left(\frac{x^Tz}{\sigma^2}\right)^0,\left(\frac{x^Tz}{\sigma^2}\right)^1,\left(\frac{x^Tz}{\sigma^2}\right)^2,\cdots\right)^T
\end{aligned}
$$

Substituting into $k(x,z)$ gives:

$$
\begin{aligned}
k(x,z)&=\exp\left(-\frac{x^Tx}{2\sigma^2}\right)\left[\left(1,\frac{1}{1!},\frac{1}{2!},\cdots\right)\left(\left(\frac{x^Tz}{\sigma^2}\right)^0,\left(\frac{x^Tz}{\sigma^2}\right)^1,\left(\frac{x^Tz}{\sigma^2}\right)^2,\cdots\right)^T\right]\exp\left(-\frac{z^Tz}{2\sigma^2}\right)\\
&=\left[\exp\left(-\frac{x^Tx}{2\sigma^2}\right)\left(1,\frac{1}{1!},\frac{1}{2!},\cdots\right)\right]\left[\exp\left(-\frac{z^Tz}{2\sigma^2}\right)\left(\left(\frac{x^Tz}{\sigma^2}\right)^0,\left(\frac{x^Tz}{\sigma^2}\right)^1,\left(\frac{x^Tz}{\sigma^2}\right)^2,\cdots\right)\right]^T
\end{aligned}
$$

Strictly speaking, the second bracket still contains $x^Tz$, so this is not yet a feature map in $z$ alone. To finish the argument, expand each power $(x^Tz)^n=\left(\sum_{i=1}^d x_iz_i\right)^n$ by the multinomial theorem into a sum of products of degree-$n$ monomials in $x$ with the matching monomials in $z$; distributing the coefficients $1/n!$ and the scalars $\exp(-x^Tx/2\sigma^2)$, $\exp(-z^Tz/2\sigma^2)$ over these monomial features then yields $k(x,z)=\phi(x)^T\phi(z)$ for a feature map $\phi$ into an infinite-dimensional space.
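In the scalar case $d=1$ the feature map is fully explicit: $\phi(x)_n = e^{-x^2/2\sigma^2}\,x^n/(\sigma^n\sqrt{n!})$, so that $\phi(x)^T\phi(z)=k(x,z)$. A short sketch (the toy values and the truncation level $N$ are assumptions of mine) confirming this numerically:

```python
# Verify the d = 1 Gaussian-kernel feature map by truncating the infinite
# feature vector at N terms. Toy values of x, z, sigma are illustrative.
import numpy as np
from math import factorial

def phi(x, sigma, N):
    # Truncated infinite feature vector for d = 1:
    # phi(x)_n = exp(-x^2 / (2 sigma^2)) * x^n / (sigma^n * sqrt(n!))
    n = np.arange(N)
    coeff = np.array([1.0 / np.sqrt(factorial(k)) for k in range(N)])
    return np.exp(-x**2 / (2 * sigma**2)) * (x / sigma) ** n * coeff

x, z, sigma, N = 0.7, -0.3, 1.5, 30
approx = phi(x, sigma, N) @ phi(z, sigma, N)
exact = np.exp(-(x - z) ** 2 / (2 * sigma**2))
print(approx, exact)  # agree to machine precision: the series converges factorially fast
```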