SVM 支持向量机的一些理解

191 阅读4分钟

1.拉格朗日乘子法

1. 拉格朗日函数

凸优化:凸优化问题是指优化问题的目标函数是凸函数,而约束集合是凸集。在这样的问题中,局部最优解同时也是全局最优解

对于优化问题

minx f(x)s.t.gi(x)0,i=1,2,...,mhj(x)=0,j=1,2,...n\underset{x}{min}\space f(x) \\ s.t. g_{i}(x) \le 0, i = 1,2,...,m \\ h_{j}(x) = 0, j=1,2,...n

拉格朗日函数定义为

L(x,λ,μ)=f(x)+i=1mλigi(x)+j=1nμjhj(x)L(x,\lambda,\mu) = f(x) + \sum\limits_{i=1}^{m} \lambda_i g_i(x) + \sum\limits_{j=1}^{n} \mu_j h_j(x)

2. minmax 不等式

maxy minx f(x,y)minx maxy f(x,y)\underset{y}{max} \space \underset{x}{min} \space f(x,y) \le \underset{x}{min} \space \underset{y}{max} \space f(x,y)

实际上对于任意 x0,y0x_0, y_0 ,下列式子恒成立

 minx f(x,y0)f(x0,y0) maxy f(x0,y) \space \underset{x}{min} \space f(x,y_0) \le f(x_0,y_0) \le\space \underset{y}{max} \space f(x_0,y)

去掉中间部分再分别取最大值和最小值可以证明上述不等式

3. 拉格朗日函数的原问题和对偶问题
1.弱对偶

对于L(x,λ,μ)=f(x)+i=1mλigi(x)+j=1nμjhj(x)L(x,\lambda,\mu) = f(x) + \sum\limits_{i=1}^{m} \lambda_i g_i(x) + \sum\limits_{j=1}^{n} \mu_j h_j(x)

maxλ0,μ L(x,λ,μ)={ f(x),满足约束时 +不满足约束时\underset{\lambda\ge 0, \mu}{max} \space L(x,\lambda, \mu) =\left\{\begin{matrix}\space f(x),& 满足约束时\\ \space + \infty, & 不满足约束时\end{matrix}\right.

如果x满足约束,那么L(x,λ,μ)L(x,\lambda, \mu)右面两组约束都为0。 对于μigi(x)\mu_ig_i(x) 满足约束的时候显然为0

对于λihi(x)\lambda_i h_i(x) 由于hi(x)0h_i(x) \le 0 所以取最大值的状态只能为0

如果不满足约束 λihi(x)>0\lambda_i h_i(x) > 0 ,适当选取μi\mu_i的时候 μig(x)>0\mu_ig(x) >0 取最大都会趋向于正无穷

满足约束时,两边对x取最小值之后则有

minx maxλ0,μ L(x,λ,μ)=minx f(x)\underset{x}{min}\space\underset{\lambda \ge 0,\mu}{max} \space L(x,\lambda,\mu) = \underset{x}{min} \space f(x)

成立,都称为原始问题, 原始问题与对偶问题的关系如下

minx maxλ0,μ L(x,λ,μ)maxλ0,μ minxL(x,λ,μ)\underset{x}{min}\space\underset{\lambda \ge 0,\mu}{max} \space L(x,\lambda,\mu) \ge \underset{\lambda\ge0,\mu}{max}\space \underset{x}{min} L(x,\lambda,\mu)

2.强对偶

当然如果这俩能取等是最好的,可以取等号的时候是强对偶,这样可以解决很多问题

SlaterSlater 条件给出了强对偶的充分条件

fi(x)\forall f_i(x) 都为凸函数

 xrelint D\exist \space x \in relint \space D 满足 fi(x)<0f_i(x) < 0

KKTKKT 条件给出了强对偶的必要条件

gi(x)0g_i(x) \le 0

hi(x)=0h_i(x) = 0

xL(x,λ,μ)=0\nabla_x L(x,\lambda,\mu) = 0

λi0\lambda_i \ge 0

λigi(x)=0\lambda_i g_i(x) = 0

2. 超平面的表示与计算分隔平面

平面可用 wTx+b=0w^T x + b = 0 表述

距离 d=wTx+bwd = \frac{w^Tx + b}{\parallel w \parallel} , 对于分类边界也可转化为 wTx+bwd=1 \frac{w^Tx + b}{\parallel w\parallel d} = 1

实际上可以得到新的w,bw, b 满足 wTx+b=1w^T x + b = 1

分类时{ wTx+b1,yi=1 wTx+b1yi=1\left\{\begin{matrix}\space w^Tx+b \ge1,& y_i =1\\ \space w^Tx + b \le -1, & y_i=-1\end{matrix}\right.

两平面之间的距离

d=2wd = \frac{2}{\parallel w \parallel}

为了容错性,需要找到最大间距

maxw,b2ws.t.yi(wTx+b)1\underset{w,b}{max} \frac{2}{\parallel w\parallel} \\ s.t. y_i(w^Tx + b)\ge 1

等价于

minw,b12w2s.t.yi(wTx+b)1\underset{w,b}{min} \frac{1}{2}{\parallel w\parallel} ^2 \\ s.t. y_i(w^Tx + b)\ge 1

构造拉格朗日函数

L(w,b,α)=12w2=+i=1mαi(1yi(wTx+b))L(w,b,\alpha) = \frac{1}{2} {\parallel w \parallel }^2 =+\sum\limits_{i=1}^{m} \alpha_i(1 - y_i(w^Tx + b))

3. SMO (Squential Minimal Optimization)

1. 对偶问题

L(w,b,α)L(w,b,\alpha)w,b,αiw, b, \alpha_i偏导

Lw=wi=1mαiyixi=0\frac{\partial L}{\partial w} = w - \sum\limits_{i=1}^{m} \alpha_iy_ix_i = 0

Lb=i=1mαiyi=0\frac{\partial L}{\partial b} = \sum\limits_{i=1}^{m} \alpha_iy_i = 0

考虑到原问题

minw,b12w2s.t.yi(wTx+b)1\underset{w,b}{min} \frac{1}{2}{\parallel w \parallel}^2 \\ s.t. y_i(w^Tx + b)\ge 1

得到到对偶问题

maxα i=1mαi\underset{\alpha}{max} \space \sum\limits_{i=1}^{m}\alpha_i 12i=1mj=1mαiαjyiyjxiTxj-\frac{1}{2} \sum\limits_{i=1}^{m} \sum\limits_{j=1}^{m} \alpha_i \alpha_j y_iy_j x_i^Tx_j s.ti=1mαiyi=0,α0,i=1,2,..,m\\ s.t \sum\limits_{i=1}^{m}\alpha_iy_i = 0 ,\alpha\ge0, i = 1,2,..,m

由偏导求得的结果得

f(x)=wTx+b=i=1mαiyixiTx+bf(x) = w^Tx + b = \sum\limits_{i=1}^{m} \alpha_i y_ix_i^Tx + b

KKTKKT 条件得

{ αi0 yif(xi)10 αi(yif(xi)1)=0\left\{\begin{matrix}\space \alpha_i \ge 0 \\ \space y_if(x_i) -1 \ge 0 \\ \space \alpha_i(y_if(x_i) - 1) = 0\end{matrix}\right.

实际上 如果 αi=0\alpha_i = 0 ,该样本不会有任何影响

如果 αi>0\alpha_i > 0 , 由上述第三个条件得到 yif(xi)=1y_i f(x_i) = 1

2. 算法流程

(实际上数学功底不好,写错也是常见的)

SMOSMO 算法的思路是固定αi\alpha_i之外的所有参数,求αi\alpha_i上的极值

选取一对需要更新的变量αi,αj\alpha_i, \alpha_j

固定αi,αj\alpha_i, \alpha_j之外的所有参数, 求解maxα i=1mαi12i=1mj=1mαiαjyiyjxiTxj\underset{\alpha}{max} \space \sum\limits_{i=1}^{m}\alpha_i -\frac{1}{2} \sum\limits_{i=1}^{m} \sum\limits_{j=1}^{m} \alpha_i \alpha_j y_iy_j x_i^Tx_j

s.ti=1mαiyi=0, α0,i=1,2,..,ms.t \sum\limits_{i=1}^{m}\alpha_iy_i = 0 ,\space \alpha \ge 0, i = 1,2,..,m

条件下更新的 αi,αj\alpha_i,\alpha_j

满足约束αiyi+αjyj=c\alpha_i y_i + \alpha_j y_j = c

不失一般性,我们选择 α1,α2\alpha_1 , \alpha_2 作为变量

L(α1,α2)=α1+α2+i=3mαiL(\alpha_1, \alpha_2) = \alpha_1 + \alpha_2 + \sum\limits_{i=3}^{m} \alpha_i

12[α1y1α1y1x1Tx1+2α1y1α2y2x1Tx2+2j=3mα1y1αjyj - \frac{1}{2}[\alpha_1y_1\alpha_1y_1x_1^Tx_1 + 2\alpha_1y_1\alpha_2y_2x_1^Tx_2 + 2\sum\limits_{j=3}^{m}\alpha_1y_1 \alpha_jy_j

+α2y2α2y2x2Tx2+2j=3mα2y2αjyjx2Txj+i=3mj=3mαiαjyiyjxiTxj] + \alpha_2y_2\alpha_2y_2x_2^Tx_2 + 2\sum\limits_{j=3}^{m}\alpha_2y_2\alpha_jy_jx_2^Tx_j + \sum\limits_{i=3}^{m} \sum\limits_{j=3}^{m} \alpha_i \alpha_j y_iy_j x_i^Tx_j]

满足约束α1y1+α2y2=c\alpha_1 y_1 + \alpha_2 y_2 = c

Kij=xiTxjK_{ij} = x_i^Tx_j

α1=y1(cα2y2)\alpha_1 = y_1(c - \alpha_2y_2) , (yi=1)(|y_i| = 1)

带入得

L(α2)=y1(cα2y2)+α2L(\alpha_2) = y_1(c - \alpha_2y_2) + \alpha_2

12[(cα2y2)2K11+2(cα2y2)α2y2K12+α22K22- \frac{1}{2}[(c - \alpha_2y_2)^2K_{11} + 2(c - \alpha_2y_2)\alpha_2y_2K_{12} + \alpha_2^2K_{22}

+2i=3m(cα2y2)αiyiK1i+2i=3mα2y2αiyiK2i]+ 2\sum\limits_{i=3}^{m}(c - \alpha_2y_2)\alpha_iy_iK_{1i} +2 \sum\limits_{i=3}^{m}\alpha_2y_2\alpha_iy_iK_{2i}]

求偏导得

Lα2=y1y2+1\frac{\partial L}{\partial \alpha_2} = -y_1y_2 + 1

12[2(α2y2c)y2K11+2cy2K124α2K12+2α2K22- \frac{1}{2}[2(\alpha_2y_2 - c)y_2K_{11} + 2cy_2K_{12}- 4\alpha_2K_{12} + 2\alpha_2K_{22}

2i=1my2αiyiK1i+2i=3my2αiyiK2i]- 2\sum\limits_{i=1}^{m}y_2\alpha_iy_iK_{1i} + 2\sum\limits_{i=3}^{m}y_2\alpha_iy_iK_{2i}]

Lα2=0\frac{\partial L}{\partial \alpha_2}= 0(实际上也就是L(α2)L(\alpha_2) 的值不变)

α2(K11+K222K12)\alpha_2(K_{11} + K_{22} - 2K_{12})

=1y1y2+cy2K11cy2K12+i=3my2αiyiK1ii=3my2αiyiK2i= 1- y_1y_2 + cy_2K_{11} - cy_2K_{12} + \sum\limits_{i=3}^{m}y_2\alpha_iy_iK_{1i} - \sum\limits_{i=3}^{m}y_2\alpha_iy_iK_{2i}

=y2(y2y1+cK11cK12+i=3mαiyiK1ii=13αiyiK2i)= y_2(y_2 - y_1 + cK_{11} -c K_{12} + \sum\limits_{i=3}^{m}\alpha_iy_iK_{1i} - \sum\limits_{i=1}^{3}\alpha_iy_iK_{2i})

又由于f(x1)=wTx1+b=i=1mαiyixiTx1+b=α1y1x1Tx1+α2y2x2Tx1+i=3mαiyixiTx1+bf(x_1) = w^Tx_1 + b = \sum\limits_{i=1}^{m}\alpha_iy_ix_i^Tx_1 + b = \alpha_1y_1x_1^Tx_1 + \alpha_2y_2x_2^Tx_1 + \sum\limits_{i=3}^{m}\alpha_iy_ix_i^Tx_1 + b

f(x1)=α1y1K11+α2y2K12+i=1mαiyiK1i+bf(x_1) = \alpha_1y_1K_{11} + \alpha_2y_2K_{12} + \sum\limits_{i=1}^{m}\alpha_iy_iK_{1i} + b

得到

i=3mαiyiK1i=f(x1)α1y1K11α2y2K12b\sum\limits_{i=3}^{m}\alpha_iy_iK_{1i} = f(x_1) - \alpha_1y_1K_{11}-\alpha_2y_2K_{12} - b

同理可推理出

i=3mαiyiK2i=f(x2)α1y1K12α2y2K22b\sum\limits_{i=3}^{m}\alpha_iy_iK_{2i} = f(x_2) - \alpha_1y_1K_{12 } - \alpha_2y_2K_{22} - b

带入 α2(K11+K222K12)\alpha_2(K_{11} + K_{22} - 2K_{12})

=y2(y2y1+cK11cK12+i=3mαiyiK1ii=3mαiyiK2i)= y_2(y_2 - y_1 + cK_{11} -c K_{12} + \sum\limits_{i=3}^{m}\alpha_iy_iK_{1i} - \sum\limits_{i=3}^{m}\alpha_iy_iK_{2i})

得到

α2(K11+K222K12)\alpha_2(K_{11} + K_{22} - 2K_{12})

=y2(f(x1)y1((f(x2)y2)+α2y2(K11+K222K12))= y_2(f(x_1) - y_1 - ((f(x_2) - y_2) + \alpha_2y_2(K_{11} + K_{22}- 2K_{12}))

ξ=K11+K222K12E1=f1(x)y1E2=f2(x)y2\xi = K_{11} + K_{22} - 2K_{12} \\ E_1 = f_1(x) - y_1 \\ E_2 = f_2(x) - y_2

α2ξ=E1E2+α2y2ξ\alpha_2\xi = E_1 - E_2 + \alpha_2y_2\xi

α2new=α2y2+E1E2ξ\alpha_2^{new} = \alpha_2y_2 + \frac{E_1 - E_2}{\xi}

α1new=α1old+y1y2(α2oldα2new)\alpha_1^{new} = \alpha_1^{old} + y_1y_2(\alpha_2^{old} - \alpha_2^{new})

得到这些公式之后,还要考虑 αi\alpha_i 的取值范围满足0αiC 0 \le\alpha_i\le CCC 为正则约束)

满足方形约束

y1y2y_1 \neq y_2 时 ,令α1α2=k\alpha_1 - \alpha_2 = k

max(0,k)α2min(C,Ck)max(0,-k) \le\alpha_2\le min(C,C - k)

y1=y2y_1 = y_2 的时候,令α1+α2=k\alpha_1 + \alpha_2 = k

max(0,kC)α2min(C,k)max(0,k - C)\le \alpha_2 \le min(C,k)

对于bb 的更新

0<α1new<c0 < \alpha_1^{new} < c 满足时y1(wTx1+b)=1y_1(w^Tx_1 + b) = 1

得到wTx1+b=y1w^Tx_1 + b = y_1

b1new=y1α1newy1K11α2newy2K12i=3mαiyiK1ib_1^{new} = y_1 - \alpha_1^{new}y_1K_{11}-\alpha_2^{new}y_2K_{12} - \sum\limits_{i=3}^{m}\alpha_iy_iK_{1i}

其中

y1i=3mαiyiK1i=E1+α1oldy1K11+α2oldy2K12+boldy_1 - \sum\limits_{i=3}^{m}\alpha_iy_iK_{1i} = -E_1 + \alpha_1^{old}y_1K_{11} + \alpha_2^{old}y_2K_{12} + b^{old}

b1new=E1+bold+y1K11(α1oldα1new)+y2K12(α2oldα2new)b_{1}^{new} = -E_1 + b^{old} + y_1K_{11}(\alpha_1^{old} - \alpha_1^{new}) + y_2K_{12}(\alpha_2^{old} - \alpha_2^{new})

b2new=E2+bold+y1K12(α1oldα1new)+y1K22(α2oldα2new)b_2^{new} = -E_2 + b_{old} + y_1K_{12}(\alpha_1^{old} - \alpha_1^{new}) + y_1K_{22}(\alpha_2^{old} - \alpha_2^{new})

bnew=b1new+b2new2b^{new} = \frac{b_1^{new} + b_2^{new}}{2}

如何选择需要的α1,α2\alpha_1, \alpha_2

第一个选择违背kkt条件程度最大的,第二个选择E1E2|E_1 - E_2| 最大的

3.算法实现

import numpy as np


class SVM:
    def __init__(self, kernel='linear', c=1.0, max_iter=1e9, eps=1e-9):
        self.kernel = str(kernel)
        self.max_iter = int(max_iter)
        self.eps = float(eps)
        self.C = float(c)
        self.B = 0.0

    def __init_args(self, features, labels):
        self.X = features
        self.Y = labels
        self.m, self.n = self.X.shape
        self.Alpha = np.zeros(self.m)

    def f(self, i):
        c = self.B
        for j in range(self.m):
            s = sum(self.X[j] * self.X[i])
            c += self.Alpha[j] * self.Y[j] * s
        return c

    def k(self, i, j):
        return sum(self.X[i] * self.X[j])

    def e(self, i):
        return self.f(i) - self.Y[i]

    def kkt(self, i):
        st1 = self.Alpha[i] >= 0
        st2 = self.Y[i] * self.f(i) >= 1
        st3 = self.Alpha[i]*(self.Y[i] * self.f(i) - 1) == 0
        st = st1 and st2 and st3
        return st

    def select_ij(self):
        working_set = [i for i in range(self.m) if not self.kkt(i)]
        e_cache = [self.e(i) for i in working_set]
        e_full = [self.e(i) for i in range(self.m)]

        if len(working_set) == 0:
            return -1, -1

        max_pair_product = -np.inf
        i1 = working_set[e_cache.index(max(e_cache))]
        i2 = -1

        for j in working_set:
            if j == i1:
                continue
            pair_product = abs(e_full[i1] - e_full[j])
            if pair_product > max_pair_product:
                max_pair_product = pair_product
                i2 = j
        if max_pair_product <= self.eps:
            return -1, -1
        return i1, i2

    @staticmethod
    def limits(alpha, low, high):
        if alpha < low:
            return low
        elif alpha > high:
            return high
        else:
            return alpha

    def fit(self, features, labels):
        self.__init_args(features, labels)
        for _ in range(self.max_iter):
            i, j = self.select_ij()
            if i == -1 and j == -1:
                break

            alpha_i_old = self.Alpha[i]
            alpha_j_old = self.Alpha[j]

            tau_ij = self.k(i, i) + self.k(j, j) - 2 * self.k(i, j)
            alpha_j_new = alpha_j_old * self.Y[j] + (self.e(i) - self.e(j)) / tau_ij

            if self.Y[i] != self.Y[j]:
                low = max(0.0, (alpha_j_old - alpha_i_old).item())
                high = min(self.C, self.C + (alpha_j_old - alpha_i_old).item())
            else:
                low = max(0.0, (alpha_i_old + alpha_j_old - self.C).item())
                high = min(self.C, (alpha_j_old + alpha_i_old).item())

            self.Alpha[j] = self.limits(alpha_j_new, low, high)
            self.Alpha[i] = alpha_i_old + self.Y[i] * self.Y[j] * (alpha_j_old - self.Alpha[j])

            bi = - self.e(i) + self.B + self.Y[i] * self.k(i, i) * (alpha_i_old - self.Alpha[i]) \
                + self.Y[j] * self.k(i, j) * (alpha_j_old - self.Alpha[j])
            bj = - self.e(j) + self.B + self.Y[i] * self.k(i, j) * (alpha_i_old - self.Alpha[i]) \
                + self.Y[j] * self.k(j, j) * (alpha_j_old - self.Alpha[j])

            if 0 < self.Alpha[i] < self.C:
                self.B = bi
            elif 0 < self.Alpha[j] < self.C:
                self.B = bi
            else:
                self.B = (bi + bj) / 2

    def get_coefficient(self):
        weights = np.zeros(self.n, dtype=float)
        for i in range(self.m):
            weights += self.Y[i] * self.Alpha[i] * self.X[i]
        bias = self.B
        return weights, bias


x = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1],
             [3, 0, 0], [0, 3, 0], [0, 0, 3]])
y = np.array([-1, -1, -1, 1, 1, 1])

svm = SVM(kernel='linear', c=1000, max_iter=1e4)
svm.fit(x, y)
w, b = svm.get_coefficient()

print(w)
print(b)

z = sum((w * x).T) + b
print(z)

抽空找个数据集跑跑