Support Vector Machines: Code Walkthrough and the SMO Algorithm


1. Data Preparation

We use the iris dataset from the sklearn package.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    '''
    Prepare the data.
    :return: feature values and label values
    '''
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    print(df.shape)
    df['label'] = iris.target
    df.columns = [
        'sepal length', 'sepal width', 'petal length', 'petal width', 'label'
    ]
    # Take the first 100 rows, which cover exactly 2 classes.
    data = np.array(df.iloc[:100, [0, 1, -1]])
    # Relabel class 0 as -1 so the labels are in {-1, +1}.
    for i in range(len(data)):
        if data[i, -1] == 0:
            data[i, -1] = -1
    return data[:, :2], data[:, -1]
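train_test_split is imported above but not yet used; a minimal sketch of the intended split (test_size=0.25 is an arbitrary choice, not from the original post) looks like this:

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
print(X_train.shape, X_test.shape)  # (75, 2) (25, 2)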


2. Creating the SVM Class

2.1 The SVM class contains the following methods:

class SVM(object):
    def __init__(self, iterations, margin=0, penalty=1, kernal='linear'):
        '''
        Initialize the class.
        :param iterations: number of iterations
        :param margin: tolerance used when checking the KKT conditions
        :param penalty: penalty parameter C, which bounds the range of alpha
        :param kernal: kernel type; 'linear' (default) or 'poly'
        '''
        self.kernal = kernal
        self.iterations = iterations
        self.margin = margin
        self.penalty = penalty

    def __initArgs__(self, _X, _Y):
        '''
        Initialize alpha and the other key parameters from the data set.
        :param _X: feature values of the training set
        :param _Y: label values of the training set
        :return:
        '''
        self.trainX = _X
        self.trainY = _Y
        print('train_X shape:', self.trainX.shape)
        print('train_Y shape:', self.trainY.shape)
        # Number of samples
        self.data_size = _X.shape[0]
        # Number of features
        self.feature_nums = _X.shape[1]
        # Note: initializing alpha to ones only satisfies sum(alpha_i * y_i) = 0
        # when the classes are balanced (as in this iris subset); zeros is the
        # more common choice.
        self.alpha = np.ones(_Y.shape)
        self.b = 0.0
        # Difference between the predicted value and the true output
        self._E = [self._getE(_) for _ in range(self.data_size)]

    def fit(self, x, y):
        '''
        Fit the model on x, y.
        :param x: feature values of the training set
        :param y: label values of the training set
        :return:
        '''
        self.__initArgs__(x, y)
        self.SMO()
        print('train done.')

    def kernalFunction(self, x1, x2):
        '''
        Kernel computation.
        linear: inner product of x1 and x2
        :param x1: feature value 1
        :param x2: feature value 2
        :return: mapped result
        '''
        pass

    def _g(self, xi):
        '''
        Compute wx+b.
        :param xi: 1*d feature vector
        '''
        pass

    def checkKKT(self, index_no):
        '''
        Check whether sample index_no (i.e. alpha[index_no]) satisfies the KKT conditions.
        :param index_no: index
        :return:
        '''
        pass

    def _getE(self, index_no):
        '''
        Compute the error E.
        :param index_no: index of the target sample
        :return:
        '''
        pass

    def accurate(self, X, Y):
        '''
        Compute accuracy.
        :param X: (M*d) feature values
        :param Y: M label values
        :return:
        '''
        right_num = 0
        for i in range(X.shape[0]):
            if Y[i] == self.predict(X[i]):
                right_num += 1
        return right_num / X.shape[0]

    def predict(self, x):
        '''
        Predict.
        :param x: 1*d feature vector
        :return: prediction
        '''
        return 1 if self._g(x) > 0 else -1

    def init_alpha(self):
        '''
        Choose the variables alpha1 and alpha2, where alpha1 violates the KKT
        conditions and alpha2 is the index with the largest deviation from alpha1.
        :return: the indices of alpha1 and alpha2
        '''
        pass

    def SMO(self):
        '''
        The SMO algorithm.
        :return:
        '''
        pass

2.2 A few important methods

2.2.1 The kernel function

Two kernel functions are implemented; linear is the default.

    def kernalFunction(self, x1, x2):
        '''
        Kernel computation.
        linear: inner product of x1 and x2
        poly: homogeneous degree-2 polynomial kernel, (x1 . x2)^2
        :param x1: feature value 1
        :param x2: feature value 2
        :return: mapped result
        '''
        if self.kernal == 'linear':
            return np.dot(x1, x2.T)
        elif self.kernal == 'poly':
            return np.dot(x1, x2.T)**2
        return 0
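As a quick sanity check (toy vectors, not from the post), the poly option simply squares the linear inner product:

import numpy as np

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, 4.0])
print(np.dot(x1, x2.T))     # linear kernel: 1*3 + 2*4 = 11.0
print(np.dot(x1, x2.T)**2)  # homogeneous degree-2 poly kernel: 11**2 = 121.0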
2.2.2 g(\boldsymbol{x})=\sum_{i=1}^{N}\alpha_iy_iK(\boldsymbol{x_i}, \boldsymbol{x})+b

When the kernel is linear, this can simply be read as w\boldsymbol{x}+b.

    def _g(self, xi):
        '''
        Compute wx+b.
        :param xi: 1*d feature vector
        '''
        # w has shape (N,): elementwise alpha_i * y_i
        w = np.multiply(self.alpha, self.trainY)
        # kernal_result has shape (N,): K(xi, x_j) for every training sample j
        kernal_result = self.kernalFunction(xi, self.trainX)
        res = np.dot(kernal_result, w) + self.b
        return res
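For the linear kernel, _g should agree with the explicit w\boldsymbol{x}+b, where w=\sum_i \alpha_i y_i \boldsymbol{x_i}. A small self-contained check (all values made up) illustrates the equivalence:

import numpy as np

trainX = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
trainY = np.array([1.0, -1.0, 1.0])
alpha = np.array([0.5, 0.5, 0.0])
b = 0.1
x = np.array([2.0, 3.0])

# g(x) via the kernel form, as in _g above
g = np.dot(np.dot(x, trainX.T), alpha * trainY) + b
# the same value via the explicit weight vector
w = np.dot(alpha * trainY, trainX)
assert np.isclose(g, np.dot(w, x) + b)
print(g)  # -0.4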
2.2.3 Computing the error between the output and the true value
    def _getE(self, index_no):
        '''
        Compute the error E.
        :param index_no: index of the target sample
        :return:
        '''
        xi = self.trainX[index_no, :]
        return self._g(xi) - self.trainY[index_no]

3. The SMO Algorithm

3.1 SMO in the code:

The SMO used in practice is not as simple as the one in the companion post 《支持向量机-理论推导》 (SVM: Theoretical Derivation); at least the following questions must be considered:

  • Question 1: When should the algorithm stop?
  • Question 2: How are the variables \alpha_{u_1} and \alpha_{u_2} chosen?
  • Question 3: How is equation (3.5) solved in code?
\begin{align}
\min_{\alpha} L(w,b,\alpha) &=\frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j \boldsymbol{x_i^T}\boldsymbol{x_j} - \sum_{i=1}^{N}\alpha_i
\tag{3.5} \\
s.t. &\sum_{i=1}^{N} \alpha_i y_i=0, \\
& C-\alpha_i-\mu_i=0, \\ 
&\alpha_i \geq 0, \\
 &\mu_i \geq 0, i=1,2,...,N
\end{align}
3.1.1 When to stop?

When every parameter satisfies the KKT conditions, the current parameters are the solution of the optimization problem.

Recall the overall solution process: the KKT conditions are the foundation of this optimization problem, i.e. they are necessary and sufficient for optimality. The SMO algorithm decomposes the full optimization into a sequence of two-variable subproblems; each two-variable solve moves the overall solution closer to the optimum.

In 《支持向量机-理论推导》, the KKT conditions are:

\begin{align}
\alpha_i = 0 &\Leftrightarrow y_i g(\boldsymbol{x_i}) \geq 1 \\
0 < \alpha_i < C &\Leftrightarrow y_i g(\boldsymbol{x_i}) = 1 \\
\alpha_i = C &\Leftrightarrow y_i g(\boldsymbol{x_i}) \leq 1
\end{align}

    def checkKKT(self, index_no):
        '''
        Check whether sample index_no (i.e. alpha[index_no]) satisfies the KKT
        conditions, with self.margin as a numerical tolerance.
        :param index_no: index
        :return:
        '''

        xi = self.trainX[index_no, :]
        gx = self._g(xi)
        if self.alpha[index_no] == 0:
            # alpha == 0  =>  y * g(x) >= 1
            return self.trainY[index_no] * gx >= 1 - self.margin
        elif 0 < self.alpha[index_no] < self.penalty:
            # 0 < alpha < C  =>  y * g(x) == 1
            return abs(self.trainY[index_no] * gx - 1) <= self.margin
        else:
            # alpha == C  =>  y * g(x) <= 1
            return self.trainY[index_no] * gx <= 1 + self.margin
3.1.2 How to solve for the variables \alpha_{u_1} and \alpha_{u_2}?
3.1.2.1 Determining the range of \alpha_{u_1} and \alpha_{u_2}

Consider the constraint \sum_{i=1}^{N}\alpha_iy_i=0. Since all remaining parameters are held fixed, their contribution can be treated as a constant k; so although two variables are nominally involved, only one of them (call it \alpha_{u_2}) really needs to be considered, because the other (\alpha_{u_1}) can be expressed in terms of it.

\begin{align}
\alpha_{u_1}y_{u_1}+\alpha_{u_2}y_{u_2} =- \sum_{i=1;i \neq u_1, u_2}^{N} \alpha_i y_i = k \tag{3.1} \\
\end{align}

When y_{u_1} and y_{u_2} belong to the same class, \alpha_{u_1}+\alpha_{u_2}=k. Considering the constraint 0 \leq \alpha_i \leq C and \alpha_{u_1}=k-\alpha_{u_2}, the updated range of \alpha_{u_2} is:

\left\{\begin{matrix}
0\leq k-\alpha_{u_2}\leq C\\
0\leq \alpha_{u_2}\leq C\\
\end{matrix}\right.
\Leftrightarrow
\left\{\begin{matrix}
k\geq \alpha_{u_2}\geq k-C\\
C\geq \alpha_{u_2}\geq 0\\
\end{matrix}\right.\\
\Leftrightarrow
H=\min(k, C)\geq \alpha_{u_2}\geq \max(k-C,0)=L\tag{3.2}
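As a quick numeric illustration (made-up values): take C=1, \alpha_{u_1}^{(old)}=0.6, \alpha_{u_2}^{(old)}=0.8, so k=\alpha_{u_1}+\alpha_{u_2}=1.4; then

L=\max(k-C,0)=\max(0.4,0)=0.4,\qquad H=\min(k,C)=\min(1.4,1)=1

so the updated \alpha_{u_2} must lie in [0.4, 1].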

When y_{u_1} and y_{u_2} belong to different classes, \alpha_{u_1}-\alpha_{u_2}=k. Considering the constraint 0 \leq \alpha_i \leq C and \alpha_{u_1}=\alpha_{u_2}+k, the updated range of \alpha_{u_2} is:

\left\{\begin{matrix}
0\leq \alpha_{u_2}+k\leq C\\
0\leq \alpha_{u_2}\leq C\\
\end{matrix}\right.
\Leftrightarrow
\left\{\begin{matrix}
C-k\geq \alpha_{u_2}\geq -k\\
C\geq \alpha_{u_2}\geq 0\\
\end{matrix}\right.\\
\Leftrightarrow
H=\min(C-k,C)\geq \alpha_{u_2}\geq \max(-k,0)=L\tag{3.3}
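With the same made-up values but opposite labels: k=\alpha_{u_1}-\alpha_{u_2}=-0.2; then

L=\max(-k,0)=\max(0.2,0)=0.2,\qquad H=\min(C-k,C)=\min(1.2,1)=1

so the updated \alpha_{u_2} must lie in [0.2, 1].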
3.1.2.2 Computing the values of \alpha_{u_1} and \alpha_{u_2}

With every variable other than \alpha_{u_1} and \alpha_{u_2} held fixed, the objective function becomes

\begin{align}
w(\alpha_{u_1},\alpha_{u_2})
&=\frac{1}{2}\sum_{j=1}^{N}\alpha_{u_1}\alpha_j y_{u_1} y_j K(\boldsymbol{x_{u_1}}, \boldsymbol{x_j})
+\frac{1}{2}\sum_{j=1}^{N}\alpha_{u_2}\alpha_j y_{u_2} y_j K(\boldsymbol{x_{u_2}}, \boldsymbol{x_j})
+\frac{1}{2}\sum_{i=1;i \neq u_1, u_2}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j K(\boldsymbol{x_i}, \boldsymbol{x_j}) - \sum_{i=1}^{N}\alpha_i \\
&=\frac{1}{2}\alpha_{u_1}^2 K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})
+\frac{1}{2}\alpha_{u_2}^2 K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}}) +\alpha_{u_1}\alpha_{u_2}y_{u_1}y_{u_2}K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})\\
&+\alpha_{u_1}y_{u_1}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_1}}, \boldsymbol{x_i})
+\alpha_{u_2}y_{u_2}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_2}}, \boldsymbol{x_i})\\
&+\frac{1}{2}\sum_{i=1;i \neq u_1, u_2}^{N}\sum_{j=1;j \neq u_1, u_2}^{N}\alpha_i \alpha_j y_i y_j K(\boldsymbol{x_i}, \boldsymbol{x_j})
-(\alpha_{u_1}+\alpha_{u_2})
-\sum_{i=1;i \neq u_1, u_2}^{N}\alpha_i
\tag{3.4} \\
\end{align}

Since

\frac{1}{2}\sum_{i=1;i \neq u_1, u_2}^{N}\sum_{j=1;j \neq u_1, u_2}^{N}\alpha_i \alpha_j y_i y_j K(\boldsymbol{x_i}, \boldsymbol{x_j})
-\sum_{i=1;i \neq u_1, u_2}^{N}\alpha_i

is a constant term and has no effect on finding the extremum, we define

\begin{align}
w(\alpha_{u_1},\alpha_{u_2}) &=\frac{1}{2}\alpha_{u_1}^2 K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})
+\frac{1}{2}\alpha_{u_2}^2 K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}}) -(\alpha_{u_1}+\alpha_{u_2})  \\
&+\alpha_{u_1} \alpha_{u_2}y_{u_1}y_{u_2}K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})\\
&+\alpha_{u_1}y_{u_1}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_1}}, \boldsymbol{x_i})
+\alpha_{u_2}y_{u_2}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_2}}, \boldsymbol{x_i})
\tag{3.5} \\
s.t.&-\sum_{i=1,i \neq u_1,u_2}^{N} \alpha_i y_i=\alpha_{u_1}y_{u_1}+\alpha_{u_2}y_{u_2},\\
& C \geq \alpha_u \geq 0, \quad  u=u_1,u_2
\end{align}

Moreover, since

\begin{align}
-\sum_{i=1,i\neq u_1,u_2}^{N} \alpha_i y_i&=\alpha_{u_1}y_{u_1}+\alpha_{u_2}y_{u_2}\\
\alpha_{u_1}&=y_{u_1}(k -\alpha_{u_2}y_{u_2}) 
\tag{3.6}
\end{align}

substituting equation (3.6) into equation (3.5) gives

\begin{align}
w(\alpha_{u_2}) &=\frac{1}{2}(y_{u_1}(k-\alpha_{u_2}y_{u_2}))^2 K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})
+\frac{1}{2}\alpha_{u_2}^2 K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}}) -(y_{u_1}(k -\alpha_{u_2}y_{u_2})+\alpha_{u_2})  \\
&+y_{u_1}(k -\alpha_{u_2}y_{u_2})\alpha_{u_2}y_{u_1}y_{u_2} K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})\\
&+y_{u_1}(k -\alpha_{u_2}y_{u_2})y_{u_1}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_1}}, \boldsymbol{x_i})
+\alpha_{u_2}y_{u_2}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_2}}, \boldsymbol{x_i})
\tag{3.7} \\
\end{align}

Taking the derivative of w(\alpha_{u_2}) with respect to \alpha_{u_2} (note that y_i^2=1) gives

\begin{align}
\frac{\partial{w(\alpha_{u_2})}}{\partial{\alpha_{u_2}}} &=\frac{1}{2}(-2ky_{u_2}+2\alpha_{u_2})K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})
+\alpha_{u_2} K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}}) +(y_{u_1}y_{u_2}-1)  \\
&+y_{u_1}(k -2\alpha_{u_2}y_{u_2})y_{u_1}y_{u_2}K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}}) \\
&+y_{u_1}( -y_{u_2})y_{u_1}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_1}}, \boldsymbol{x_i})
+y_{u_2}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_2}}, \boldsymbol{x_i})\\
&= (-ky_{u_2}+\alpha_{u_2})K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})
+\alpha_{u_2} K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}}) +(y_{u_1}y_{u_2}-1)  \\
&+(ky_{u_2} -2\alpha_{u_2})K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})\\
&-y_{u_2}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_1}}, \boldsymbol{x_i})
+y_{u_2}\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_2}}, \boldsymbol{x_i})
\tag{3.8}
\end{align}

Let E_{u_p}, the deviation between the predicted value and the true value, be:

\begin{align}
E_{u_p}=\sum_{i=1}^{N}\alpha_i y_i K(\boldsymbol{x_i}, \boldsymbol{x_{u_p}})+b-y_{u_p}=g(\boldsymbol{x_{u_p}})-y_{u_p}, \quad p=1,2
\tag{3.9}
\end{align}

Then, in equation (3.8):

\begin{align}
\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_1}}, \boldsymbol{x_i})
&=E_{u_1}+y_{u_1}-b- \alpha_{u_1} y_{u_1} K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})- \alpha_{u_2} y_{u_2} K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}}) \tag{3.10}\\
\sum_{i=1;i \neq u_1,u_2}^{N} \alpha_i  y_i K(\boldsymbol{x_{u_2}}, \boldsymbol{x_i})
&=E_{u_2}+y_{u_2}-b- \alpha_{u_1} y_{u_1} K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})- \alpha_{u_2} y_{u_2} K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}}) \tag{3.11}\\
k &= \alpha_{u_1}y_{u_1}+\alpha_{u_2}y_{u_2}\tag{3.12} \\
\end{align}

So, set equation (3.8) to zero and substitute (3.10)-(3.12), in which \alpha and E take their current (pre-update) values.
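Collecting the \alpha_{u_2} terms (using y_{u_2}^2=1), the stationarity condition can be written compactly as:

\eta\,\alpha_{u_2}^{*}=\eta\,\alpha_{u_2}^{(old)}+y_{u_2}(E_{u_1}-E_{u_2}),
\qquad \eta=K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})+K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}})-2K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})

Dividing both sides by \eta (the quantity the code below calls eta, which must be positive) gives: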

\begin{align}
\alpha_{u_2}^{*}=\alpha_{u_2}^{(old)} + \frac{y_{u_2}(E_{u_1}-E_{u_2})}{K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})+K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}})-2K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})}
\tag{3.13}\\
\end{align}

Taking the feasible range into account (equations 3.2 and 3.3):

\alpha_{u_2}^{(new)}=\left\{\begin{matrix}
L,& \alpha_{u_2}^{*}<L\\
\alpha_{u_2}^{*},& L\leq \alpha_{u_2}^{*} \leq H\\
H,& \alpha_{u_2}^{*}>H
\end{matrix}\right.

From equation (3.6) we obtain

\begin{align}
\alpha_{u_1}^{(new)}=\alpha_{u_1}^{(old)} + y_{u_1}y_{u_2}(\alpha_{u_2}^{(old)}-\alpha_{u_2}^{(new)})\tag{3.14}\\
\end{align}
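A one-line check (using y_{u_1}^2=1) confirms that the update (3.14) preserves the equality constraint (3.1):

\alpha_{u_1}^{(new)}y_{u_1}+\alpha_{u_2}^{(new)}y_{u_2}
=\alpha_{u_1}^{(old)}y_{u_1}+y_{u_2}(\alpha_{u_2}^{(old)}-\alpha_{u_2}^{(new)})+\alpha_{u_2}^{(new)}y_{u_2}
=\alpha_{u_1}^{(old)}y_{u_1}+\alpha_{u_2}^{(old)}y_{u_2}=k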
3.1.3 How to choose the variables \alpha_{u_1} and \alpha_{u_2}?

The first is the variable that violates the KKT conditions most severely; the second is then determined by the equality constraint. In this way, SMO keeps decomposing the original problem into subproblems and solving them, and thereby solves the original problem.

  • First variable (outer loop): prefer the \alpha that could correspond to support vectors (0 < \alpha < C); if they all satisfy the KKT conditions, traverse the whole training set.
  • Second variable (inner loop): choose \alpha_{u_2} so that it changes as much as possible, i.e. so that |E_1-E_2| is maximal. Since \alpha_{u_1} is already fixed, E_1 is determined.
    • If E_1 is positive, choose the smallest E as E_2.
    • If E_1 is negative, choose the largest E as E_2.

To save computation, all E values are cached in a list.

    def init_alpha(self):
        '''
        Choose the variables alpha1 and alpha2, where alpha1 violates the KKT
        conditions and alpha2 is the index with the largest deviation from alpha1.
        :return: the indices of alpha1 and alpha2, or None if every sample
                 satisfies the KKT conditions
        '''
        # Check the alphas that may be support vectors first
        satisfy_index = []
        unsatisfy_index = []
        for i in range(self.data_size):
            if 0 < self.alpha[i] < self.penalty:
                satisfy_index.append(i)
            else:
                unsatisfy_index.append(i)
        index = satisfy_index + unsatisfy_index

        for index_no in index:
            if self.checkKKT(index_no):
                continue
            E1 = self._E[index_no]
            if E1 >= 0:
                j = min(range(self.data_size), key=lambda x: self._E[x])
            else:
                j = max(range(self.data_size), key=lambda x: self._E[x])
            return index_no, j
        # Every sample satisfies the KKT conditions: no pair left to optimize
        return None
3.1.4 Computing b and E

The update formulas for b follow from the KKT condition y_{u_p}g(\boldsymbol{x_{u_p}})=1 when 0<\alpha_{u_p}^{(new)}<C:

\begin{align}
b_1^{(new)}&=-E_{u_1}-y_{u_1}K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_1}})(\alpha_{u_1}^{(new)}-\alpha_{u_1}^{(old)})
-y_{u_2}K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_1}})(\alpha_{u_2}^{(new)}-\alpha_{u_2}^{(old)})+b^{(old)}\\
b_2^{(new)}&=-E_{u_2}-y_{u_1}K(\boldsymbol{x_{u_1}}, \boldsymbol{x_{u_2}})(\alpha_{u_1}^{(new)}-\alpha_{u_1}^{(old)})
-y_{u_2}K(\boldsymbol{x_{u_2}}, \boldsymbol{x_{u_2}})(\alpha_{u_2}^{(new)}-\alpha_{u_2}^{(old)})+b^{(old)}
\end{align}

If 0<\alpha_{u_1}^{(new)}<C, take b^{(new)}=b_1^{(new)}; otherwise, if 0<\alpha_{u_2}^{(new)}<C, take b_2^{(new)}; if neither holds, use the midpoint (b_1^{(new)}+b_2^{(new)})/2. E_{u_1} and E_{u_2} are then recomputed with the updated parameters.

    def SMO(self):
        '''
        The SMO algorithm.
        :return:
        '''
        for iteration in range(self.iterations):
            print('iteration %d' % iteration)
            pair = self.init_alpha()
            if pair is None:
                # Every sample satisfies the KKT conditions: converged
                break
            index_1, index_2 = pair
            if self.trainY[index_1] == self.trainY[index_2]:
                # Same class: k = alpha1 + alpha2 (equation 3.2)
                L = max(0, self.alpha[index_1] + self.alpha[index_2] - self.penalty)
                H = min(self.penalty, self.alpha[index_1] + self.alpha[index_2])
            else:
                # Different classes: k = alpha1 - alpha2 (equation 3.3)
                L = max(0, self.alpha[index_2] - self.alpha[index_1])
                H = min(self.penalty, self.penalty - self.alpha[index_1] + self.alpha[index_2])

            E1 = self._E[index_1]
            E2 = self._E[index_2]
            # eta = K11 + K22 - 2*K12
            eta = self.kernalFunction(
                self.trainX[index_1, :], self.trainX[index_1, :]
            ) + self.kernalFunction(
                self.trainX[index_2, :], self.trainX[index_2, :]
            ) - 2 * self.kernalFunction(
                self.trainX[index_1, :], self.trainX[index_2, :]
            )
            if eta <= 0:
                # Skip this pair if eta is not positive
                continue

            # Unclipped solution, equation (3.13)
            alpha2_new_unc = self.alpha[index_2] + self.trainY[index_2] * (E1 - E2) / eta

            # Clip to [L, H]
            if alpha2_new_unc > H:
                alpha2_new = H
            elif alpha2_new_unc < L:
                alpha2_new = L
            else:
                alpha2_new = alpha2_new_unc

            # Equation (3.14)
            alpha1_new = self.alpha[index_1] + self.trainY[index_1] * self.trainY[index_2] * (
                    self.alpha[index_2] - alpha2_new
            )

            b1_new = -E1 - self.trainY[index_1] * self.kernalFunction(
                self.trainX[index_1], self.trainX[index_1]
            ) * (
                    alpha1_new - self.alpha[index_1]
            ) - self.trainY[index_2] * self.kernalFunction(
                self.trainX[index_2], self.trainX[index_1]
            ) * (alpha2_new - self.alpha[index_2]) + self.b

            b2_new = -E2 - self.trainY[index_1] * self.kernalFunction(
                self.trainX[index_1], self.trainX[index_2]
            ) * (
                    alpha1_new - self.alpha[index_1]
            ) - self.trainY[index_2] * self.kernalFunction(
                self.trainX[index_2], self.trainX[index_2]
            ) * (alpha2_new - self.alpha[index_2]) + self.b

            if 0 < alpha1_new < self.penalty:
                b_new = b1_new
            elif 0 < alpha2_new < self.penalty:
                b_new = b2_new
            else:
                # Neither alpha is strictly inside (0, C): take the midpoint
                b_new = (b1_new + b2_new) / 2

            # Update the parameters
            self.alpha[index_1] = alpha1_new
            self.alpha[index_2] = alpha2_new
            self.b = b_new

            self._E[index_1] = self._getE(index_1)
            self._E[index_2] = self._getE(index_2)
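Putting everything together, a minimal end-to-end run might look like the following sketch (the hyperparameter values are illustrative, not tuned):

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# 200 iterations, tolerance 0.001 and C=1 are arbitrary example choices
model = SVM(iterations=200, margin=0.001, penalty=1, kernal='linear')
model.fit(X_train, y_train)
print('test accuracy:', model.accurate(X_test, y_test))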

4. Full Code

Full code


5. Summary

Compared with the official sklearn implementation, the following issues remain:

The training results are unstable: sklearn reaches an accuracy of at least 0.97 on the iris subset, while this implementation ranges between 0.52 and 0.92. Methods such as cross-validation could be used to choose better parameters.

6. References

  • 《西瓜书》 (Zhou Zhihua, Machine Learning, the "Watermelon Book")
  • 《统计学习方法》 (Li Hang, Statistical Learning Methods)