This is day 12 of my participation in the November writing challenge. For event details, see: 2021 Final Writing Challenge.

Overview of support vector machines

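A support vector machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning. Its decision boundary is the maximum-margin hyperplane solved for on the training samples. Compared with logistic regression and neural networks, an SVM can offer a cleaner and sometimes more powerful way to learn complex nonlinear functions.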

Soft margin, hard margin, and nonlinear SVM

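If the data are perfectly linearly separable, the learned model is called a hard-margin SVM. In other words, a hard margin requires every sample to be classified correctly, with no classification errors allowed. A soft margin allows a certain number of samples to be misclassified. When no hyperplane in the original feature space separates the data well, a nonlinear SVM uses a kernel function to map the data into a higher-dimensional space in which a linear separator can be found.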
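The difference can be sketched with scikit-learn's `SVC`, whose `C` parameter controls how heavily margin violations are penalized. The one-dimensional toy data below are hypothetical (not the article's dataset) and contain a positive sample close to the negative class. A large `C` is used here only as an approximation of a hard margin.

```python
import numpy as np
from sklearn import svm

# Hypothetical 1-D toy data: two negatives, two positives,
# where the positive sample at 1.5 sits close to the negative class.
X_toy = np.array([[0.0], [1.0], [1.5], [5.0]])
y_toy = np.array([-1, -1, 1, 1])

# Large C: violations are very costly, approximating a hard margin.
hard = svm.SVC(kernel='linear', C=100.0).fit(X_toy, y_toy)
# Small C: violations are cheap, so a wider (soft) margin is preferred.
soft = svm.SVC(kernel='linear', C=0.1).fit(X_toy, y_toy)

hard_acc = hard.score(X_toy, y_toy)  # fraction of training samples classified correctly
soft_acc = soft.score(X_toy, y_toy)
hard_w = hard.coef_[0, 0]            # slope of the decision function w*x + b
soft_w = soft.coef_[0, 0]

print("large C: w =", hard_w, "training accuracy =", hard_acc)
print("small C: w =", soft_w, "training accuracy =", soft_acc)
```

On this data the large-`C` model separates every sample with a narrow margin, while the small-`C` model chooses a smaller `w` (a wider margin) and accepts misclassifying the sample at 1.5.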

The idea behind the SVM algorithm

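Find a few data points on the edges of the classes (called support vectors), and use them to determine a plane (called the decision surface) such that the distance from the support vectors to that plane is as large as possible.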
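As a minimal sketch of this idea, the snippet below fits a linear `SVC` from scikit-learn on hypothetical toy data and inspects which samples end up as support vectors. The margin width is computed as 2/||w||, a standard result for a decision function scaled so that the support vectors satisfy |w·x + b| = 1. The large `C` is an assumption used to approximate a hard margin.

```python
import numpy as np
from sklearn import svm

# Hypothetical 2-D toy data: three samples per class, linearly separable.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

clf_toy = svm.SVC(kernel='linear', C=1000.0).fit(X_toy, y_toy)

w_toy = clf_toy.coef_[0]                     # normal vector of the decision surface
b_toy = clf_toy.intercept_[0]                # offset of the decision surface
support = clf_toy.support_vectors_           # the samples that determine the surface
margin_width = 2.0 / np.linalg.norm(w_toy)   # distance between the two margin lines

print("support vectors:\n", support)
print("w =", w_toy, "b =", b_toy)
print("margin width =", margin_width)
```

Only the samples nearest to the other class, here (1, 0), (0, 1) and (3, 3), are expected to appear among the support vectors. Moving any of the remaining samples farther away from the boundary would leave the decision surface unchanged.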

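Background on hyperplanes: any hyperplane can be described by the linear equation $w^T x + b = 0$. In two dimensions, the distance from a point $(x, y)$ to the line $Ax + By + C = 0$ is $|Ax + By + C| / \sqrt{A^2 + B^2}$. Extended to $n$ dimensions, the distance from a point $x = (x_1, x_2, \dots, x_n)$ to the hyperplane $w^T x + b = 0$ is $|w^T x + b| / \|w\|$, where $\|w\| = \sqrt{w_1^2 + \cdots + w_n^2}$.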
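The distance formula translates directly into a few lines of NumPy. The helper name `distance_to_hyperplane` and the sample points below are illustrative choices, not part of the original article.

```python
import numpy as np

def distance_to_hyperplane(x, w, b):
    """Distance from point x to the hyperplane w^T x + b = 0."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# 2-D case: the line 3x + 4y - 12 = 0 corresponds to w = (3, 4), b = -12.
d_2d = distance_to_hyperplane([1.0, 1.0], [3.0, 4.0], -12.0)

# 3-D case: the plane x + 2y + 2z - 9 = 0, measured from the origin.
d_3d = distance_to_hyperplane([0.0, 0.0, 0.0], [1.0, 2.0, 2.0], -9.0)

print(d_2d, d_3d)
```

For the 2-D case this is |3 + 4 - 12| / 5 = 1, which matches the line formula, and the same function works unchanged in any number of dimensions.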

Linearly separable SVM

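In terms of the hyperplane, the formula for the linearly separable SVM is as follows:

[Image: 图片.png]

where $\alpha$ denotes the Lagrange multipliers.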
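The content of the formula image is not recoverable from the text. As an assumption, a standard formulation consistent with the Lagrange multipliers $\alpha$ is the hard-margin problem and its Lagrangian, sketched here for $m$ training samples $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$:

```latex
\min_{w,\,b}\ \frac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w^T x_i + b) \ge 1, \qquad i = 1, \dots, m

L(w, b, \alpha) = \frac{1}{2}\|w\|^2
  - \sum_{i=1}^{m} \alpha_i \left[\, y_i\,(w^T x_i + b) - 1 \,\right],
\qquad \alpha_i \ge 0

\frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0,
\qquad
\frac{\partial L}{\partial b} = -\sum_{i=1}^{m} \alpha_i y_i = 0
```

Setting these two partial derivatives to zero gives the conditions on $w$ and $\alpha$ stated next.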

$w = \sum_{i=1}^{m} \alpha_i y_i x_i$

$\sum_{i=1}^{m} \alpha_i y_i = 0$
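These two relations can be checked numerically on a fitted model. The sketch below relies on scikit-learn's `SVC`, whose `dual_coef_` attribute stores the products $\alpha_i y_i$ for the support vectors. The toy data are hypothetical, and the large `C` is an assumption used to approximate the linearly separable (hard-margin) case.

```python
import numpy as np
from sklearn import svm

# Hypothetical 2-D toy data, linearly separable along the diagonal.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                  [3.0, 3.0], [4.0, 3.0], [4.0, 4.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

clf_toy = svm.SVC(kernel='linear', C=1000.0).fit(X_toy, y_toy)

alpha_y = clf_toy.dual_coef_[0]   # alpha_i * y_i, one entry per support vector
sv = clf_toy.support_vectors_     # the corresponding x_i

w_from_dual = alpha_y @ sv        # w = sum_i alpha_i * y_i * x_i
constraint = alpha_y.sum()        # sum_i alpha_i * y_i, which should be close to 0

print("w from dual coefficients:", w_from_dual)
print("w reported by coef_:     ", clf_toy.coef_[0])
print("sum of alpha_i * y_i:    ", constraint)
```

Samples that are not support vectors have $\alpha_i = 0$, so summing over the support vectors alone is enough to recover $w$.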

Linearly separable SVM in practice

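Here we fit a linear SVM to the data and visualize the result, using the svm module of the sklearn library. The data are read from datasets_testSet.txt, a tab-separated file in which each line holds two feature values followed by a label.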

import numpy as np
import matplotlib.pyplot as pl  # pyplot replaces the discouraged pylab interface; the alias pl is kept
from sklearn import svm

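def loadDataSet(fileName):
    """Read tab-separated lines of the form: feature1, feature2, label."""
    dataMat = []
    labelMat = []
    with open(fileName) as fr:                  # the file is closed automatically
        for line in fr:
            lineArr = line.strip().split('\t')  # strip surrounding whitespace, split on tabs
            if len(lineArr) < 3:                # skip blank or incomplete lines
                continue
            dataMat.append([float(lineArr[0]), float(lineArr[1])])  # the two features
            labelMat.append(float(lineArr[2]))                      # the class label
    return dataMat, labelMat

X, Y = loadDataSet('datasets_testSet.txt')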

#fit the model
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)

# get the separating hyperplane
w = clf.coef_[0]
a = -w[0]/w[1]
xx = np.linspace(-5, 5)
yy = a*xx - (clf.intercept_[0])/w[1]

# plot the parallels to the separating hyperplane that pass through the support vectors
b = clf.support_vectors_[0]
yy_down = a*xx + (b[1] - a*b[0])
b = clf.support_vectors_[-1]
yy_up = a*xx + (b[1] - a*b[0])

print("w: ", w)
print("a: ", a)

print("support_vectors_: ", clf.support_vectors_)
print("clf.coef_: ", clf.coef_)

# Note on the line equations above: the generic n-dimensional parameterization of the hyperplane,
# w_0*x + w_1*y + b = 0, is rewritten in the 2D form y = a*x + c as y = -(w_0/w_1)*x - b/w_1


# plot the line, the points, and the nearest vectors to the plane
pl.plot(xx, yy, 'k-')
pl.plot(xx, yy_down, 'k--')
pl.plot(xx, yy_up, 'k--')

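# circle the support vectors; an explicit edge color keeps the hollow markers visible
pl.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=80, facecolors='none', edgecolors='k')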
pl.scatter([x[0] for x in X], [x[1] for x in X], c=Y, cmap=pl.cm.Paired)

pl.axis('tight')
pl.show()

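The resulting plot shows the separating line (solid), the two parallel lines through the support vectors (dashed), the circled support vectors, and the data points colored by class.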
