Preface
To improve my English and to prepare for the postgraduate entrance examination (研究生入学考试), I will write my blog in English. So although I have written a lot of content in Chinese, I still have to translate it into English by myself, and this is why my blog updates will be slow. Of course, if I have time, I will publish the corresponding Chinese version.
Okay, let's get into today's blog. Welcome to my channel!
Before you read this blog, I hope you already know something about machine learning; it is not suitable for someone who has never studied it.
Targets:
- The mathematical principles of the SVM algorithm
- How to code it
- Real case
No coding, no future. Let's go, guys!
Comparison
When we talk about the SVM algorithm, we have to mention other machine learning algorithms such as decision trees and logistic regression, because all of them can help us classify objects. So why should we use SVM? What is different about it?
Logistic Regression
This algorithm is very easy to learn, and it gives you a probability for each classification.
If we have objects of the form (x1, x2, label) in our decision space, you are likely to see this:
You will find that the effect is not good, because no matter what you do, the decision boundary obtained by logistic regression is always linear, and you cannot get the ring-shaped boundary you need here. Therefore, logistic regression is suitable for classification problems that are nearly linearly separable.
Decision tree
If we use this algorithm, you can see this:
and then you will get this:
If you continue to increase the size of the tree, you will notice that the decision boundary keeps being built out of parallel (axis-aligned) lines. Therefore, if the boundary is nonlinear and can be approximated by repeatedly dividing the feature space into rectangles, then a decision tree is a better choice than logistic regression.
SVM
Although our data is two-dimensional, we can use kernel functions to map it to high-dimensional space for classification.
so maybe you will see this:
and then you can get this:
Note: the decision boundary is not a perfect circle, but it is very close (probably a polygon). To keep things simple, we use rings instead.
So now we can simply analyze which algorithm we can use in different scenarios.
Analysis
Logistic Regression Analysis
A convenient and useful thing about logistic regression is that the output is not a discrete value or an exact category. Instead, you get a probability associated with each observation sample. You can apply different criteria and common performance metrics to these probability scores, choose a threshold, and then categorize the output in the way that best suits your business problem. In the financial industry, this technique is widely used in scorecards: for the same model, you can adjust your threshold to get different classification results. Few other algorithms provide such scores directly; on the contrary, their output is a rigid, direct classification result. At the same time, logistic regression is quite efficient in terms of time and memory requirements.
In addition, logistic regression is robust to small and medium amounts of noise and is not particularly affected by slight multicollinearity. Serious multicollinearity can be handled by combining logistic regression with L2 regularization, but if you want a reduced (sparse) model, L2 regularization is not the best choice, because the model it produces keeps all the features.
When you have a large number of features and a lot of missing data, logistic regression starts to fall short. Too many categorical variables are also a problem for it. Logistic regression estimates its probabilities from the whole dataset, so when it draws the separating curve it may treat the "obvious" data points at the two ends of the score as unimportant, whereas ideally the boundary should be determined by exactly the points near it. Also, if some features are nonlinear, you have to rely on transformations, and this becomes another problem as the dimension of your feature space increases.
Advantages
- Convenient probability scores for the observation samples.
- Efficient implementations in existing tools.
- Multicollinearity is not a big problem for logistic regression; it can be handled with L2 regularization.
- Logistic regression is widely used in industrial problems (this is very important).
Disadvantages
- When the feature space is very large, the performance of logistic regression is not very good.
- It cannot handle a large number of categorical features or variables well.
- Nonlinear features need to be transformed first.
- It depends on all of the data (over-fitting can be serious).
Decision Tree Analysis
The inherent characteristic of decision trees is that they do not care about monotone transformations or nonlinear features (which is different from nonlinear correlations between predictors), because they simply split the feature space into rectangles, and these splits are unaffected by any monotone transformation. Since decision trees are designed to handle discrete or categorical predictors, any number of categorical variables is not a real problem for them. Models trained with decision trees are quite intuitive and easy to explain in business terms. Decision trees do not output a probability score directly, but you can use the class frequencies assigned to the leaf (end) nodes instead. This brings us to the biggest problem with decision trees: they over-fit very easily. You can build a decision tree model on the training set whose results on the training set are better than other algorithms, but the test set will eventually prove it to be a poor predictor. You must prune the tree and combine it with cross-validation to get a decision tree model that does not over-fit.
Random forests overcome the over-fitting defect to a great extent. There is nothing particularly magical about them; they are simply a very good extension of decision trees. However, random forests lose the easy interpretability of business rules, because now you have hundreds or thousands of trees, and the majority-voting rule they use makes the model more complex. Decision trees also capture interactions between variables, which can make them inefficient if most of your variables do not interact with each other or interact only weakly. On the other hand, this design also makes them less sensitive to multicollinearity.
Advantages
- Intuitive decision rules.
- Can handle nonlinear features.
- Takes interactions between variables into account.
Disadvantages
- It over-fits easily; of course, this can be mitigated by random forests.
SVM Analysis
The characteristic of support vector machines is that they rely on the boundary samples (the support vectors) to establish the separating curve. As we have seen above, they can handle nonlinear decision boundaries. This dependence on the boundary also gives them some ability to handle missing data among the "obvious", far-from-boundary sample instances. Support vector machines can deal with large feature spaces, which is why they became one of the most popular algorithms in text analysis; text data almost always produces a huge number of features, so logistic regression is not a very good choice in that situation.
The results of SVM are not as intuitive as those of decision trees. At the same time, using a nonlinear kernel makes training an SVM on large datasets very time-consuming.
Advantages
- Can handle large feature spaces.
- Can handle interactions between nonlinear features.
- Does not need to rely on the entire dataset.
Disadvantages
- When there are many observation samples, the efficiency is not very high.
- Sometimes it is difficult to find a suitable kernel function.
Advice
- Logistic regression is the first thing to try. If its effect is not good, its results can still serve as a baseline for reference.
- Then try whether a decision tree (or random forest) can greatly improve the model's performance. Even if you don't use it as the final model, you can use random forests to remove noise variables.
- If the number of features and of observation samples is particularly large, then, when resources and time are sufficient, SVM is a good choice.
Don't think this is troublesome; in practice, you just need to use sklearn and call a different API for each model.
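As a minimal sketch (assuming a toy ring-shaped dataset similar to the figures above; the dataset and parameters here are just placeholders), the three classifiers discussed above can be tried with almost identical code:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# A toy ring-shaped dataset, similar to the example above
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "SVM (RBF kernel)": SVC(kernel="rbf", C=1.0, gamma="scale"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```

On a dataset like this you would expect logistic regression to score clearly worse than the tree and the RBF SVM, which is exactly the point of the comparison above.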
Mathematical Principles of SVM
Target Function
Okay, it's time to talk about how SVM works and what the mathematics behind it is.
Our goal is simple: we just need to classify objects into two categories.
just like this:
(If we don't use kernel functions to calculate)
We need to find a straight line or hyperplane to make the nearest point on both sides farthest from the plane, so as to achieve a good classification effect.
just like this:
We can assume that this hyperplane looks like this:
So, starting from the formula for the distance from a point to a straight line $Ax + By + C = 0$, we can derive the distance from a point to the hyperplane.
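As a sketch in standard notation (writing the hyperplane as $w^T x + b = 0$), the two distances are:

$$
d_{\text{line}} = \frac{|A x_0 + B y_0 + C|}{\sqrt{A^2 + B^2}}, \qquad
d_{\text{hyperplane}} = \frac{|w^T x + b|}{\lVert w \rVert}
$$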
However, since SVM is a binary classifier, we can stipulate that the two labels are +1 and -1. This means your input data will look like (x1_1, x1_2, 1), (x2_1, x2_2, -1), ..., where 1 and -1 are the labels. With this stipulation, the distance for a correctly classified point can be written without the absolute value:
Find a hyperplane (described by $w$ and $b$) such that the point closest to the plane is as far from it as possible; that is, take the argmax over $(w, b)$ of the minimum distance from any point to the plane.
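Written out (a sketch, assuming the labels $y_i \in \{+1, -1\}$ stipulated above, so that $y_i(w^T x_i + b)$ equals the unsigned numerator for correctly classified points):

$$
\arg\max_{w, b} \left\{ \min_i \frac{y_i (w^T x_i + b)}{\lVert w \rVert} \right\}
$$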
Tip:
To simplify the calculation, note that rescaling $w$ and $b$ does not change the plane or the distances, so we can assume that for the closest points $y_i(w^T x_i + b) = 1$. The distance from the nearest point to the plane then reduces to $\frac{1}{\lVert w \rVert}$.
To make the optimization easier, we turn maximizing $\frac{1}{\lVert w \rVert}$ into minimizing $\lVert w \rVert$, and for the convenience of differentiation we square it and add a factor of $\frac{1}{2}$.
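So the problem we need to solve is, in standard form (a sketch using the notation above):

$$
\min_{w, b} \; \frac{1}{2} \lVert w \rVert^2
\quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1, \;\; i = 1, \dots, m
$$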
Lagrange duality
To solve this constrained problem, we introduce Lagrange multipliers $\alpha_i \ge 0$ for the constraints.
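The resulting Lagrangian is (standard form, same notation as above):

$$
L(w, b, \alpha) = \frac{1}{2} \lVert w \rVert^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right], \qquad \alpha_i \ge 0
$$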
But then the original problem becomes a min-max problem: minimize over $(w, b)$ the maximum of the Lagrangian over $\alpha$. That is hard to solve directly, so we transform it into its dual (max-min) problem.
So we can take the partial derivatives of the Lagrangian with respect to $w$ and $b$ and set them to zero.
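Doing so gives (a standard step, assuming the Lagrangian above):

$$
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0
$$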
Simplify
Substituting these back into the Lagrangian, we finally get the dual objective in terms of $\alpha$ alone. Expanding it (and rearranging it a little to make the equation clearer), the problem becomes a maximization over $\alpha$ subject to the conditions $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$.
As before, to make the computation easier, we convert the maximization into a minimization; we just need to add a minus sign.
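The dual problem we actually solve is then roughly:

$$
\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{m} \alpha_i
\quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \;\; \alpha_i \ge 0
$$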
After solving this, we have all the $\alpha$ values we need.
Substitute these values into the formulas obtained from the partial derivatives to get $w$ and $b$.
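Concretely (a sketch; $x_k$ is any support vector, i.e. a sample with $\alpha_k > 0$):

$$
w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad
b = y_k - \sum_{i=1}^{m} \alpha_i y_i (x_i \cdot x_k)
$$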
Example
This example comes from this video: www.bilibili.com/video/BV15A…
and the mathematical principles come from here: zhuanlan.zhihu.com/p/270298485
Soft Margin
If we strictly follow the maximum-margin rule we have just derived, this kind of situation (a single outlier forcing a bad boundary) easily occurs.
So we relax the constraint from $y_i(w^T x_i + b) \ge 1$ to $y_i(w^T x_i + b) \ge 1 - \xi_i$ with slack variables $\xi_i \ge 0$, and the target function gains a penalty term for the total slack.
Finally, the result is the following.
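A sketch of the standard soft-margin formulation (the slack variables are usually written $\xi_i$, and $C$ is the same penalty parameter that the code below passes to SMO):

$$
\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0
$$

The only change to the dual problem is that the multipliers are now bounded: $0 \le \alpha_i \le C$.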
Kernel Function
Through the above derivation and examples, you will notice that everything we have done so far is linear, so what we end up with is not a nonlinear boundary. We may therefore need to map the data into a higher-dimensional space, where the points can be separated by a plane, and this is exactly what a kernel function handles.
such as this:
It is very difficult for us to separate these points into two categories with a straight line.
But if the points above can be mapped to look like this:
If this is the case, we can directly use a plane to split the two classes from the middle.
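A common choice for such a mapping is the Gaussian (RBF) kernel, which measures similarity between two samples instead of computing the high-dimensional coordinates explicitly (a sketch in standard notation):

$$
K(x, z) = \exp\!\left( -\frac{\lVert x - z \rVert^2}{2 \sigma^2} \right)
$$

Note that the `kernelTrans` code below divides by $\sigma^2$ instead of $2\sigma^2$; this only changes how the $\sigma$ parameter is interpreted.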
To do this, we can use this:
def kernelTrans(X, A, kTup):
    # X: the full sample matrix, A: one sample (a row), kTup: (kernel type, parameter)
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':       # linear kernel: nothing special to do here
        K = X * A.T
    elif kTup[0] == 'gaosi':   # Gaussian (RBF) kernel
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))
    else:
        raise NameError("Son of a bitch doesn't have this kernel function.")
    return K
Here we can still use the example from the video above (about kernel functions; of course, we have more direct examples in the actual code and will make some additions, but the general process is similar).
After that, the calculation is the same as before; we just do one extra mapping.
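Concretely, the only change to the dual problem is that the inner product $x_i \cdot x_j$ is replaced by the kernel value $K(x_i, x_j)$ (a sketch, same notation as before):

$$
\min_{\alpha} \; \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i} \alpha_i
\quad \text{s.t.} \quad \sum_{i} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C
$$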
I don't want to say too much here, because it is simple enough to show in the code. In fact, I was ready to write this part a long time ago, but I didn't know how to explain it. I also have to thank Dr. Tang Yudi, the author of the video mentioned above; until then, I had only understood the mathematics of SVM in the abstract, just knowing how to use the final formula.
Coding
Data type
Okay, the first important thing is the data format, because a program equals data structures plus algorithms. Today our data will look like yesterday's data, when we coded the decision tree algorithm, but I will save it in a txt file.
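Each line of the txt file holds the two feature values followed by the label, separated by whitespace. A couple of hypothetical lines might look like this (the numbers are made up for illustration):

```
2.0124 0.8721 -1
0.3514 1.4530 1
```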
import os

class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("Fuck you no such file")
Core
When we start coding, we must first know what our core task is.
Now we know what we should do: we have a target function, and when we input our data we will get a function like this:
We need to find the alphas for our input data, and then our decision function will look like this:
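In other words, for a new sample $x$ the function we want to evaluate is roughly (using the kernelized form from the previous section):

$$
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b \right)
$$

In practice only the support vectors contribute, because every other $\alpha_i$ is zero; this is exactly why the `forcast` method below only loops over `self.sVs`.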
But if we want to get this, we need a way for the machine to do it for us. That is, how do we let the machine help us solve this optimization problem? So our core task here is really to find a method that lets the machine solve these equations automatically.
The SMO algorithm is described in the book 《Statistical Learning Methods》. Of course, we can also use other similar heuristic algorithms.
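A rough sketch of one SMO step, which is what `KKTGoing` below implements: pick an $\alpha_i$ that violates the KKT conditions, choose a partner $\alpha_j$, update $\alpha_j$ analytically, clip it to its feasible interval $[L, H]$, and then adjust $\alpha_i$ and $b$ accordingly:

$$
\alpha_j^{\text{new}} = \alpha_j + \frac{y_j (E_i - E_j)}{\eta}, \qquad
\eta = K_{ii} + K_{jj} - 2 K_{ij}, \qquad
E_k = f(x_k) - y_k
$$

(The code computes `eta` with the opposite sign and subtracts instead of adding, which is equivalent.)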
def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
    self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
    iter = 0
    entireSet = True
    alphaPairsChanged = 0
    while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
        alphaPairsChanged = 0
        if entireSet:
            for i in range(self.m):  # go through every sample
                alphaPairsChanged += self.KKTGoing(i)
                print("fullSet, iter: %d i:%d, pairs changed %d" % (
                    iter, i, alphaPairsChanged))
            iter += 1
        else:
            nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
            for i in nonBoundIs:  # only the non-bound alphas (0 < alpha < C)
                alphaPairsChanged += self.KKTGoing(i)
                print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
            iter += 1
        if entireSet:
            entireSet = False
        elif alphaPairsChanged == 0:
            entireSet = True
        print("iteration number: %d" % iter)
    return self.b, self.alphas
Forecast
When we get the alphas and b, we have our hyperplane. We can then substitute new samples in, calculate their (signed) distance to the hyperplane, and complete the classification with the sign function.
def forcast(self, dataSet: DataSet, show=False):
    res = []
    dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
    datMat_forcast = mat(dataArr_forcast)
    m, n = shape(datMat_forcast)
    for i in range(m):  # compute a prediction for every new sample
        kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
        predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
        res.append(predict)
    if show:
        print("the result is:", res)
    return res
All code
Now here is all the code:
from numpy import *
import os


class DataSet(object):
    def __init__(self, path):
        self.Features = []
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if os.path.exists(self.path):
            with open(self.path) as file:
                for line in file.readlines():
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels
        else:
            raise Exception("Fuck you no such file")
class SVMModel(object):
    def __init__(self, Ktype):
        self.Ktype = Ktype

    def __SMO_init(self, Features, Labels, C, toler, Ktype):
        """
        :param Features: sample matrix
        :param Labels: label vector
        :param C: soft-margin penalty
        :param toler: stop threshold
        :param Ktype: (kernel type, kernel parameter)
        """
        self.X = Features
        self.labelMat = Labels
        self.C = C
        self.tol = toler
        self.m = shape(Features)[0]
        self.alphas = mat(zeros((self.m, 1)))
        self.b = 0
        self.eCache = mat(zeros(shape(Features)))  # error cache: column 0 is the valid flag, column 1 is the E value
        self.K = mat(zeros((self.m, self.m)))
        self.sVs = None
        self.labelSV = None
        self.svInd = None
        for i in range(self.m):
            self.K[:, i] = self.kernelFunction(self.X, self.X[i, :], Ktype)
    def kernelFunction(self, X, A, Ktype):
        """
        :param X: the full sample matrix
        :param A: a single sample (one row)
        :param Ktype: (type, param)
        :return: the kernel values between A and every row of X
        """
        m, n = shape(X)
        K = mat(zeros((m, 1)))
        if Ktype[0] == 'lin':      # linear kernel
            K = X * A.T
        elif Ktype[0] == 'rbf':    # Gaussian (RBF) kernel
            for j in range(m):
                deltaRow = X[j, :] - A
                K[j] = deltaRow * deltaRow.T
            K = exp(K / (-1 * Ktype[1] ** 2))
        else:
            raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
        return K
    def __SelectRand(self, i, m):
        # pick a random index j != i
        j = i
        while j == i:
            j = int(random.uniform(0, m))
        return j

    def __SelectAj(self, i, oS, Ei):
        # choose the second alpha j, preferring the one with the largest |Ei - Ej|
        maxK = -1
        maxDeltaE = 0
        Ej = 0
        oS.eCache[i] = [1, Ei]
        validEcacheList = nonzero(self.eCache[:, 0].A)[0]  # rows with a valid cached error
        if (len(validEcacheList)) > 1:
            for k in validEcacheList:
                if k == i:
                    continue
                Ek = self.__calcEk(k)
                deltaE = abs(Ei - Ek)
                if deltaE > maxDeltaE:
                    maxK = k
                    maxDeltaE = deltaE
                    Ej = Ek
            return maxK, Ej
        else:
            j = self.__SelectRand(i, self.m)
            Ej = self.__calcEk(j)
            return j, Ej

    def __HoldAlpha(self, al, H, L):
        # clip alpha so that L <= al <= H
        if al > H:
            al = H
        elif L > al:
            al = L
        return al

    def __calcEk(self, k):
        # prediction error E_k = f(x_k) - y_k
        fXk = float(multiply(self.alphas, self.labelMat).T * self.K[:, k] + self.b)
        Ek = fXk - float(self.labelMat[k])
        return Ek

    def __updateEk(self, k):
        Ek = self.__calcEk(k)
        self.eCache[k] = [1, Ek]
    def KKTGoing(self, i):
        """
        Refer to 《Statistical Learning Methods》.
        First, check whether alpha_i satisfies the KKT conditions.
        If not, select another alpha_j for optimization
        and update the values of alpha_i, alpha_j and b.
        :return: 1 if a pair of alphas was changed, 0 otherwise
        """
        Ei = self.__calcEk(i)  # compute the error for sample i
        if ((self.labelMat[i] * Ei < -self.tol) and (self.alphas[i] < self.C)) or (
                (self.labelMat[i] * Ei > self.tol) and (self.alphas[i] > 0)):
            j, Ej = self.__SelectAj(i, self, Ei)
            alphaIold = self.alphas[i].copy()
            alphaJold = self.alphas[j].copy()
            # bounds L and H for the new alpha_j
            if self.labelMat[i] != self.labelMat[j]:
                L = max(0, self.alphas[j] - self.alphas[i])
                H = min(self.C, self.C + self.alphas[j] - self.alphas[i])
            else:
                L = max(0, self.alphas[j] + self.alphas[i] - self.C)
                H = min(self.C, self.alphas[j] + self.alphas[i])
            if L == H:
                print("L==H")
                return 0
            eta = 2.0 * self.K[i, j] - self.K[i, i] - self.K[j, j]
            if eta >= 0:
                print("eta>=0")
                return 0
            self.alphas[j] -= self.labelMat[j] * (Ei - Ej) / eta
            self.alphas[j] = self.__HoldAlpha(self.alphas[j], H, L)
            self.__updateEk(j)
            if abs(self.alphas[j] - alphaJold) < self.tol:
                print("j not moving enough")
                return 0
            self.alphas[i] += self.labelMat[j] * self.labelMat[i] * (alphaJold - self.alphas[j])
            self.__updateEk(i)
            b1 = self.b - Ei - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, i] - self.labelMat[j] * (
                    self.alphas[j] - alphaJold) * self.K[i, j]
            b2 = self.b - Ej - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, j] - self.labelMat[j] * (
                    self.alphas[j] - alphaJold) * self.K[j, j]
            if 0 < self.alphas[i] < self.C:
                self.b = b1
            elif 0 < self.alphas[j] < self.C:
                self.b = b2
            else:
                self.b = (b1 + b2) / 2.0
            return 1
        else:
            return 0
    def SMO(self, Features, Labels, C, toler, maxIter, Ktype=('lin', 0)):
        """
        SMO is a heuristic algorithm;
        I don't fully understand its principle.
        The code comes from GitHub,
        and I plan to replace it with a PSO algorithm later.
        """
        self.__SMO_init(mat(Features), mat(Labels).transpose(), C, toler, Ktype)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        while (iter < maxIter) and ((alphaPairsChanged > 0) or entireSet):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(self.m):  # go through every sample
                    alphaPairsChanged += self.KKTGoing(i)
                    print("fullSet, iter: %d i:%d, pairs changed %d" % (
                        iter, i, alphaPairsChanged))
                iter += 1
            else:
                nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                for i in nonBoundIs:  # only the non-bound alphas (0 < alpha < C)
                    alphaPairsChanged += self.KKTGoing(i)
                    print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            if entireSet:
                entireSet = False
            elif alphaPairsChanged == 0:
                entireSet = True
            print("iteration number: %d" % iter)
        return self.b, self.alphas
    def fit(self, dataSet: DataSet):
        dataArr, labelArr = dataSet.LoadDataSet()
        b, alphas = self.SMO(dataArr, labelArr, 200, 0.0001, 10000, self.Ktype)
        self.b = b
        self.alphas = alphas
        datMat = mat(dataArr)
        labelMat = mat(labelArr).transpose()
        svInd = nonzero(alphas)[0]
        # the rows whose alpha is not 0, that is, the support vectors
        sVs = datMat[svInd]
        labelSV = labelMat[svInd]
        self.sVs = sVs
        self.labelSV = labelSV
        self.svInd = svInd
        print("there are %d Support Vectors" % shape(sVs)[0])
        m, n = shape(datMat)
        errorCount = 0
        for i in range(m):
            kernelEval = self.kernelFunction(sVs, datMat[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(labelSV, alphas[svInd]) + b
            if sign(predict) != sign(labelArr[i]):  # sign: -1 if x < 0, 0 if x == 0, 1 if x > 0
                errorCount += 1
        print("the training error rate is: %f" % (float(errorCount) / m))
    def save_model(self, path):
        model = {}
        model['b'] = self.b
        model['alphas'] = self.alphas
        model['sVs'] = self.sVs
        model['labelSV'] = self.labelSV
        model['svInd'] = self.svInd
        with open(path, 'w') as file:
            file.write(str(model))  # write the repr of the dict; load_mode reads it back with eval()

    def load_mode(self, path):
        if os.path.exists(path):
            with open(path) as file:
                model = file.read()
                model = eval(model)
            self.b = model['b']
            self.alphas = model['alphas']
            self.sVs = model['sVs']
            self.labelSV = model['labelSV']
            self.svInd = model['svInd']
        else:
            raise Exception("Fuck you no such file")
    def predict(self, dataSet: DataSet):
        dataArr_test, labelArr_test = dataSet.LoadDataSet()
        errorCount_test = 0
        datMat_test = mat(dataArr_test)
        m, n = shape(datMat_test)
        for i in range(m):  # measure the error rate on the test data
            kernelEval = self.kernelFunction(self.sVs, datMat_test[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            if sign(predict) != sign(labelArr_test[i]):
                errorCount_test += 1
        print("the test error rate is: %f" % (float(errorCount_test) / m))

    def forcast(self, dataSet: DataSet, show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):  # compute a prediction for every new sample
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], ('rbf', 1.3))
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if show:
            print("the result is:", res)
        return res
if __name__ == '__main__':
    train_path = r'\Data\svm_train.txt'
    test_path = r'\Data\svm_eval.txt'
    train_data = DataSet(train_path)
    test_data = DataSet(test_path)
    SVM = SVMModel(('rbf', 1.3))
    SVM.fit(train_data)
    SVM.predict(test_data)
Get the data
If you want to get my data, you can go here. Link: pan.baidu.com/s/1rTmao4zk… extraction code: 6666. Mr. Wu Enda (Andrew Ng) is very good, and I don't accept any objection to this sentence.
Real case
This is just an online example, from: blog.csdn.net/weixin_4857…
(Mr. Wu)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from scipy.io import loadmat
from sklearn import svm
'''
1. Prepare the dataset
'''
mat = loadmat('data/ex6data1.mat')
print(mat.keys())
X = mat['X']
y = mat['y']


def plotData(X, y):
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='rainbow')
    plt.xlabel('x1')
    plt.ylabel('x2')
    pass


def plotBoundary(clf, X):
    '''Plot the decision boundary'''
    x_min, x_max = X[:, 0].min() * 1.2, X[:, 0].max() * 1.1
    y_min, y_max = X[:, 1].min() * 1.1, X[:, 1].max() * 1.1
    # np.linspace(x_min, x_max, 500).shape ----> (500,); 500 points per axis
    # xx.shape, yy.shape ----> (500, 500) (500, 500)
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # clf.predict: predictions for the grid points, shape (250000,)
    # ravel() flattens a multi-dimensional array; xx.ravel().shape ----> (250000,)
    # np.c_ stacks the two arrays column by column (the row counts must match)
    # np.c_[xx.ravel(), yy.ravel()].shape ----> (250000, 2), i.e. 250000 grid samples
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)
    # the contour line is exactly the separating boundary
    pass
models = [svm.SVC(C, kernel='linear') for C in [1, 100]]
# SVM models (kernel: here a linear kernel; C: the penalty weight, taking 1 and 100)
# a linear kernel draws a straight decision boundary
clfs = [model.fit(X, y.ravel()) for model in models]  # model.fit: fit the model
score = [model.score(X, y) for model in models]  # [0.9803921568627451, 1.0]


def plot():
    title = ['SVM Decision Boundary with C = {} (Example Dataset 1)'.format(C) for C in [1, 100]]
    for model, title in zip(clfs, title):
        # zip() pairs each fitted model with its title
        plt.figure(figsize=(8, 5))
        plotData(X, y)
        plotBoundary(model, X)  # draw the decision boundary with the fitted model
        plt.title(title)
        pass
    pass
# plt.show()
'''
2. SVM with Gaussian Kernels
'''


def gaussKernel(x1, x2, sigma):
    # Gaussian kernel: exp(-||x1 - x2||^2 / (2 * sigma^2))
    return np.exp(-((x1 - x2) ** 2).sum() / (2 * sigma ** 2))


a = gaussKernel(np.array([1, 2, 1]), np.array([0, 4, -1]), 2.)  # 0.32465246735834974
# print(a)

'''
Example Dataset 2
'''
mat = loadmat('data/ex6data2.mat')
x2 = mat['X']
y2 = mat['y']
plotData(x2, y2)
plt.show()
sigma = 0.1
gamma = np.power(sigma, -2) / 2
'''
The larger gamma is in the Gaussian kernel, the smaller sigma is, and the taller and thinner the bell curve becomes.
The smaller gamma is, the larger sigma is, and the shorter and flatter the curve becomes: smoother, higher bias, lower variance.
'''
clf = svm.SVC(C=1, kernel='rbf', gamma=gamma)
model = clf.fit(x2, y2.flatten())  # kernel='rbf' means the SVM uses a Gaussian kernel
# https://blog.csdn.net/guanyuqiu/article/details/85109441
# plotData(x2, y2)
# plotBoundary(model, x2)
# plt.show()
'''
Example Dataset 3
'''
mat3 = loadmat('data/ex6data3.mat')
x3, y3 = mat3['X'], mat3['y']
Xval, yval = mat3['Xval'], mat3['yval']
plotData(x3, y3)
# plt.show()
Cvalues = (0.01, 0.03, 0.1, 0.3, 1., 3., 10., 30.)  # candidate values for the penalty C
sigmavalues = Cvalues  # candidate values for the kernel parameter sigma
best_pair, best_score = (0, 0), 0  # best (C, sigma) pair and its validation score
# grid-search for the best (C, sigma) pair
for C in Cvalues:
    for sigma in sigmavalues:
        gamma = np.power(sigma, -2.) / 2
        model = svm.SVC(C=C, kernel='rbf', gamma=gamma)  # SVM with a Gaussian kernel
        model.fit(x3, y3.flatten())  # fit on the training set
        this_score = model.score(Xval, yval)  # score on the cross-validation set
        '''
        model.score returns the mean accuracy of the classifier on the given data:
        the higher it is, the better the model classifies the validation set.
        '''
        # keep the best-scoring parameter pair
        if this_score > best_score:
            best_score = this_score
            best_pair = (C, sigma)
        pass
    pass
print('best (C, sigma):', best_pair, 'validation score:', best_score)
# best (C, sigma): (1.0, 0.1)  validation score: 0.965
model = svm.SVC(1, kernel='rbf', gamma=np.power(0.1, -2.) / 2)
# re-create the SVM with the chosen parameters
model.fit(x3, y3.flatten())
plotData(x3, y3)
plotBoundary(model, x3)
# plt.show()