What is the SVM algorithm?

Preface

To improve my English and prepare for the postgraduate entrance examination (研究生入学考试), I will write my blog in English. Although I have already written a lot of content in Chinese, I have to translate it into English myself, and this is why my blog updates will be slow. Of course, if I have time, I will publish the corresponding Chinese version.

Okay, let's start today's blog. Welcome to my channel!

Before you read this blog, I hope you already know something about machine learning; it is not suitable for someone who has not studied it yet.

Targets:

  1. The mathematical principle of the SVM algorithm
  2. How to code it
  3. A real case

No coding, no future. Let's go, guys!

Comparison

When we talk about the SVM algorithm, we should also talk about other classification algorithms in machine learning, such as decision trees and logistic regression, because all of them can help us classify objects. So why should we use SVM, and what is different about it?

Logistic Regression

This algorithm is very easy to learn, and you get a probability for each classification.

[figure] If we have objects of the form (x1, x2, label) in our decision space, you are likely to see this: [figure]

You will find that the effect is not good. Because no matter what you do, the decision boundary obtained by logistic regression is always linear, and you can't get the ring-shaped boundary you need here. Therefore, logistic regression is suitable for classification problems that are nearly linearly separable.

Decision Tree

If we use this algorithm, you can see this: [figure] and then you will get this: [figure] If you keep increasing the size of the tree, you will notice that the decision boundary keeps being built out of parallel, axis-aligned segments. Therefore, if the boundary is nonlinear and can be approximated by repeatedly dividing the feature space into rectangles, then a decision tree is a better choice than logistic regression.

SVM

Although our data is two-dimensional, we can use kernel functions to map it to high-dimensional space for classification.

So maybe you will see this: [figure] and then you can get this: [figure] Note: the decision boundary is not such a perfect circle, but it is very close (probably a polygon). To keep the illustration simple, we use a ring instead.

So now let's briefly analyze which algorithm we can use in different scenarios.

Analysis

Logistic Regression Analysis

A convenient and useful thing about logistic regression is that its output is not a discrete value or an exact category. Instead, you get a probability score associated with each observation sample. You can apply different criteria and common performance metrics to this probability score, pick a threshold, and then categorize the output in the way that best suits your business problem. In the financial industry, this technique is widely used in scorecards: for the same model, you can adjust your threshold to get different classification results. Few other algorithms provide such a score as a direct result; on the contrary, their output is a strict, direct class assignment. At the same time, logistic regression is quite efficient in terms of time and memory requirements.
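
For example, here is a minimal sketch of using the probability score with a custom threshold (the dataset and the threshold value of 0.3 are made up purely for illustration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy data, only to illustrate thresholding the probability score
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)[:, 1]       # probability of the positive class
threshold = 0.3                          # a business-specific cut-off instead of the default 0.5
labels = (proba >= threshold).astype(int)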

In addition, logistic regression is robust to small and medium amounts of noise and is not particularly affected by slight multicollinearity. Serious multicollinearity can be handled by combining logistic regression with L2 regularization, but if you want a reduced (sparse) model, L2 regularization is not the best choice, because the model it builds keeps all the features.

When you have a large number of features and much of your data is missing, logistic regression becomes inadequate. At the same time, too many categorical variables are also a problem for logistic regression. Logistic regression estimates probabilities from the whole dataset, so when you try to draw a separating curve, it may decide that the "obvious" data points at both ends of the score should not be paid attention to, while ideally it should rely on exactly these boundary points. Also, if some features are nonlinear, you have to rely on transformations, and this becomes another problem as the dimension of your feature space grows.

Advantages

  1. Convenient probability scores for the observation samples.

  2. Efficient implementations in existing tools.

  3. For logistic regression, multicollinearity is not a problem; it can be handled with L2 regularization.

  4. Logistic regression is widely used in industrial problems (this is very important).

Disadvantages

  1. When the feature space is very large, the performance of logistic regression is not very good.

  2. Cannot handle a large number of categorical (multi-class) features or variables well.

  3. Nonlinear features need to be transformed first.

  4. Depends on the whole dataset (over-fitting can be serious).

Decision Tree Analysis

An inherent characteristic of decision trees is that they do not care about monotonic transformations or nonlinear features (this is different from nonlinear correlations among predictors), because they simply insert rectangles into the feature space, and these shapes adapt to any monotonic transformation. Since decision trees are designed to handle discrete or categorical predictors, any number of categorical variables is not a real problem for them. A model trained with a decision tree is quite intuitive and easy to explain to the business. Decision trees do not give a probability score as a direct result, but you can assign class probabilities to the leaf nodes instead. This brings us to the biggest problem associated with decision trees: they over-fit very easily. You can build a decision tree model whose results on the training set are better than those of other algorithms, but the test set will eventually prove it to be a poor predictor. You must prune the tree and combine it with cross-validation to get a decision tree model that does not over-fit.

Random forests overcome the over-fitting defect to a great extent; there is nothing particularly special about them, they are simply a very good extension of decision trees. However, random forests also take away the easy interpretability of the business rules, because now you have thousands of trees, and the majority-voting rule they use makes the model more complex. At the same time, decision trees model interactions between variables, which can make them inefficient if most of your variables do not interact with each other or interact only weakly. On the other hand, this design also makes them less susceptible to multicollinearity.

Advantages

  1. Intuitive decision rules.

  2. Can deal with non-linear features.

  3. The interaction between variables is considered.

Disadvantages

It over-fits easily; of course, this can be mitigated with random forests.

SVM Analysis

The characteristic of support vector machines is that they rely on the boundary samples to build the desired separating curve. As we saw above, they can handle nonlinear decision boundaries. Their reliance on the boundary samples also gives them the ability to handle the "obvious" samples even when data is missing. Support vector machines can handle very large feature spaces, which is why they have become one of the most popular algorithms in text analysis: text data almost always produces a huge number of features, and logistic regression is not a very good choice in that case.

The results of SVM are not as intuitive as those of decision trees. At the same time, using a nonlinear kernel makes training a support vector machine on large datasets very time-consuming.

Advantages

  1. Ability to handle large feature spaces.

  2. Ability to deal with the interaction between nonlinear features.

  3. No need to rely on the entire dataset.

Disadvantages

  1. When there are many observation samples, the efficiency is not very high.

  2. Sometimes it is difficult to find a suitable kernel function

Advice

  1. Logistic regression should be the first thing to try. Even if its performance is not good, its results can still be used as a benchmark for reference.

  2. Then try whether a decision tree (or random forest) can greatly improve the model's performance. Even if you don't use it as the final model, you can use a random forest to remove noisy variables.

  3. If the number of features and observation samples is particularly large, then, when resources and time allow, SVM is a good option.

Don't think this is troublesome; in the current situation you just need to use sklearn and call the different APIs, as in the sketch below.
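
For example, a minimal sketch (the ring-shaped toy data and the parameter values are only placeholders for illustration) of calling the three classifiers through sklearn:

from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# ring-shaped data like the example above: not linearly separable
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

for model in (LogisticRegression(),
              DecisionTreeClassifier(max_depth=5),
              SVC(kernel='rbf', C=1.0, gamma='scale')):
    # fit each model and print its training accuracy
    print(type(model).__name__, model.fit(X, y).score(X, y))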

Mathematical principle of SVM

Objective Function

Okay, it's time to explain how SVM works and the mathematics behind it.

Our goal is simple: we just need to classify objects into two categories.

just like this:

(Assuming for now that we don't use a kernel function.)

[figure]

We need to find a straight line or hyperplane such that the nearest points on both sides are as far from it as possible; this gives a good classification result.

just like this: [figure]

We can assume that this hyperplane can be written like this: [figure] and looks like this: [figure]

So, by deriving the distance from the point to the straight line, we can actually derive the distance from the point to the hyperplane.

在这里插入图片描述

(For example, the distance from a point (x0, y0) to the straight line Ax + By + C = 0 is |Ax0 + By0 + C| / √(A² + B²).)

The distance from the point to the hyperplane:

[figure] However, since SVM is a binary classification algorithm, we can make a convention like this: [figure] It means your input data will look like (x1_1, x1_2, 1), (x2_1, x2_2, 0), ... where the numbers 1 and 0 are your labels. So the distance becomes: [figure]
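
In standard notation (a hedged reconstruction of what the figures above most likely show; note that the usual SVM convention takes the labels as +1 and -1 rather than 1 and 0, so that the absolute value can be dropped), the distance from a sample x_i to the hyperplane $w^{T}x + b = 0$ is:

$$
d(x_i)=\frac{|w^{T}x_i+b|}{\lVert w\rVert},\qquad y_i\in\{+1,-1\}\;\Rightarrow\; d(x_i)=\frac{y_i\,(w^{T}x_i+b)}{\lVert w\rVert}.
$$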

We want to find a hyperplane (i.e., w and b) such that the point closest to the plane is as far from it as possible: argmax over (w, b) of min over the samples of (the distance from the sample to the plane). [figure]

Tips: [figure]

For easier calculation, we make the following assumption. Since scaling w and b does not affect the distance from a point to the plane, we assume: [figure] So we can reduce the distance formula to: [figure] For easier calculation, we change the maximization into a minimization.

And for the convenience of taking derivatives, we square it: [figure] So what we need to solve is: [figure]
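
Putting this together, the standard hard-margin problem (a sketch of the usual formulation, which the figures above should correspond to) is:

$$
\arg\max_{w,b}\;\min_i \frac{y_i(w^{T}x_i+b)}{\lVert w\rVert}
\;\;\Longrightarrow\;\;
\min_{w,b}\;\frac{1}{2}\lVert w\rVert^{2}\quad\text{s.t. } y_i(w^{T}x_i+b)\ge 1,\; i=1,\dots,n,
$$

where the right-hand form uses the scaling assumption above, i.e. the closest points satisfy $y_i(w^{T}x_i+b)=1$.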

Lagrange duality

To solve this constrained problem, we need to use this: [figure] [figure] and then we get this: [figure] But if we solve it in this form, the function will be: [figure] So we transform it into the dual problem.

[figure] [figure]

So we can take the partial derivatives: [figure]

Simplify

[figure]

Finally we can get this: [figure] and then we rewrite it as: [figure]

Expanding this: [figure]

(To make the equation clearer, we change it a little bit.) [figure] To: [figure] with these conditions: [figure] Similarly, in order to make the computation easier, we convert the maximization into a minimization.

We just need to add a minus sign.

Finally, we get all the values we need.

Substituting these values into the partial-derivative formulas obtained previously, we get w and b.

[figure]
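
For reference, the standard form of this derivation (a hedged summary written from the usual SVM dual, not copied from the images) is:

$$
L(w,b,\alpha)=\frac{1}{2}\lVert w\rVert^{2}-\sum_i\alpha_i\big[y_i(w^{T}x_i+b)-1\big],\qquad \alpha_i\ge 0,
$$

$$
\frac{\partial L}{\partial w}=0\;\Rightarrow\; w=\sum_i\alpha_i y_i x_i,\qquad
\frac{\partial L}{\partial b}=0\;\Rightarrow\;\sum_i\alpha_i y_i=0,
$$

$$
\min_{\alpha}\;\frac{1}{2}\sum_i\sum_j\alpha_i\alpha_j y_i y_j x_i^{T}x_j-\sum_i\alpha_i
\quad\text{s.t. }\sum_i\alpha_i y_i=0,\;\alpha_i\ge 0,
$$

and b is recovered from any support vector $x_s$ (one with $\alpha_s>0$): $b=y_s-\sum_i\alpha_i y_i x_i^{T}x_s$.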

Example

This example is from this video: www.bilibili.com/video/BV15A…

and the mathematical principle is from here: zhuanlan.zhihu.com/p/270298485

[figure] [figure] [figure] [figure]

Soft margin

If we strictly follow the max-min margin we have just calculated, the following situation can easily occur.

[figure]

So we change this: [figure]

into this: [figure] and then the objective function becomes: [figure] [figure] Finally the result is: [figure]
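
In the standard soft-margin formulation (again a hedged sketch of the usual form, which the figures above should correspond to), slack variables $\xi_i$ and a penalty C are introduced:

$$
\min_{w,b,\xi}\;\frac{1}{2}\lVert w\rVert^{2}+C\sum_i\xi_i
\quad\text{s.t. } y_i(w^{T}x_i+b)\ge 1-\xi_i,\;\xi_i\ge 0,
$$

and the dual is the same as before except that the multipliers become box-constrained: $0\le\alpha_i\le C$.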

Kernel function

Through the derivation and examples above, you will find that everything we have done so far is purely linear, and what we end up with is not a nonlinear boundary. So we may need something more: we map the samples into a higher-dimensional space, turning the point-to-line distance into a point-to-plane distance, and this is exactly what a kernel function handles.

For example: [figure] It is very difficult for us to split this into two categories with a straight line.

But if the points above can be made to look like this: [figure] then we can directly use a plane to split them into two classes from the middle. For this we can use the following:

def kernelTrans(X, A, kTup):
    # requires: from numpy import mat, shape, zeros, exp
    m, n = shape(X)
    K = mat(zeros((m, 1)))
    if kTup[0] == 'lin':  # linear kernel: nothing special to do here
        K = X * A.T
    elif kTup[0] == 'gaosi':  # Gaussian (RBF) kernel
        for j in range(m):
            deltaRow = X[j, :] - A
            K[j] = deltaRow * deltaRow.T
        K = exp(K / (-1 * kTup[1] ** 2))
    else:
        raise NameError("Unsupported kernel type: %s" % kTup[0])
    return K

Here we can still use the example from the video above (about kernel functions; of course there are more direct examples in the actual code, and we will make some additions, but the general process is similar).

[figure] After that, the calculation is the same as before; we just perform one extra mapping.
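
For reference, the Gaussian (RBF) kernel is usually written as below. Note that the snippet above computes exp(-‖x-z‖²/σ²), i.e. it drops the factor of 2 in the denominator; this only rescales the parameter σ, it is not a different kernel:

$$
K(x,z)=\exp\!\left(-\frac{\lVert x-z\rVert^{2}}{2\sigma^{2}}\right).
$$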

I don't want to say too much here, because it is actually quite simple and we will show it in the code. In fact, I had been ready to do this for a long time, but I didn't know how to explain it. I also have to thank Dr. Tang Yudi, the author of the video mentioned above. Until then, I had only abstracted away the mathematical principles behind SVM, just knowing how to use the formula we finally obtained.

Coding

Data type

Okay, the first important thing is the data format, because a program equals data structures plus algorithms. Today our data format will be like yesterday's, when we coded the decision tree algorithm, but I will save the data in a txt file.

[figure]

class DataSet(object):
    def __init__(self,path):
        self.Features=[]
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if(os.path.exists(self.path)):
            with open(self.path) as file:
                for line in file.readlines():
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels

        else:
            raise FileNotFoundError('No such file: %s' % self.path)

Core

When we start coding, we must know what our core task is.

Now we have to be clear about what we should do. We have an objective function; when we plug in our data, we will get a function like this:

[figure] We need to find the alphas for our input data, and then we know our function will look like:

[figure] But if we want to get this, we need to make the machine able to do this: [figure] That is, how do we let the machine help us solve this optimization problem?

So here, our core task is actually to find a method that lets the machine automatically solve these equations.

The SMO algorithm is described in the book 《Statistical Learning Methods》. Of course, we could also use other, similar heuristic algorithms.
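
For context, here is a hedged summary of the KKT condition that the KKTGoing method (called in the code below) checks, following the usual description of SMO. With $g(x_i)=\sum_j\alpha_j y_j K(x_j,x_i)+b$, every $\alpha_i$ should satisfy:

$$
\alpha_i=0\;\Rightarrow\;y_i\,g(x_i)\ge 1,\qquad
0<\alpha_i<C\;\Rightarrow\;y_i\,g(x_i)=1,\qquad
\alpha_i=C\;\Rightarrow\;y_i\,g(x_i)\le 1,
$$

and SMO repeatedly picks a pair $(\alpha_i,\alpha_j)$ that violates these conditions, optimizes that pair analytically while keeping the other multipliers fixed, and updates b.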


    def SMO(self,Features, Labels, C, toler, maxIter,Ktype=('lin', 0)):
        self.__SMO_init(mat(Features),mat(Labels).transpose(),C,toler,Ktype)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(self.m):  # iterate over all samples
                    alphaPairsChanged += self.KKTGoing(i)
                    print("fullSet, iter: %d i:%d, pairs changed %d" % (
                    iter, i, alphaPairsChanged))  
                iter += 1
            else:
                nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                for i in nonBoundIs:  
                    alphaPairsChanged += self.KKTGoing(i)
                    print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            if entireSet:
                entireSet = False
            elif (alphaPairsChanged == 0):
                entireSet = True
            print("iteration number: %d" % iter)
        return self.b, self.alphas

Forecast

When we get the alphas and b, we have our hyperplane. We then substitute new samples to calculate their (signed) distance to the hyperplane, and complete the classification with the sign function.
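
In formula form, this is the standard kernel-SVM decision function (the forcast code below computes the expression inside the sign; applying the sign function then gives the class):

$$
f(x)=\operatorname{sign}\!\left(\sum_{i\in SV}\alpha_i\,y_i\,K(x_i,x)+b\right),
$$

where the sum runs only over the support vectors (the samples with $\alpha_i>0$).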

    def forcast(self,dataSet:DataSet,show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):  # run the kernel prediction for each new sample
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], self.Ktype)
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if(show):
            print("the result is:",res)
        return res

All code

Now let's show all the code:

from numpy import *
import os

class DataSet(object):
    def __init__(self,path):
        self.Features=[]
        self.Labels = []
        self.path = path

    def LoadDataSet(self):
        if(os.path.exists(self.path)):
            with open(self.path) as file:
                for line in file.readlines():
                    lineArr = [float(x) for x in line.strip().split()]
                    self.Features.append([lineArr[0], lineArr[1]])
                    self.Labels.append(lineArr[2])
            return self.Features, self.Labels

        else:
            raise FileNotFoundError('No such file: %s' % self.path)


class SVMModel(object):

    def __init__(self,Ktype):
        self.Ktype = Ktype

    def __SMO_init(self,Features, Labels, C, toler, Ktype):
        """
        :param Features:
        :param Labels:
        :param C: Soft interval
        :param toler: Stop threshold
        :param Ktype:
        """
        self.X = Features
        self.labelMat = Labels
        self.C = C
        self.tol = toler
        self.m = shape(Features)[0]
        self.alphas = mat(zeros((self.m,1)))
        self.b = 0
        self.eCache = mat(zeros((self.m, 2)))  # error cache: [valid flag, E value] for each sample
        self.K = mat(zeros((self.m,self.m)))
        self.sVs = None
        self.labelSV = None
        self.svInd = None
        for i in range(self.m):
            self.K[:,i] = self.kernelFunction(self.X, self.X[i,:], Ktype)



    def kernelFunction(self,X, A, Ktype):
        """
        :param X:
        :param A:
        :param Ktype: (type,param)
        :return:
        """
        m, n = shape(X)
        K = mat(zeros((m, 1)))
        if Ktype[0] == 'lin':
            K = X * A.T
        elif Ktype[0] == 'rbf':
            for j in range(m):
                deltaRow = X[j, :] - A
                K[j] = deltaRow * deltaRow.T
            K = exp(K / (-1 * Ktype[1] ** 2))
        else:
            raise NameError('Houston We Have a Problem -- That Kernel is not recognized')
        return K

    def __SelectRand(self,i, m):  # pick a random index j != i
        j = i
        while (j == i):
            j = int(random.uniform(0, m))
        return j


    def __SelectAj(self,i, oS, Ei):
        # heuristic choice of the second alpha: pick j that maximizes |Ei - Ej| (oS is the same object as self here)
        maxK = -1
        maxDeltaE = 0
        Ej = 0
        oS.eCache[i] = [1, Ei]
        validEcacheList = nonzero(self.eCache[:, 0].A)[0]  # row indices with a valid cached error
        if (len(validEcacheList)) > 1:
            for k in validEcacheList:
                if k == i:
                    continue
                Ek = self.__calcEk(k)
                deltaE = abs(Ei - Ek)
                if (deltaE > maxDeltaE): 
                    maxK = k
                    maxDeltaE = deltaE
                    Ej = Ek
            return maxK, Ej
        else:
            j = self.__SelectRand(i, self.m)
            Ej = self.__calcEk(j)
        return j, Ej

    def __HoldAlpha(self,al, H, L):
        #(L <= a <= H)
        if (al > H):
            al = H
        elif(L > al):
            al = L
        return al

    def __calcEk(self, k):  # prediction error E_k = f(x_k) - y_k
        fXk = float(multiply(self.alphas, self.labelMat).T * self.K[:, k] + self.b)
        Ek = fXk - float(self.labelMat[k])
        return Ek

    def __updateEk(self,k):
        Ek = self.__calcEk(k)
        self.eCache[k] = [1, Ek]

    def KKTGoing(self,i):
        """
        Refer to the following 《Statistical Learning Methods》.
        First, check whether ai meets KKT conditions. 
        If not, randomly select aj for optimization 
        and update the values of AI, AJ and B.
        :param self: 
        :return: 
        """
        Ei = self.__calcEk(i)  # compute the error E_i
        if ((self.labelMat[i] * Ei < -self.tol) and (self.alphas[i] < self.C)) or (
                (self.labelMat[i] * Ei > self.tol) and (self.alphas[i] > 0)): 
            j, Ej = self.__SelectAj(i, self, Ei) 
            alphaIold = self.alphas[i].copy()
            alphaJold = self.alphas[j].copy()
            if (self.labelMat[i] != self.labelMat[j]):
                L = max(0, self.alphas[j] - self.alphas[i])
                H = min(self.C, self.C + self.alphas[j] - self.alphas[i])
            else:
                L = max(0, self.alphas[j] + self.alphas[i] - self.C)
                H = min(self.C, self.alphas[j] + self.alphas[i])
            if L == H:
                print("L==H")
                return 0
            eta = 2.0 * self.K[i, j] - self.K[i, i] - self.K[j, j] 
            if eta >= 0:
                print("eta>=0")
                return 0
            self.alphas[j] -= self.labelMat[j] * (Ei - Ej) / eta  
            self.alphas[j] = self.__HoldAlpha(self.alphas[j], H, L)  
            self.__updateEk(j)
            if (abs(self.alphas[j] - alphaJold) < self.tol): 
                print("j not moving enough")
                return 0
            self.alphas[i] += self.labelMat[j] * self.labelMat[i] * (alphaJold - self.alphas[j]) 
            self.__updateEk(i)  
            
            b1 = self.b - Ei - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, i] - self.labelMat[j] * (
                        self.alphas[j] - alphaJold) * self.K[i, j]
            b2 = self.b - Ej - self.labelMat[i] * (self.alphas[i] - alphaIold) * self.K[i, j] - self.labelMat[j] * (
                        self.alphas[j] - alphaJold) * self.K[j, j]
            if (0 < self.alphas[i] < self.C):
                self.b = b1
            elif (0 < self.alphas[j] < self.C):
                self.b = b2
            else:
                self.b = (b1 + b2) / 2.0
            return 1
        else:
            return 0


    def SMO(self,Features, Labels, C, toler, maxIter,Ktype=('lin', 0)):
    	"""
		SMO algorithm is a heuristic algorithm, 
		and I don't know the specific principle. 
		The code comes from Github, 
		and I plan to use PSO algorithm later.
		"""
        self.__SMO_init(mat(Features),mat(Labels).transpose(),C,toler,Ktype)
        iter = 0
        entireSet = True
        alphaPairsChanged = 0
        while (iter < maxIter) and ((alphaPairsChanged > 0) or (entireSet)):
            alphaPairsChanged = 0
            if entireSet:
                for i in range(self.m):  
                    alphaPairsChanged += self.KKTGoing(i)
                    print("fullSet, iter: %d i:%d, pairs changed %d" % (
                    iter, i, alphaPairsChanged)) 
                iter += 1
            else:
                nonBoundIs = nonzero((self.alphas.A > 0) * (self.alphas.A < C))[0]
                for i in nonBoundIs:  
                    alphaPairsChanged += self.KKTGoing(i)
                    print("non-bound, iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
                iter += 1
            if entireSet:
                entireSet = False
            elif (alphaPairsChanged == 0):
                entireSet = True
            print("iteration number: %d" % iter)
        return self.b, self.alphas

    def fit(self,dataSet:DataSet):
        dataArr, labelArr = dataSet.LoadDataSet()
        b, alphas = self.SMO(dataArr, labelArr, 200, 0.0001, 10000, self.Ktype) 
        self.b = b
        self.alphas = alphas
        datMat = mat(dataArr)
        labelMat = mat(labelArr).transpose()
        svInd = nonzero(alphas)[0]
        # Select the number of rows of data that is not 0 (that is, support vector)
        sVs = datMat[svInd]
        labelSV = labelMat[svInd]
        self.sVs = sVs
        self.labelSV = labelSV
        self.svInd  = svInd
        print("there are %d Support Vectors" % shape(sVs)[0])
        m, n = shape(datMat)  
        errorCount = 0
        for i in range(m):
            kernelEval = self.kernelFunction(sVs, datMat[i, :], self.Ktype)  # use the model's configured kernel instead of a hard-coded one
            predict = kernelEval.T * multiply(labelSV, alphas[
                svInd]) + b
            if sign(predict) != sign(labelArr[i]):  # sign: -1 if x < 0, 0 if x==0, 1 if x > 0
                errorCount += 1
        print("the training error rate is: %f" % (float(errorCount) / m))  

    def save_model(self,path):
        # a simple text dump; pickle would be more robust for numpy matrices
        model = {}
        model['b'] = self.b
        model['alphas'] = self.alphas
        model['sVs'] = self.sVs
        model['labelSV'] = self.labelSV
        model['svInd'] = self.svInd
        with open(path,'w') as file:
            file.write(repr(model))
    def load_model(self,path):
        if(os.path.exists(path)):
            with open(path) as file:
                model = eval(file.read())  # counterpart of save_model's text dump
                self.b = model['b']
                self.alphas = model['alphas']
                self.sVs = model['sVs']
                self.labelSV = model['labelSV']
                self.svInd = model['svInd']
        else:
            raise FileNotFoundError('No such file: %s' % path)

    def predict(self,dataSet:DataSet):
        dataArr_test, labelArr_test = dataSet.LoadDataSet() 
        errorCount_test = 0
        datMat_test = mat(dataArr_test)
        m, n = shape(datMat_test)
        for i in range(m):  # check the error rate on the test data
            kernelEval = self.kernelFunction(self.sVs, datMat_test[i, :], self.Ktype)
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            if sign(predict) != sign(labelArr_test[i]):
                errorCount_test += 1
        print("the test error rate is: %f" % (float(errorCount_test) / m))

    def forcast(self,dataSet:DataSet,show=False):
        res = []
        dataArr_forcast, labelArr_forcast = dataSet.LoadDataSet()
        datMat_forcast = mat(dataArr_forcast)
        m, n = shape(datMat_forcast)
        for i in range(m):  
            kernelEval = self.kernelFunction(self.sVs, datMat_forcast[i, :], self.Ktype)
            predict = kernelEval.T * multiply(self.labelSV, self.alphas[self.svInd]) + self.b
            res.append(predict)
        if(show):
            print("the result is:",res)
        return res


if __name__ == '__main__':

    train_path = r'\Data\svm_train.txt'
    test_path = r'\Data\svm_eval.txt'
    train_data = DataSet(train_path)
    test_data = DataSet(test_path)

    SVM = SVMModel(('rbf', 1.3))
    SVM.fit(train_data)
    SVM.predict(test_data)

Get the data

If you want my data, you can get it here: link: pan.baidu.com/s/1rTmao4zk… extraction code: 6666. Mr. Wu Enda (Andrew Ng) is very good, and I don't accept any objection to this sentence.

Real case

This is just an online example, from: blog.csdn.net/weixin_4857…

(Mr. Wu)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb    
from scipy.io import loadmat
from sklearn import svm

'''
1.Prepare datasets
'''
mat = loadmat('data/ex6data1.mat')
print(mat.keys())
X = mat['X']
y = mat['y']


def plotData(X, y):
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap='rainbow')
    plt.xlabel('x1')
    plt.ylabel('x2')
    pass

def plotBoundary(clf, X):
    '''Plot Decision Boundary'''
    x_min, x_max = X[:, 0].min() * 1.2, X[:, 0].max() * 1.1
    y_min, y_max = X[:, 1].min() * 1.1, X[:, 1].max() * 1.1
    # np.linspace(x_min, x_max, 500).shape ----> (500,)   500 is the number of grid points
    # xx.shape, yy.shape ----> (500, 500) (500, 500)
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500), np.linspace(y_min, y_max, 500))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    # model.predict: model prediction, shape (250000,)
    # ravel() flattens a multi-dimensional array; xx.ravel().shape ----> (250000,)
    # the c in np.c_ stands for "column": it stacks two arrays column-wise (side by side), which requires equal numbers of rows
    # np.c_[xx.ravel(), yy.ravel()].shape ----> (250000, 2), i.e. 250000 grid samples are built
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)
    # the contour plot draws the separating line
    pass


models = [svm.SVC(C, kernel='linear') for C in [1, 100]]
# SVM models (kernel: kernel option, here a linear kernel; C: penalty weight, here 1 and 100)
# a linear kernel gives a straight-line decision boundary
clfs = [model.fit(X, y.ravel()) for model in models]    # model.fit: fit the model
score = [model.score(X, y) for model in models]        # [0.9803921568627451, 1.0]
# title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]

def plot():
    title = ['SVM Decision Boundary with C = {}(Example Dataset 1)'.format(C) for C in [1, 100]]
    for model, title in zip(clfs, title):
        # zip() takes iterables as arguments, packs the corresponding elements into tuples, and returns an iterator of these tuples
        plt.figure(figsize=(8, 5))
        plotData(X, y)
        plotBoundary(model, X)  # use the fitted model (to predict the 250000 grid samples) and draw the decision boundary
        plt.title(title)
        pass
    pass

# plt.show()

'''
2.SVM with Gaussian Kernels
'''


def gaussKernel(x1, x2, sigma):
    return np.exp(-((x1 - x2) ** 2).sum() / (2 * sigma ** 2))


a = gaussKernel(np.array([1, 2, 1]), np.array([0, 4, -1]), 2.)  # 0.32465246735834974
# print(a)

'''
Example Dataset 2
'''

mat = loadmat('data/ex6data2.mat')
x2 = mat['X']
y2 = mat['y']
plotData(x2, y2)
plt.show()

sigma = 0.1
gamma = np.power(sigma, -2)/2
'''
The larger the gamma of the Gaussian kernel, the smaller the equivalent sigma, and the taller and thinner the distribution curve.
The smaller the gamma, the larger the equivalent sigma, and the shorter and fatter (smoother) the curve: higher bias, lower variance.
'''
clf = svm.SVC(C=1, kernel='rbf', gamma=gamma)
model = clf.fit(x2, y2.flatten())       # kernel='rbf' means the SVM uses a Gaussian kernel
# https://blog.csdn.net/guanyuqiu/article/details/85109441
# plotData(x2, y2)
# plotBoundary(model, x2)
# plt.show()


'''
Example Dataset3
'''
mat3 = loadmat('data/ex6data3.mat')
x3, y3 = mat3['X'], mat3['y']
Xval, yval = mat3['Xval'], mat3['yval']
plotData(x3, y3)
# plt.show()

Cvalues = (0.01, 0.03, 0.1, 0.3, 1., 3., 10., 30.)  # candidate values for the penalty C
sigmavalues = Cvalues   # candidate values for the kernel parameter
best_pair, best_score = (0, 0), 0        # best (C, sigma) pair and its score
# search for the best (C, sigma) pair
for C in Cvalues:
    for sigma in sigmavalues:
        gamma = np.power(sigma, -2.) / 2
        model = svm.SVC(C=C, kernel='rbf', gamma=gamma)     # SVM with a Gaussian kernel
        model.fit(x3, y3.flatten())      # fit the model
        this_score = model.score(Xval, yval)        # use the validation set to pick the best parameters
        '''
         For a classifier, model.score returns the mean accuracy on the given data
         (for regressors it would return the coefficient of determination R2).
         The value lies between 0 and 1, and the higher it is, the better the model fits,
         i.e. the better it explains the target variable.
         '''
        # keep the parameter values that fit best
        if this_score > best_score:
            best_score = this_score
            best_pair = (C, sigma)
        pass
    pass
print('Best (C, sigma):', best_pair, 'score:', best_score)
# Best (C, sigma): (1.0, 0.1)  score: 0.965
model = svm.SVC(1, kernel='rbf', gamma=np.power(0.1, -2.) / 2)
# re-create the SVM with the chosen parameters
model.fit(x3, y3.flatten())
plotData(x3, y3)
plotBoundary(model, x3)
# plt.show()