The Perceptron Algorithm: A Classic Binary Classification Algorithm

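Introduction to the Algorithm

The perceptron is a classic binary classification algorithm.

An informal description: a table is covered with oranges and apples, and there exists a line such that all the apples lie on one side and all the oranges on the other. The goal of the perceptron algorithm is to find that line.

A formal description (excerpted from Li Hang's 《统计学习方法》, Statistical Learning Methods): the perceptron is a linear classification model for binary classification. Its input is the feature vector of an instance and its output is the class of the instance, which takes one of the two values +1 and -1. The perceptron corresponds to a separating hyperplane in the input space (feature space) that divides the instances into a positive and a negative class, and it is a discriminative model. Perceptron learning aims to find the separating hyperplane that linearly separates the training data. To do so, it introduces a loss function based on misclassification and minimizes that loss with gradient descent, which yields the perceptron model. The model was proposed by Rosenblatt in 1957 and is the foundation of neural networks and support vector machines.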

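Model Description

To make the problem easier to describe, we use matplotlib for plotting.

The model in brief: plug the coordinates of every point into a function whose result is either +1 or -1, and use that result to split the points into two classes. This function is called the perceptron.

The formal description of the model:

Assume the input space (feature space) is $X \subseteq R^n$,

and the output space is $Y=\{-1, +1\}$.

The input $x \in X$ is the feature vector of an instance, corresponding to a point in the input space (feature space);

the output $y \in Y$ is the class of the instance. The following function from the input space to the output space:

$$f(x)=\mathrm{sign}(w \cdot x+b)$$

is called the perceptron.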

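Here:

$w$ and $b$ are the parameters of the perceptron model

$w \in R^n$ is called the weight (or weight vector)

$b \in R$ is called the bias

$w \cdot x$ denotes the inner product of $w$ and $x$

Note: in two dimensions $w = (w_0, w_1)$ is a vector, and $x=(x_0, x_1)$ is also a vector. Keep in mind that this is vectorized notation.

$\mathrm{sign}$ is the sign function:

$$\mathrm{sign}(x)=\begin{cases} +1, & x \ge 0 \\ -1, & x<0 \end{cases}$$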
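The decision function above can be sketched in a few lines of NumPy. This is only an illustration: the helper name `perceptron_predict` and the values of `w`, `b`, and the sample points are made up for the example, and a score of exactly 0 is assumed to map to +1.

```python
import numpy as np

def perceptron_predict(w, b, X):
    """Return sign(w . x + b) for every row x of X, treating 0 as +1."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# Illustrative parameters (assumed): the line x_0 + x_1 - 3 = 0
w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[3.0, 3.0],   # above the line
              [1.0, 1.0],   # below the line
              [1.5, 1.5]])  # exactly on the line
labels = perceptron_predict(w, b, X)
print(labels)  # [ 1 -1  1]
```

Points on the positive side of the line get +1, points on the negative side get -1, and the point lying exactly on the line falls into the +1 class under the assumption stated above.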

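As shown in the figure:

Perceptron Learning Strategy

The perceptron algorithm tries to use a hyperplane (in two dimensions, a line) to separate the positive and negative instances completely and correctly. To find such a line, we can add up the distances from all misclassified points to the hyperplane. Let's look at the distance formula first.

You probably learned the point-to-line distance formula in middle or high school. The distance from the point $(x_1, y_1)$ to the line $ax+by+c=0$ is:

$$\frac{|ax_1+by_1+c|}{\sqrt{a^2+b^2}}$$

Similarly, for the perceptron model defined in the previous step, written in vectorized form, the distance from a point $x_0$ to the hyperplane $S$ is (here $x_0$ denotes a whole point rather than a single coordinate, and $||w||$ is the $L_2$ norm of $w$):

$$\frac{1}{||w||}|w \cdot x_0+b|$$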
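As a quick check that the vectorized formula agrees with the school formula, here is a small sketch. The helper name `distance_to_hyperplane`, the line $3x+4y-5=0$, and the point $(2, 1)$ are all assumptions chosen for illustration.

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w . x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# The line 3x + 4y - 5 = 0 written as w = (3, 4), b = -5 (illustrative values)
w = np.array([3.0, 4.0])
b = -5.0
point = np.array([2.0, 1.0])

d_vector = distance_to_hyperplane(w, b, point)
# The same distance from |a*x1 + b*y1 + c| / sqrt(a^2 + b^2)
d_school = abs(3.0 * 2.0 + 4.0 * 1.0 - 5.0) / np.sqrt(3.0**2 + 4.0**2)
print(d_vector, d_school)
```

Both expressions evaluate $|3 \cdot 2 + 4 \cdot 1 - 5| / 5$, so they give the same distance.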

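Next, for a misclassified data point $(x_i, y_i)$, the sign of $w \cdot x_i+b$ is opposite to the label $y_i$, so the absolute value can be removed:

$$\mathrm{distance}=\begin{cases} \frac{1}{||w||}(w \cdot x_i+b), & y_i=-1 \\ -\frac{1}{||w||}(w \cdot x_i+b), & y_i=+1 \end{cases}$$

which simplifies to:

$$\mathrm{distance} = -\frac{1}{||w||}y_i(w \cdot x_i+b)>0$$

because $y_i = -1$ when $w \cdot x_i+b>0$, and $y_i = +1$ when $w \cdot x_i+b<0$.

This gives the distance from a single misclassified point to the hyperplane. Let $M$ be the set of all misclassified points. Then the sum of the distances from all misclassified points to the hyperplane is:

$$-\frac{1}{||w||}\sum_{x_i\in M}y_i(w \cdot x_i+b)$$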

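Note: $w$ and $b$ can be scaled up or down by the same factor without changing the hyperplane they represent. So we drop the factor $\frac{1}{||w||}$ here, which gives the loss function for perceptron learning.

Given a training set

$$T = \{(x_1, y_1), (x_2, y_2), \ldots , (x_N, y_N)\}$$

where:

$x_i\in X \subseteq R^n$

$y_i\in Y = \{-1, +1\}$

$i=1,2,\ldots,N$

the loss function of the perceptron algorithm is

$$L(w,b)=-\sum_{x_i\in M}y_i(w \cdot x_i+b)$$

where $M$ is the set of misclassified points.

So the problem of solving for the perceptron model becomes finding $w, b$ that minimize $L(w,b)$. That is:

$$\min_{w,b}L(w,b)$$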
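The loss can be evaluated directly from its definition. The sketch below is a minimal illustration, not the article's implementation: the function name `perceptron_loss` and the three toy points are assumptions, and a point is counted as misclassified when $y_i(w \cdot x_i+b) \le 0$.

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """L(w, b): sum of -y_i * (w . x_i + b) over the misclassified points."""
    margins = y * (X @ w + b)
    misclassified = margins <= 0          # boolean mask for the set M
    return np.sum(-margins[misclassified])

# Toy data (assumed for illustration): two positive points, one negative point
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

loss_bad = perceptron_loss(np.array([1.0, 1.0]), 0.0, X, y)    # one point is wrong
loss_good = perceptron_loss(np.array([1.0, 1.0]), -3.0, X, y)  # all points are right
print(loss_bad, loss_good)
```

With $b=0$ only the negative point $(1,1)$ is misclassified and contributes $-(-1)(1+1+0)=2$ to the loss. With $b=-3$ no point is misclassified, so $M$ is empty and the loss is 0.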

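Steps of the Perceptron Learning Algorithm

The solution uses gradient descent, so let's first compute the gradient of the loss function $L(w,b)$, treating the misclassified set $M$ as fixed:

$$\nabla_w L(w,b)=-\sum_{x_i\in M}y_i x_i$$

$$\nabla_b L(w,b)=-\sum_{x_i\in M}y_i$$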
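These gradients can be cross-checked numerically. The following sketch compares them with central finite differences on a fixed misclassified set; the points, labels, and parameter values are hypothetical, and the helper names are not from the article.

```python
import numpy as np

def loss_on_fixed_set(w, b, X_m, y_m):
    """Loss restricted to a fixed set M of misclassified points."""
    return -np.sum(y_m * (X_m @ w + b))

def gradients(X_m, y_m):
    """Analytic gradients of the loss with respect to w and b."""
    grad_w = -np.sum(y_m[:, None] * X_m, axis=0)
    grad_b = -np.sum(y_m)
    return grad_w, grad_b

# Hypothetical misclassified points, labels, and current parameters
X_m = np.array([[3.0, 3.0], [1.0, 1.0]])
y_m = np.array([1.0, -1.0])
w, b, eps = np.array([0.5, -0.5]), 0.1, 1e-6

grad_w, grad_b = gradients(X_m, y_m)

# Central finite differences as a numerical cross-check
num_grad_w = np.array([
    (loss_on_fixed_set(w + eps * e, b, X_m, y_m)
     - loss_on_fixed_set(w - eps * e, b, X_m, y_m)) / (2 * eps)
    for e in np.eye(2)
])
num_grad_b = (loss_on_fixed_set(w, b + eps, X_m, y_m)
              - loss_on_fixed_set(w, b - eps, X_m, y_m)) / (2 * eps)
print(grad_w, num_grad_w, grad_b, num_grad_b)
```

Stochastic gradient descent uses the gradient of one misclassified point at a time, namely $-y_i x_i$ for $w$ and $-y_i$ for $b$. Stepping against that gradient with step size $\eta$ gives the updates $w \leftarrow w + \eta y_i x_i$ and $b \leftarrow b + \eta y_i$.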

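The concrete steps of the algorithm:

(1) Choose initial values $w_0,b_0$;

(2) Pick a data point $(x_i, y_i)$ from the training set;

(3) If $y_i(w \cdot x_i + b) \le 0$, update

$$w \leftarrow w + \eta y_i x_i$$

$$b \leftarrow b + \eta y_i$$

where $\eta > 0$ is the learning rate;

(4) Go back to (2), until there are no misclassified points in the training set.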
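The four steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions: the function name `train_perceptron`, the zero initial values, the rule "always pick the first misclassified point", the `max_updates` safety cap, and the three toy points are all choices made for this example.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_updates=1000):
    """Repeat steps (2)-(4): update on a misclassified point until none remain."""
    w = np.zeros(X.shape[1])   # step (1): initial w is all zeros (assumed)
    b = 0.0                    # step (1): initial b is zero (assumed)
    for _ in range(max_updates):
        margins = y * (X @ w + b)
        wrong = np.flatnonzero(margins <= 0)   # indices of misclassified points
        if wrong.size == 0:
            break                              # step (4): nothing left to fix
        i = wrong[0]                           # step (2): pick the first one
        w = w + eta * y[i] * X[i]              # step (3): update w
        b = b + eta * y[i]                     # step (3): update b
    return w, b

# Toy data (assumed): two positive points and one negative point
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w, b = train_perceptron(X, y)
print(w, b)  # [1. 1.] -3.0
```

On this toy set the loop stops after seven updates with the separating line $x_0 + x_1 - 3 = 0$. A different rule for picking the misclassified point may lead to a different, equally valid separating line.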

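At this point we have described the steps of the perceptron algorithm. Next we will do two things: (1) call the perceptron implementation in sklearn, and (2) write the perceptron algorithm by hand and call it.

Calling the Perceptron Model in sklearn

First, import the required libraries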

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

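Load the dataset

iris = load_iris()
# The dataset has 3 classes with 50 samples each; we keep classes 0 and 1.
# There are 4 features; we keep the third and fourth (columns 2 and 3,
# petal length and petal width) to make the analysis easier.
data = iris.data[:,2:4][iris.target < 2]
target = iris.target[iris.target < 2]

Split the data into a training set and a test set: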

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.33, random_state=42)

Let's look at how the training data is distributed:

plt.scatter(X_train[y_train==0][:, 0], X_train[y_train==0][:, 1], c='r')
plt.scatter(X_train[y_train==1][:, 0], X_train[y_train==1][:, 1], c='g')

plt.xlim(-2, 6)
plt.ylim(-2, 6)

The result is shown in the figure:

Next, we import the perceptron model and start training

from sklearn.linear_model import Perceptron
clf = Perceptron()
clf.fit(X_train, y_train)
y_predict = clf.predict(X_test)
y_predict

The result:

array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1])

Compare it with y_test:

y_test

The result:

array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1])

Every prediction is correct. Let's check the accuracy with the score function:

clf.score(X_test, y_test)

The result:

1.0

The accuracy is 100%. This is because the dataset is small and, moreover, completely linearly separable.
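For a classifier, `score` returns the mean accuracy, that is, the fraction of test samples whose prediction matches the true label. The sketch below recomputes it by hand from the two arrays printed above; the variable names are chosen for this example.

```python
import numpy as np

# Predictions and true labels copied from the outputs above
y_predict = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,
                      0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1])
y_test = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1,
                   0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1])

# Accuracy = number of matching entries / number of test samples
accuracy = np.mean(y_predict == y_test)
print(accuracy)  # 1.0
```

All 33 test samples match, which is why the score is 1.0.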

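By now you are probably eager to see the hyperplane that was found (in two dimensions, the dividing line).

Let's look at the learned weight vector $w$:

clf.coef_

The result: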

array([[0.8, 0.8]])

And $b$:

clf.intercept_

The result:

array([-2.])

Now that we have all the parameters, the corresponding line is

$$0.8 x_1 + 0.8 x_2 - 2 = 0$$

that is: $x_1 + x_2 - 2.5 = 0$
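Instead of rearranging the equation by hand, the line can be derived from the parameters in code. The sketch below uses the values reported above; the helper name `boundary_x2` is hypothetical, and it assumes the second weight is nonzero.

```python
import numpy as np

def boundary_x2(w, b, x_1):
    """Solve w[0] * x_1 + w[1] * x_2 + b = 0 for x_2 (requires w[1] != 0)."""
    return -(w[0] * x_1 + b) / w[1]

# Values reported above by clf.coef_ and clf.intercept_
w = np.array([0.8, 0.8])
b = -2.0

x_1 = np.array([0.0, 2.5])
x_2 = boundary_x2(w, b, x_1)
print(x_2)
```

With a fitted model, `clf.coef_[0]` and `clf.intercept_[0]` can be passed in place of the hard-coded values, so the plotted line follows the model if the training result changes.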

Now let's draw this line on the same plot as the data points and see how it looks:

plt.scatter(X_train[y_train==0][:, 0], X_train[y_train==0][:, 1], c='r')
plt.scatter(X_train[y_train==1][:, 0], X_train[y_train==1][:, 1], c='g')

plt.xlim(-2, 6)
plt.ylim(-2, 6)

x_1 = np.arange(-2, 6, 0.01)
x_2 = - x_1 + 2.5
plt.plot(x_1, x_2)

The result:

It looks a little odd, because the line is very close to the red region. But it does split the dataset in two, so the job is done.

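Implementing the Perceptron Algorithm Ourselves

In the previous step we called the perceptron algorithm in sklearn and classified the samples into two classes. Now let's implement the perceptron algorithm ourselves and call it.

Our own perceptron code is as follows: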

# coding: utf-8

import numpy as np
from sklearn.metrics import accuracy_score


class Perceptron:
    """Perceptron classifier; labels are expected to be -1 and +1."""

    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta                    # learning rate
        self.n_iter = n_iter              # maximum number of updates
        self.random_state = random_state  # seed for the initial weights

    def fit(self, X, y):
        """
        Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
          Training vectors, where n_samples is the number of samples and
          n_features is the number of features.
        y : array-like, shape = [n_samples]
          Target values (-1 or +1).

        Returns
        -------
        self : object

        """
        assert X.ndim == 2, "X must be 2 dimensional"
        assert y.ndim == 1, "y must be 1 dimensional"
        assert X.shape[0] == y.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        # w_[0] is the bias b, w_[1:] is the weight vector w
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])

        for _ in range(self.n_iter):
            y_pred = self.predict(X)
            wrong = y_pred != y  # mask of the misclassified set M
            if not wrong.any():
                break  # stop once no point is misclassified

            # Each iteration takes one misclassified point and does a
            # gradient descent step on it.
            x_i = X[wrong][0]

            # For labels in {-1, +1}, (target - prediction) equals 2 * y_i
            # on a misclassified point, so the effective step is 2 * eta.
            update = self.eta * (y[wrong][0] - y_pred[wrong][0])
            self.w_[1:] += update * x_i
            self.w_[0] += update

        return self

    def _compute_value(self, X):
        assert X.ndim == 2, "X must be 2 dimensional"
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        assert X.ndim == 2, "function predict: X must be 2 dimensional"
        return np.where(self._compute_value(X) >= 0.0, 1, -1)

    def score(self, X_test, y_test):
        """Compute the accuracy of the model on X_test and y_test."""
        y_predict = self.predict(X_test)
        return accuracy_score(y_test, y_predict)

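Next, let's call it and see how it does. (To load and split the dataset, reuse the code from the previous step. The only extra step is mapping the labels from 0/1 to -1/+1, because this implementation predicts -1 or +1. The import below assumes the class above is saved as Perceptron.py.)

from Perceptron import Perceptron

# Map the iris labels 0/1 to -1/+1, the label set this class expects
y_train = np.where(y_train == 0, -1, 1)
y_test = np.where(y_test == 0, -1, 1)

clf = Perceptron(n_iter=10000, eta=0.1)
clf.fit(X_train, y_train)
clf.predict(X_test)
clf.score(X_test, y_test), clf.w_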

The result:

(1.0, array([-0.38375655,  0.13388244,  0.19471828]))

The accuracy is 100%. clf.w_[0] corresponds to $b$ in our model.

The rest of the array, clf.w_[1:], is $w$.

Let's plot the points together with the line we computed:

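# Same colors as the earlier plots: first class in red, second class in green
plt.scatter(X_train[y_train==-1][:, 0], X_train[y_train==-1][:, 1], c='r')
plt.scatter(X_train[y_train==1][:, 0], X_train[y_train==1][:, 1], c='g')

plt.xlim(-2, 6)
plt.ylim(-2, 6)

# Decision boundary: w_[0] + w_[1] * x_1 + w_[2] * x_2 = 0, solved for x_2
x_1 = np.arange(-2, 6, 0.01)
x_2 = - clf.w_[1] / clf.w_[2] * x_1 - clf.w_[0] / clf.w_[2]
plt.plot(x_1, x_2)

The result:

As you can see, the dataset has been split successfully.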

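Summary

Our own implementation has an eta parameter, which is the learning rate. A larger eta means larger update steps and usually faster convergence, but if eta is too large the convergence process fluctuates too much, so the parameter has to be tuned by hand.

sklearn has a corresponding parameter as well. If you are curious, try setting it to different values and watch how the convergence changes.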
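In sklearn's `Perceptron` the learning rate is the `eta0` constructor argument. Below is a minimal sketch for trying a few values. The toy data is made up for illustration (it is not the iris subset used above), and `tol=None` is chosen here so that training always runs for `max_iter` epochs.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Tiny linearly separable toy set (illustrative, not the iris data)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.5, 1.5]])
y = np.array([1, 1, -1, -1])

results = {}
for eta0 in (0.01, 0.1, 1.0):
    # tol=None disables the stopping criterion, so all max_iter epochs run
    clf = Perceptron(eta0=eta0, max_iter=1000, tol=None, random_state=0)
    clf.fit(X, y)
    results[eta0] = (clf.coef_[0], clf.intercept_[0], clf.score(X, y))

for eta0, (w, b, acc) in results.items():
    print(eta0, w, b, acc)
```

Comparing the printed weights, intercepts, and accuracies across the three runs shows what changing `eta0` does on data like this.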