Machine Learning Series -- 02 Logistic Regression and the Maximum Entropy Model (pure Python implementation and sklearn)



Logistic Regression

Introduction

First, build some intuition: logistic regression is a classification model. It is also a linear model, with the same parameter space as linear regression: all the information lives in w and b.

That is, logistic regression adds one mapping function f on top of linear regression. This mapping function is usually the sigmoid function, giving:

$$f(x_{11} w_1 + x_{12} w_2 + x_{13} w_3 + \dots + x_{1j} w_j + \dots + x_{1m} w_m + b) = \hat y_1$$

where $f(x) = \frac{1}{1 + e^{-x}}$

[Figure: the sigmoid function curve]
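As a quick sketch of the sigmoid's behavior (the input values below are chosen arbitrarily), it squashes any real number into $(0, 1)$, with $f(0) = 0.5$:

```python
import numpy as np

def sigmoid(x):
    # logistic function: maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                            # 0.5, the decision boundary
print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # values near 0, exactly 0.5, near 1
```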

In one sentence: logistic regression assumes the data follow a Bernoulli distribution, maximizes the likelihood function, and solves for the parameters with gradient descent, in order to perform binary classification. Here:

$$p(y_i = 1 \mid x_i; w, b) = f(x_i w + b) = \frac{1}{1 + e^{-(x_i w + b)}}$$

$$p(y_i = 0 \mid x_i; w, b) = 1 - f(x_i w + b) = \frac{e^{-(x_i w + b)}}{1 + e^{-(x_i w + b)}}$$
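The two class probabilities always sum to 1; a minimal sketch with a made-up sample $x_i$, weights $w$, and bias $b$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# made-up sample with 3 features and arbitrary parameters
x_i = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2

p1 = sigmoid(x_i @ w + b)   # p(y_i = 1 | x_i; w, b)
p0 = 1.0 - p1               # p(y_i = 0 | x_i; w, b)
print(p0 + p1)              # 1.0
```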

Loss function

The likelihood over all samples:

$$L(w,b) = \prod_{i=1}^{n} [f(x_i w + b)]^{y_i} \, [1-f(x_i w + b)]^{(1-y_i)}$$

The log-likelihood:

$$\log L(w,b) = \log \prod_{i=1}^{n} [f(x_i w + b)]^{y_i} \, [1-f(x_i w + b)]^{(1-y_i)}$$

The likelihood is to be maximized, so the loss (cost) function can be defined as its negative log-likelihood:

$$J(w,b) = -\log L(w, b) = -\sum_{i=1}^n \big\{ y_i \log [f(x_i w + b)] + (1-y_i) \log [1-f(x_i w + b)] \big\}$$
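A small sketch of this loss on toy data (the samples and weights below are made up): parameters that agree with the labels give a small $J$, parameters that contradict them a large one:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_loss(X, y, w, b):
    # J(w, b) = -sum_i [ y_i log f(x_i w + b) + (1 - y_i) log(1 - f(x_i w + b)) ]
    y_hat = sigmoid(X @ w + b)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0])
good_w = np.array([1.0, 1.0])    # points the right way for these labels
bad_w = np.array([-1.0, -1.0])   # points the wrong way
print(log_loss(X, y, good_w, 0.0))  # small
print(log_loss(X, y, bad_w, 0.0))   # large
```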

First work out $f'(x)$ in advance:

$$\begin{aligned} f'(x) &= \left(\frac{1}{1 + e^{-x}}\right)' \\ &= -\frac{1}{(1+e^{-x})^2} \, (1+e^{-x})' \\ &= \frac{e^{-x}}{(1+e^{-x})^2} \\ &= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\ &= f(1-f) \end{aligned}$$
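A quick numerical sanity check of the identity $f'(x) = f(1-f)$ (a sketch, not part of the derivation), comparing a central-difference estimate with the closed form:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))                     # f * (1 - f)
print(np.max(np.abs(numeric - analytic)))                    # tiny
```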

Taking the partial derivative with respect to $w_j$:

$$\begin{aligned} \frac {\partial J(w,b)}{\partial w_j} &= \frac {\partial}{\partial w_j} \Big\{-\sum_{i=1}^n y_i \log [f(x_i w + b)] + (1-y_i) \log [1-f(x_i w + b)] \Big\} \\ &= -\sum_{i=1}^n \Big\{y_i \frac {1}{f(x_i w + b)} - (1-y_i) \frac {1}{1-f(x_i w + b)} \Big\} \frac {\partial f(x_i w + b)} {\partial w_j} \\ &= -\sum_{i=1}^n \Big\{y_i \frac {1}{f(x_i w + b)} - (1-y_i) \frac {1}{1-f(x_i w + b)} \Big\} f(x_i w + b) [1-f(x_i w + b)] \frac {\partial (x_i w + b)} {\partial w_j} \\ &= -\sum_{i=1}^n \big\{y_i [1-f(x_i w + b)] - (1-y_i) f(x_i w + b) \big\} \frac {\partial (x_i w + b)} {\partial w_j} \\ &= -\sum_{i=1}^n \big\{y_i [1-f(x_i w + b)] - (1-y_i) f(x_i w + b) \big\} x_{ij} \\ &= -\sum_{i=1}^n \big\{y_i - f(x_i w + b) \big\} x_{ij} \\ &= \sum_{i=1}^n \big\{f(x_i w + b) - y_i \big\} x_{ij} \\ &= \sum_{i=1}^n (\hat y_i - y_i) \, x_{ij} \end{aligned}$$

Similarly, taking the partial derivative with respect to $b$:

$$\frac {\partial J(w,b)}{\partial b} = \sum_{i=1}^n \big( f(x_i w + b) - y_i \big) = \sum_{i=1}^n (\hat y_i - y_i)$$

We can then solve for w and b with gradient descent (here written as a full-batch update over all n samples):

$$w_j = w_j - \alpha \sum_{i=1}^{n} \big( f(x_i w + b) - y_i \big) x_{ij}$$

$$b = b - \alpha \sum_{i=1}^{n} \big( f(x_i w + b) - y_i \big)$$

These two formulas are the heart of the from-scratch implementation.
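The two update rules can be sketched on made-up data; here batch gradient descent learns the AND function from four samples (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# made-up data: logical AND of two binary features
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])
w, b, alpha = np.zeros(2), 0.0, 0.1

for _ in range(1000):
    y_hat = sigmoid(X @ w + b)
    w -= alpha * X.T @ (y_hat - y)   # w_j -= alpha * sum_i (y_hat_i - y_i) x_ij
    b -= alpha * np.sum(y_hat - y)   # b   -= alpha * sum_i (y_hat_i - y_i)

print((sigmoid(X @ w + b) > 0.5).astype(int))  # recovers the labels [0 0 0 1]
```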

Example

Pure Python implementation

For now we use only the first two classes of the iris dataset, so the task is binary classification.

# binary classification only, for now
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def get_train_test():
    iris = load_iris()
    index = list(iris.target).index(2)  # keep only class 0 and class 1
    X = iris.data[:index]
    y = iris.target[:index]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    return X_train, y_train, X_test, y_test


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


class LogisticRegression:
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr        # learning rate alpha
        self.epoch = epoch  # number of gradient-descent passes

    def fit(self, X, y):
        n, m = X.shape
        self.w = np.zeros(m)
        self.b = 0.0
        for _ in range(self.epoch):
            y_hat = sigmoid(X @ self.w + self.b)
            self.w -= self.lr * X.T @ (y_hat - y)  # w_j -= alpha * sum_i (y_hat_i - y_i) x_ij
            self.b -= self.lr * np.sum(y_hat - y)  # b   -= alpha * sum_i (y_hat_i - y_i)

    def predict(self, X):
        return sigmoid(X @ self.w + self.b)  # probability of class 1


lr = LogisticRegression()
X_train, y_train, X_test, y_test = get_train_test()

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

lr.fit(X_train, y_train)
predictions = lr.predict(X_test)

print(y_test == (predictions > 0.5).astype(int))



sklearn

from sklearn.datasets import load_iris
iris = load_iris()

X = iris.data
Y = iris.target
# split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# import the model and call LogisticRegression()
from sklearn.linear_model import LogisticRegression
# lr = LogisticRegression(penalty='l2', solver='newton-cg', multi_class='multinomial')
lr = LogisticRegression()
lr.fit(x_train, y_train)

# evaluate the model
print('logistic regression training accuracy: %.3f' % lr.score(x_train, y_train))
print('logistic regression test accuracy: %.3f' % lr.score(x_test, y_test))
from sklearn import metrics
pred = lr.predict(x_test)
accuracy = metrics.accuracy_score(y_test, pred)
print('logistic regression model accuracy: %.3f' % accuracy)

Maximum Entropy Model

First, some intuition for the maximum entropy model: it is the model that maximizes entropy (which sounds like saying nothing at all). The idea is that, apart from the constraints already given by the data, everything else should be as disordered, or as random, as possible. Expressed mathematically, that randomness is measured by entropy, and the model that keeps it maximal is the maximum entropy model.
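A sketch of that intuition in code (the distributions below are made up): over four outcomes, the uniform distribution has the largest Shannon entropy, $\log 4$, while a nearly deterministic one has far less:

```python
import numpy as np

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i log p_i, with 0 log 0 treated as 0
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally random
skewed = [0.97, 0.01, 0.01, 0.01]    # nearly deterministic
print(entropy(uniform))  # log(4), about 1.386
print(entropy(skewed))   # much smaller
```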

The core formula behind the code:

$$\delta_i = \frac {1}{f^*(x,y)} \log \frac {E_{\hat P}(f_i)}{E_P(f_i)}$$

The parameters $w_i$ can then be obtained by iterating:

$$w_i = w_i + \delta_i$$
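The numerator $E_{\hat P}(f_i)$ is just the relative frequency of each (feature, label) pair in the training data; a minimal sketch on a few made-up pairs:

```python
# made-up (feature, label) pairs extracted from 5 records
samples = [("age_youth", "refuse"), ("age_youth", "refuse"),
           ("age_youth", "agree"), ("age_mid", "agree"), ("age_mid", "agree")]
N = len(samples)

# empirical expectation E_hatP(f_i): count of each pair divided by N
counts = {}
for xy in samples:
    counts[xy] = counts.get(xy, 0) + 1
hat_Ep = {xy: c / N for xy, c in counts.items()}
print(hat_Ep[("age_youth", "refuse")])  # 0.4
```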

Python implementation

import numpy as np

class MaxEntropy(object):
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr            # learning rate
        self.N = None           # number of samples
        self.n = None           # number of (x, y) feature pairs
        self.hat_Ep = None      # empirical expectations E_hatP(f_i)
        self.labels = None
        self.xy_couple = {}     # (feature, label) -> count
        self.xy_id = {}         # (feature, label) -> index
        self.id_xy = {}         # index -> (feature, label)
        self.epoch = epoch

    def _rebuild_X(self, X):
        # prefix each value with its column name, e.g. 'youth' -> 'age_youth'
        X_result = []
        for x in X:
            X_result.append([y_s + '_' + x_s for x_s, y_s in zip(x, self.X_columns)])
        return X_result

    def build_data(self, X, y, X_columns):
        self.X_columns = X_columns
        self.y = y
        self.X = self._rebuild_X(X)
        self.N = len(X)
        self.labels = set(y)
        for x_i, y_i in zip(self.X, y):
            for f in x_i:
                self.xy_couple[(f, y_i)] = self.xy_couple.get((f, y_i), 0) + 1
        self.n = len(self.xy_couple)

    def fit(self, X, y, X_columns):
        self.build_data(X, y, X_columns)
        self.w = [0] * self.n
        for _ in range(self.epoch):
            for i in range(self.n):
                # w_i += delta_i; here the 1/f^*(x,y) factor is replaced by a
                # small learning rate (or 1/self.n would also work)
                self.w[i] += self.lr * np.log(self.get_hat_Ep(i) / self.get_Ep(i))

    def predict(self, X):
        X = self._rebuild_X(X)
        result = [{} for _ in range(len(X))]
        for i, x_i in enumerate(X):
            for y in self.labels:
                result[i][y] = self.get_Pyx(x_i, y)
        return result

    def get_hat_Ep(self, index):
        # empirical expectation: relative frequency of each (feature, label) pair
        self.hat_Ep = [0] * self.n
        for i, xy in enumerate(self.xy_couple):
            self.hat_Ep[i] = self.xy_couple[xy] / self.N
            self.xy_id[xy] = i
            self.id_xy[i] = xy
        return self.hat_Ep[index]

    def get_Zx(self, x_i):
        # normalization factor Z_w(x)
        Zx = 0
        for y in self.labels:
            count = 0
            for f in x_i:
                if (f, y) in self.xy_couple:
                    count += self.w[self.xy_id[(f, y)]]
            Zx += np.exp(count)
        return Zx

    def get_Pyx(self, x_i, y):
        # model probability P_w(y | x)
        count = 0
        for f in x_i:
            if (f, y) in self.xy_couple:
                count += self.w[self.xy_id[(f, y)]]
        return np.exp(count) / self.get_Zx(x_i)

    def get_Ep(self, index):
        # model expectation E_P(f_i) under the empirical distribution of x
        f, y = self.id_xy[index]
        ans = 0
        for x_i in self.X:
            if f not in x_i:
                continue
            ans += self.get_Pyx(x_i, y) / self.N
        return ans
        
data_set = [['youth', 'no', 'no', '1', 'refuse'],
               ['youth', 'no', 'no', '2', 'refuse'],
               ['youth', 'yes', 'no', '2', 'agree'],
               ['youth', 'yes', 'yes', '1', 'agree'],
               ['youth', 'no', 'no', '1', 'refuse'],
               ['mid', 'no', 'no', '1', 'refuse'],
               ['mid', 'no', 'no', '2', 'refuse'],
               ['mid', 'yes', 'yes', '2', 'agree'],
               ['mid', 'no', 'yes', '3', 'agree'],
               ['mid', 'no', 'yes', '3', 'agree'],
               ['elder', 'no', 'yes', '3', 'agree'],
               ['elder', 'no', 'yes', '2', 'agree'],
               ['elder', 'yes', 'no', '2', 'agree'],
               ['elder', 'yes', 'no', '3', 'agree'],
               ['elder', 'no', 'no', '1', 'refuse'],
               ]
columns = ['age', 'working', 'house', 'credit_situation', 'labels']
X_columns = columns[:-1]

X = [i[:-1] for i in data_set]
Y = [i[-1] for i in data_set]

train_X = X[:12]
test_X = X[12:]
train_Y = Y[:12]
test_Y = Y[12:]

mae = MaxEntropy()
mae.fit(train_X, train_Y, X_columns)

mae.predict(test_X)
