Machine Learning Series -- 02 Logistic Regression and the Maximum Entropy Model (pure Python implementation and sklearn)



Logistic Regression

Introduction

First, build some intuition: logistic regression is a classification model. It is also a linear model, with the same parameter space as linear regression: all the information lives in w and b.

That is, logistic regression adds one mapping function f on top of linear regression. This mapping function is usually the sigmoid function, giving:

$$f(x_{11} w_1 + x_{12} w_2 + x_{13} w_3 + \dots + x_{1j} w_j + \dots + x_{1m} w_m + b) = \hat y_1$$

where $f(x) = \frac{1}{1 + e^{-x}}$

[Figure: the sigmoid function curve]
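As a quick sketch of the sigmoid's behavior (the input values below are chosen arbitrarily), it squashes any real number into $(0, 1)$, with $f(0) = 0.5$:

```python
import numpy as np

def sigmoid(x):
    # logistic function: maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                            # 0.5, the decision boundary
print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # values near 0, exactly 0.5, near 1
```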

In one sentence: logistic regression assumes the data follow a Bernoulli distribution, maximizes the likelihood function, and solves for the parameters with gradient descent, in order to perform binary classification. Here:

$$p(y_i = 1 \mid x_i; w, b) = f(x_i w + b) = \frac{1}{1 + e^{-(x_i w + b)}}$$

$$p(y_i = 0 \mid x_i; w, b) = 1 - f(x_i w + b) = \frac{e^{-(x_i w + b)}}{1 + e^{-(x_i w + b)}}$$
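The two class probabilities always sum to 1; a minimal sketch with a made-up sample $x_i$, weights $w$, and bias $b$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# made-up sample with 3 features and arbitrary parameters
x_i = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2

p1 = sigmoid(x_i @ w + b)   # p(y_i = 1 | x_i; w, b)
p0 = 1.0 - p1               # p(y_i = 0 | x_i; w, b)
print(p0 + p1)              # 1.0
```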

Loss function

The likelihood over all samples:

$$L(w,b) = \prod_{i=1}^{n} [f(x_i w + b)]^{y_i} \, [1-f(x_i w + b)]^{(1-y_i)}$$

The log-likelihood:

$$\log L(w,b) = \log \prod_{i=1}^{n} [f(x_i w + b)]^{y_i} \, [1-f(x_i w + b)]^{(1-y_i)}$$

The likelihood is to be maximized, so the loss (cost) function can be defined as its negative log-likelihood:

$$J(w,b) = -\log L(w, b) = -\sum_{i=1}^n \big\{ y_i \log [f(x_i w + b)] + (1-y_i) \log [1-f(x_i w + b)] \big\}$$
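A small sketch of this loss on toy data (the samples and weights below are made up): parameters that agree with the labels give a small $J$, parameters that contradict them a large one:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_loss(X, y, w, b):
    # J(w, b) = -sum_i [ y_i log f(x_i w + b) + (1 - y_i) log(1 - f(x_i w + b)) ]
    y_hat = sigmoid(X @ w + b)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0])
good_w = np.array([1.0, 1.0])    # points the right way for these labels
bad_w = np.array([-1.0, -1.0])   # points the wrong way
print(log_loss(X, y, good_w, 0.0))  # small
print(log_loss(X, y, bad_w, 0.0))   # large
```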

First work out $f'(x)$ in advance:

$$\begin{aligned} f'(x) &= \left(\frac{1}{1 + e^{-x}}\right)' \\ &= -\frac{1}{(1+e^{-x})^2} \, (1+e^{-x})' \\ &= \frac{e^{-x}}{(1+e^{-x})^2} \\ &= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\ &= f(1-f) \end{aligned}$$
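A quick numerical sanity check of the identity $f'(x) = f(1-f)$ (a sketch, not part of the derivation), comparing a central-difference estimate with the closed form:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))                     # f * (1 - f)
print(np.max(np.abs(numeric - analytic)))                    # tiny
```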

Taking the partial derivative with respect to $w_j$:

$$\begin{aligned} \frac {\partial J(w,b)}{\partial w_j} &= \frac {\partial}{\partial w_j} \Big\{-\sum_{i=1}^n y_i \log [f(x_i w + b)] + (1-y_i) \log [1-f(x_i w + b)] \Big\} \\ &= -\sum_{i=1}^n \Big\{y_i \frac {1}{f(x_i w + b)} - (1-y_i) \frac {1}{1-f(x_i w + b)} \Big\} \frac {\partial f(x_i w + b)} {\partial w_j} \\ &= -\sum_{i=1}^n \Big\{y_i \frac {1}{f(x_i w + b)} - (1-y_i) \frac {1}{1-f(x_i w + b)} \Big\} f(x_i w + b) [1-f(x_i w + b)] \frac {\partial (x_i w + b)} {\partial w_j} \\ &= -\sum_{i=1}^n \big\{y_i [1-f(x_i w + b)] - (1-y_i) f(x_i w + b) \big\} \frac {\partial (x_i w + b)} {\partial w_j} \\ &= -\sum_{i=1}^n \big\{y_i [1-f(x_i w + b)] - (1-y_i) f(x_i w + b) \big\} x_{ij} \\ &= -\sum_{i=1}^n \big\{y_i - f(x_i w + b) \big\} x_{ij} \\ &= \sum_{i=1}^n \big\{f(x_i w + b) - y_i \big\} x_{ij} \\ &= \sum_{i=1}^n (\hat y_i - y_i) \, x_{ij} \end{aligned}$$

Similarly, taking the partial derivative with respect to $b$:

$$\frac {\partial J(w,b)}{\partial b} = \sum_{i=1}^n \big( f(x_i w + b) - y_i \big) = \sum_{i=1}^n (\hat y_i - y_i)$$

We can then solve for w and b with gradient descent (here written as a full-batch update over all n samples):

$$w_j = w_j - \alpha \sum_{i=1}^{n} \big( f(x_i w + b) - y_i \big) x_{ij}$$

$$b = b - \alpha \sum_{i=1}^{n} \big( f(x_i w + b) - y_i \big)$$

These two formulas are the heart of the from-scratch implementation.
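The two update rules can be sketched on made-up data; here batch gradient descent learns the AND function from four samples (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# made-up data: logical AND of two binary features
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])
w, b, alpha = np.zeros(2), 0.0, 0.1

for _ in range(1000):
    y_hat = sigmoid(X @ w + b)
    w -= alpha * X.T @ (y_hat - y)   # w_j -= alpha * sum_i (y_hat_i - y_i) x_ij
    b -= alpha * np.sum(y_hat - y)   # b   -= alpha * sum_i (y_hat_i - y_i)

print((sigmoid(X @ w + b) > 0.5).astype(int))  # recovers the labels [0 0 0 1]
```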

Example

Pure Python implementation

For now we use only the first two classes of the iris dataset, so the task is binary classification.

# binary classification only, for now
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def get_train_test():
    iris = load_iris()
    index = list(iris.target).index(2)  # keep only class 0 and class 1
    X = iris.data[:index]
    y = iris.target[:index]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    return X_train, y_train, X_test, y_test


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


class LogisticRegression:
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr        # learning rate alpha
        self.epoch = epoch  # number of gradient-descent passes

    def fit(self, X, y):
        n, m = X.shape
        self.w = np.zeros(m)
        self.b = 0.0
        for _ in range(self.epoch):
            y_hat = sigmoid(X @ self.w + self.b)
            self.w -= self.lr * X.T @ (y_hat - y)  # w_j -= alpha * sum_i (y_hat_i - y_i) x_ij
            self.b -= self.lr * np.sum(y_hat - y)  # b   -= alpha * sum_i (y_hat_i - y_i)

    def predict(self, X):
        return sigmoid(X @ self.w + self.b)  # probability of class 1


lr = LogisticRegression()
X_train, y_train, X_test, y_test = get_train_test()

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

lr.fit(X_train, y_train)
predictions = lr.predict(X_test)

print(y_test == (predictions > 0.5).astype(int))



sklearn

from sklearn.datasets import load_iris
iris = load_iris()

X = iris.data
Y = iris.target
# split the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# import the model and call LogisticRegression()
from sklearn.linear_model import LogisticRegression
# lr = LogisticRegression(penalty='l2', solver='newton-cg', multi_class='multinomial')
lr = LogisticRegression()
lr.fit(x_train, y_train)

# evaluate the model
print('logistic regression training accuracy: %.3f' % lr.score(x_train, y_train))
print('logistic regression test accuracy: %.3f' % lr.score(x_test, y_test))
from sklearn import metrics
pred = lr.predict(x_test)
accuracy = metrics.accuracy_score(y_test, pred)
print('logistic regression model accuracy: %.3f' % accuracy)

Maximum Entropy Model

First, some intuition for the maximum entropy model: it is the model that maximizes entropy (which sounds like saying nothing at all). The idea is that, apart from the constraints already given by the data, everything else should be as disordered, or as random, as possible. Expressed mathematically, that randomness is measured by entropy, and the model that keeps it maximal is the maximum entropy model.
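A sketch of that intuition in code (the distributions below are made up): over four outcomes, the uniform distribution has the largest Shannon entropy, $\log 4$, while a nearly deterministic one has far less:

```python
import numpy as np

def entropy(p):
    # Shannon entropy H(p) = -sum_i p_i log p_i, with 0 log 0 treated as 0
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally random
skewed = [0.97, 0.01, 0.01, 0.01]    # nearly deterministic
print(entropy(uniform))  # log(4), about 1.386
print(entropy(skewed))   # much smaller
```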

The core formula behind the code:

$$\delta_i = \frac {1}{f^*(x,y)} \log \frac {E_{\hat P}(f_i)}{E_P(f_i)}$$

The parameters $w_i$ can then be obtained by iterating:

$$w_i = w_i + \delta_i$$
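The numerator $E_{\hat P}(f_i)$ is just the relative frequency of each (feature, label) pair in the training data; a minimal sketch on a few made-up pairs:

```python
# made-up (feature, label) pairs extracted from 5 records
samples = [("age_youth", "refuse"), ("age_youth", "refuse"),
           ("age_youth", "agree"), ("age_mid", "agree"), ("age_mid", "agree")]
N = len(samples)

# empirical expectation E_hatP(f_i): count of each pair divided by N
counts = {}
for xy in samples:
    counts[xy] = counts.get(xy, 0) + 1
hat_Ep = {xy: c / N for xy, c in counts.items()}
print(hat_Ep[("age_youth", "refuse")])  # 0.4
```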

Python implementation

import numpy as np

class MaxEntropy(object):
    def __init__(self, lr=0.01, epoch=1000):
        self.lr = lr            # learning rate
        self.N = None           # number of samples
        self.n = None           # number of (x, y) feature pairs
        self.hat_Ep = None      # empirical expectations E_hatP(f_i)
        self.labels = None
        self.xy_couple = {}     # (feature, label) -> count
        self.xy_id = {}         # (feature, label) -> index
        self.id_xy = {}         # index -> (feature, label)
        self.epoch = epoch

    def _rebuild_X(self, X):
        # prefix each value with its column name, e.g. 'youth' -> 'age_youth'
        X_result = []
        for x in X:
            X_result.append([y_s + '_' + x_s for x_s, y_s in zip(x, self.X_columns)])
        return X_result

    def build_data(self, X, y, X_columns):
        self.X_columns = X_columns
        self.y = y
        self.X = self._rebuild_X(X)
        self.N = len(X)
        self.labels = set(y)
        for x_i, y_i in zip(self.X, y):
            for f in x_i:
                self.xy_couple[(f, y_i)] = self.xy_couple.get((f, y_i), 0) + 1
        self.n = len(self.xy_couple)

    def fit(self, X, y, X_columns):
        self.build_data(X, y, X_columns)
        self.w = [0] * self.n
        for _ in range(self.epoch):
            for i in range(self.n):
                # w_i += delta_i; here the 1/f^*(x,y) factor is replaced by a
                # small learning rate (or 1/self.n would also work)
                self.w[i] += self.lr * np.log(self.get_hat_Ep(i) / self.get_Ep(i))

    def predict(self, X):
        X = self._rebuild_X(X)
        result = [{} for _ in range(len(X))]
        for i, x_i in enumerate(X):
            for y in self.labels:
                result[i][y] = self.get_Pyx(x_i, y)
        return result

    def get_hat_Ep(self, index):
        # empirical expectation: relative frequency of each (feature, label) pair
        self.hat_Ep = [0] * self.n
        for i, xy in enumerate(self.xy_couple):
            self.hat_Ep[i] = self.xy_couple[xy] / self.N
            self.xy_id[xy] = i
            self.id_xy[i] = xy
        return self.hat_Ep[index]

    def get_Zx(self, x_i):
        # normalization factor Z_w(x)
        Zx = 0
        for y in self.labels:
            count = 0
            for f in x_i:
                if (f, y) in self.xy_couple:
                    count += self.w[self.xy_id[(f, y)]]
            Zx += np.exp(count)
        return Zx

    def get_Pyx(self, x_i, y):
        # model probability P_w(y | x)
        count = 0
        for f in x_i:
            if (f, y) in self.xy_couple:
                count += self.w[self.xy_id[(f, y)]]
        return np.exp(count) / self.get_Zx(x_i)

    def get_Ep(self, index):
        # model expectation E_P(f_i) under the empirical distribution of x
        f, y = self.id_xy[index]
        ans = 0
        for x_i in self.X:
            if f not in x_i:
                continue
            ans += self.get_Pyx(x_i, y) / self.N
        return ans
        
data_set = [['youth', 'no', 'no', '1', 'refuse'],
               ['youth', 'no', 'no', '2', 'refuse'],
               ['youth', 'yes', 'no', '2', 'agree'],
               ['youth', 'yes', 'yes', '1', 'agree'],
               ['youth', 'no', 'no', '1', 'refuse'],
               ['mid', 'no', 'no', '1', 'refuse'],
               ['mid', 'no', 'no', '2', 'refuse'],
               ['mid', 'yes', 'yes', '2', 'agree'],
               ['mid', 'no', 'yes', '3', 'agree'],
               ['mid', 'no', 'yes', '3', 'agree'],
               ['elder', 'no', 'yes', '3', 'agree'],
               ['elder', 'no', 'yes', '2', 'agree'],
               ['elder', 'yes', 'no', '2', 'agree'],
               ['elder', 'yes', 'no', '3', 'agree'],
               ['elder', 'no', 'no', '1', 'refuse'],
               ]
columns = ['age', 'working', 'house', 'credit_situation', 'labels']
X_columns = columns[:-1]

X = [i[:-1] for i in data_set]
Y = [i[-1] for i in data_set]

train_X = X[:12]
test_X = X[12:]
train_Y = Y[:12]
test_Y = Y[12:]

mae = MaxEntropy()
mae.fit(train_X, train_Y, X_columns)

mae.predict(test_X)
