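1. Linear Regression

The dependent variable is continuous, while the independent variables may be continuous or discrete. The regression relationship is linear in nature. Linear regression uses a best-fit straight line (also called the regression line) to establish the relationship between the dependent variable (Y) and one or more independent variables (X).

The difference between simple and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has exactly one.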

(1). Simple linear regression

Example: predict a house's price from its floor area alone. The regression equation is as follows:

hθ(x) = θ0 + θ1·x

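Determining θ0 and θ1 comes down to how we measure the difference between hθ(xi) and yi. We use the mean squared error to express this difference, and the optimization goal is to find the values of θ0 and θ1 that minimize it. In this example, the mean squared error is also called the cost function.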
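Written out for m training examples, the cost function described above can be sketched as follows. The hypothesis form is inferred from the symbols θ0 and θ1, and the factor 1/2 is a common convention that simplifies the derivative rather than something these notes state; dropping it does not change the minimizing θ0, θ1.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x_i) - y_i \bigr)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```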

Gradient descent can be used to find the θ values that minimize the cost function.
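As a rough illustration of that idea, here is a minimal sketch of batch gradient descent for the single-feature case. The function name `linear_gradient_descent`, the toy data, the learning rate, and the 1/(2m) scaling of the cost are all assumptions made for this example, not part of the original notes.

```python
import numpy as np

def linear_gradient_descent(x, y, alpha=0.5, num_iters=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent on the MSE cost."""
    m = x.shape[0]
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        error = theta0 + theta1 * x - y      # h(x_i) - y_i for every example
        # Partial derivatives of J = 1/(2m) * sum(error^2)
        grad0 = error.sum() / m
        grad1 = (error * x).sum() / m
        # Update both parameters simultaneously
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    cost = ((theta0 + theta1 * x - y) ** 2).sum() / (2 * m)
    return theta0, theta1, cost

# Hypothetical toy data: area in units of 100 m^2, price generated as 50 + 200 * area
area = np.array([0.5, 0.8, 1.0, 1.2, 1.5])
price = 50.0 + 200.0 * area

theta0, theta1, cost = linear_gradient_descent(area, price)
print(f"theta0={theta0:.4f}, theta1={theta1:.4f}, cost={cost:.2e}")
```

The area is expressed in units of 100 m² so that a fairly large learning rate converges; with raw square-meter values the same learning rate would diverge, which is one reason feature scaling matters.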

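(2). Multiple linear regression

Polynomial regression additionally requires feature scaling, because powers of the same feature (x, x², x³, ...) can differ in range by orders of magnitude.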
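To make the scaling issue concrete, the sketch below builds polynomial features from a hypothetical area column and standardizes each column to zero mean and unit variance. The helper names `polynomial_features` and `standardize` are made up for illustration, and standardization is only one of several possible scaling schemes.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack x, x^2, ..., x^degree as columns (no bias column)."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def standardize(features):
    """Scale each column to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / std, mean, std

# Hypothetical house areas in square meters
area = np.array([50.0, 80.0, 100.0, 120.0, 150.0])

raw = polynomial_features(area, degree=3)
scaled, mean, std = standardize(raw)

print(raw.max(axis=0))     # column maxima differ by orders of magnitude
print(scaled.max(axis=0))  # after scaling, columns are on a comparable scale
```

The returned `mean` and `std` would need to be reused to scale any new input at prediction time.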

2. Logistic Regression

Derivation:
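The derivation itself is not reproduced in these notes. The following is a sketch that is consistent with the code in this section (same h, J, θ, α, and m); the superscript (i) for the i-th training example and the symbol g for the sigmoid are notation assumed here.

```latex
\begin{aligned}
h &= g(X\theta), \qquad g(z) = \frac{1}{1 + e^{-z}}, \qquad g'(z) = g(z)\bigl(1 - g(z)\bigr) \\
J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h^{(i)} + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h^{(i)}\bigr) \Bigr] \\
\frac{\partial J}{\partial \theta_j} &= \frac{1}{m} \sum_{i=1}^{m} \bigl( h^{(i)} - y^{(i)} \bigr) x_j^{(i)}
\quad\Longrightarrow\quad \nabla_\theta J = \frac{1}{m} X^{\top} (h - y) \\
\theta &:= \theta - \frac{\alpha}{m} X^{\top} (h - y)
\end{aligned}
```

The third line uses the identity for g'(z): differentiating the bracketed term with respect to θ_j gives (y − h)·x_j, and the leading minus sign turns it into (h − y)·x_j.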

Code implementation:

import numpy as np

def sigmoid(z): 
    '''
    Input:
        z: is the input (can be a scalar or an array)
    Output:
        h: the sigmoid of z
    '''
    # calculate the sigmoid of z
    h = 1 / (1 + np.exp(-z))   
    return h

def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    # get 'm', the number of rows in matrix x
    m = x.shape[0]     
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = np.dot(x,theta)
        
        # get h, the sigmoid of z
        h = sigmoid(z)
        
        # calculate the cost function
        J = -1./m * (np.dot(y.transpose(), np.log(h)) + np.dot((1-y).transpose(), np.log(1-h)))
        # update the weights theta
        theta = theta - (alpha/m) * np.dot(x.transpose(),(h-y))

    # J is a (1,1) array; squeeze it to a scalar before converting to a Python float
    J = float(np.squeeze(J))
    return J, theta

# X input is 10 x 3 with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y Labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

# Apply gradient descent
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(float(t), 8) for t in np.squeeze(tmp_theta)]}")

3. Multiclass classification

4. Regularization

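5. Feature Engineering

(1). Discretizing continuous features

In industry, continuous values are rarely fed directly into a logistic regression model as features. Instead, continuous features are discretized into a series of 0/1 features before being passed to the model. This has the following advantages (www.jianshu.com/p/7445a7b94…):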

a). Discrete features are easy to add and remove, which makes rapid model iteration easier.

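b). Discretized features are highly robust to abnormal data. In an LR model, a feature A corresponds to a single weight w. After discretization, A expands into features A-1, A-2, A-3, ..., each with its own weight. If feature A-4 never appears in the training samples, the trained model has no weight for A-4, so when A-4 appears in a test sample it has no effect and is effectively ignored. With a continuous feature, by contrast, the LR model computes y = w * a, where a is the feature and w is its weight. For example, if a represents age, its range is [0..100]. If a test sample has a = 300, that is clearly an outlier, yet w * a still produces a value, and a very large one, so the outlier has a very large impact on the final result.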

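c). Discretized features can be crossed. Suppose feature A is discretized into M values and feature B into N values; crossing them yields M * N variables, which introduces further non-linearity and improves the model's expressive power.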
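Points b) and c) can be illustrated with a small sketch. The bucket boundaries, the weights, and the helper names `one_hot_bucket` and `cross` are all hypothetical, chosen only to show the mechanics. Here the outlier age 300 falls into the top bucket, so its contribution is bounded by that bucket's weight instead of growing with the raw value.

```python
import numpy as np

def one_hot_bucket(value, edges):
    """One-hot encode `value` over the len(edges) + 1 buckets defined by `edges`."""
    one_hot = np.zeros(len(edges) + 1)
    one_hot[np.digitize(value, edges)] = 1.0
    return one_hot

def cross(one_hot_a, one_hot_b):
    """Cross two one-hot vectors of sizes M and N into one of size M * N."""
    return np.outer(one_hot_a, one_hot_b).ravel()

age_edges = np.array([18, 30, 45, 60])    # hypothetical: 5 age buckets
income_edges = np.array([3000, 8000])     # hypothetical: 3 income buckets

# Point b): an outlier activates only one bucket, so its contribution is bounded.
w_continuous = 0.05                                # hypothetical weight on raw age
w_buckets = np.array([-0.5, 0.2, 0.6, 0.9, 1.1])   # hypothetical per-bucket weights
for age in (35, 300):
    raw_term = w_continuous * age
    bucket_term = w_buckets @ one_hot_bucket(age, age_edges)
    print(f"age={age}: continuous term={raw_term:.2f}, bucketed term={bucket_term:.2f}")

# Point c): crossing M = 5 age buckets with N = 3 income buckets gives 15 features.
crossed = cross(one_hot_bucket(35, age_edges), one_hot_bucket(5000, income_edges))
print(crossed.shape, int(crossed.argmax()))
```

Each crossed feature gets its own weight in a linear model, which is how the cross lets the model express interactions such as "age 30 to 45 and mid income" that neither feature captures alone.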

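Whether a model uses discrete or continuous features is really a trade-off between "massive discrete features + simple model" and "a small number of continuous features + complex model". One can either discretize and use a linear model, or keep continuous features and use deep learning.

(2). Choosing a feature engineering approach

Reference: www.zhihu.com/question/29…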

[Machine Learning Reading Notes] - Regression Models