Multivariate Linear Regression

This is day 13 of my participation in the August Writing Challenge. Event details: August Writing Challenge

Introduction to multivariate linear regression

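Recall univariate linear regression from before. In mathematical terms, univariate linear regression fits a function of one variable, so multivariate linear regression fits a function of several variables. Our earlier example used house sizes and their prices: the fitted line predicts a house's price from its size, $h_{\theta}(x)=\theta_0+\theta_1x$, which is mathematically a standard linear function of one variable. With multivariate linear regression, we can account for the fact that many factors affect a house's price: besides its size, there is the number of floors, its location, and so on.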

Here is an example, again predicting house prices:

\begin{array}{c|c|c|c|c} \text { Size (feet²) } & \begin{array}{c} \text { Number of } \\ \text { bedrooms } \end{array} & \begin{array}{c} \text { Number of } \\ \text { floors } \end{array} & \begin{array}{c} \text { Age of home } \\ \text { (years) } \end{array} & \text { Price (\$1000) } \\ \hline 2104 & 5 & 1 & 45 & 460 \\ 1416 & 3 & 2 & 40 & 232 \\ 1534 & 3 & 2 & 30 & 315 \\ 852 & 2 & 1 & 36 & 178 \\ \ldots & \ldots & \ldots & \ldots & \ldots \end{array}

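Notation:

n = number of features
- In this example n = 4: there are four features (size, number of bedrooms, number of floors, age of home).
- Don't confuse m with n: m is the number of training examples, i.e., the number of rows in the data.
$x^{(i)}$ = input (features) of the $i^{\text{th}}$ training example
- The i-th training example; informally, the i-th row of the data.
- For example, $x^{(2)}=\left[\begin{array}{c}1416 \\3 \\2 \\40\end{array}\right]$, which we treat as a four-dimensional vector.
$x_{j}^{(i)}$ = value of feature j in the $i^{\text{th}}$ training example
- The j-th feature of the i-th training example.
- For example, $x^{(2)}_3=2$.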
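To make the notation concrete, here is a minimal Python sketch that stores only the four rows shown in the table (the rows elided with … are left out, so here m happens to equal n = 4; a real training set would usually have many more rows). The names `X`, `y`, `x_vec` and `x_feat` are hypothetical helpers introduced for illustration; they translate the notes' 1-based indices $i$ and $j$ into Python's 0-based list indices.

```python
# The four training examples shown in the table above.
# Columns: size (feet^2), bedrooms, floors, age of home (years).
X = [
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852, 2, 1, 36],
]
# Prices, in units of $1000.
y = [460, 232, 315, 178]

m = len(X)     # number of training examples (rows)
n = len(X[0])  # number of features (columns)


def x_vec(i):
    """Return x^{(i)}, the i-th training example (i is 1-based, as in the notes)."""
    return X[i - 1]


def x_feat(i, j):
    """Return x_j^{(i)}, feature j of training example i (both 1-based)."""
    return X[i - 1][j - 1]


print(m, n)          # 4 4
print(x_vec(2))      # [1416, 3, 2, 40]
print(x_feat(2, 3))  # 2
```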

The hypothesis function

So our hypothesis now becomes a function of several variables as well.

Hypothesis:

h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\theta_{3} x_{3}+\theta_{4} x_{4}

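For convenience of notation, we always define $x_0=1$ (that is, $x_0^{(i)}=1$ for every training example).

With this convention, the hypothesis can be written compactly as $h_{\theta}(x)=\theta_{0} x_{0}+\theta_{1} x_{1}+\cdots+\theta_{n} x_{n}=\theta^Tx$, where both the features and the parameters are written as vectors: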

x=\left[\begin{array}{l} x_{0} \\ x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{array}\right] \in \mathbb{R}^{n+1} \quad \theta=\left[\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \theta_{2} \\ \vdots \\ \theta_{n} \end{array}\right] \in \mathbb{R}^{n+1}
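As a sketch of the vector form, assuming plain Python lists for $\theta$ and $x$: `add_bias` prepends $x_0=1$ and `hypothesis` computes the dot product $\theta^Tx$. Both function names and the parameter values are made up for illustration; the parameters are not fitted to the data.

```python
def add_bias(x):
    """Prepend x_0 = 1 so that x has n + 1 entries."""
    return [1.0] + list(x)


def hypothesis(theta, x):
    """h_theta(x) = theta^T x, where x already includes x_0 = 1."""
    if len(theta) != len(x):
        raise ValueError("theta and x must both have n + 1 entries")
    return sum(t * xj for t, xj in zip(theta, x))


# Hypothetical parameters, chosen only to illustrate the arithmetic.
theta = [80.0, 0.1, 0.01, 3.0, -2.0]
x2 = add_bias([1416, 3, 2, 40])  # x^{(2)} with x_0 = 1 prepended
print(hypothesis(theta, x2))     # approximately 147.63
```

The explicit length check mirrors the requirement that $x$ and $\theta$ both live in $\mathbb{R}^{n+1}$; forgetting to prepend $x_0$ is a common source of off-by-one bugs.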

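Gradient descent for multiple variables

Since we already covered univariate linear regression, I won't go through everything step by step again; here are the formulas.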

Hypothesis:

h_{\theta}(x)=\theta^{T} x=\theta_{0} x_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\cdots+\theta_{n} x_{n}

Parameters: an (n+1)-dimensional vector

\theta_{0}, \theta_{1}, \ldots, \theta_{n}

Cost function:

\qquad J\left(\theta_{0}, \theta_{1}, \ldots, \theta_{n}\right)=\frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}
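The cost function translates almost symbol for symbol into code. A minimal sketch, assuming each row of `X` already begins with $x_0=1$; the tiny data set is invented so that the numbers are easy to check by hand, and the function names are hypothetical.

```python
def hypothesis(theta, x):
    """h_theta(x) = theta^T x (x includes x_0 = 1)."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^{(i)}) - y^{(i)})^2.

    Each row of X must already start with x_0 = 1.
    """
    m = len(X)
    return sum((hypothesis(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


# Made-up data (not from the table): y = 1 + 2*x1 exactly.
X = [[1, 0], [1, 1], [1, 2]]
y = [1, 3, 5]
print(cost([1, 2], X, y))  # 0.0, a perfect fit
print(cost([0, 0], X, y))  # (1 + 9 + 25) / 6
```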

Gradient descent:

Repeat {

\theta_{j}:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J\left(\theta_{0}, \ldots, \theta_{n}\right)

} (simultaneously update $\theta_j$ for every $j=0, \ldots, n$)

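Substituting the partial derivative $\frac{\partial}{\partial \theta_{j}} J(\theta)=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$, this becomes: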

Repeat {

\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}

} (simultaneously update $\theta_j$ for every $j=0, \ldots, n$)
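Putting the pieces together, here is a minimal sketch of (batch) gradient descent using the update rule above. Assumptions: rows of `X` already start with $x_0=1$, $\theta$ starts at zero, and the learning rate `alpha` and iteration count are illustrative choices for this made-up, noiseless data set, not recommended defaults. The key detail is that every new $\theta_j$ is computed from the old $\theta$ before any of them is replaced, which is what "simultaneously update" means.

```python
def hypothesis(theta, x):
    """h_theta(x) = theta^T x (x includes x_0 = 1)."""
    return sum(t * xj for t, xj in zip(theta, x))


def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for multivariate linear regression.

    X: list of rows, each already starting with x_0 = 1.
    Returns theta after `iterations` simultaneous updates.
    """
    m, n_plus_1 = len(X), len(X[0])
    theta = [0.0] * n_plus_1
    for _ in range(iterations):
        # Errors h_theta(x^{(i)}) - y^{(i)}, all computed with the OLD theta.
        errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
        # Build the whole new theta before replacing the old one.
        theta = [
            theta[j] - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j in range(n_plus_1)
        ]
    return theta


# Made-up noiseless data: y = 1 + 2*x1 + 3*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1 + 2 * r[1] + 3 * r[2] for r in X]
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
print([round(t, 3) for t in theta])  # [1.0, 2.0, 3.0]
```

With unscaled features like the house sizes in the table (values in the thousands), `alpha=0.1` would make this iteration diverge, so the learning rate has to be chosen with the scale of the data in mind.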

Written out for the first few parameters, the update is:

\begin{array}{l} \theta_{0}:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{0}^{(i)} \\ \theta_{1}:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{1}^{(i)} \\ \theta_{2}:=\theta_{2}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{2}^{(i)} \\ \vdots \end{array}
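The expansion can also be written out literally, one line per parameter, for a case with two features ($n=2$). In this hypothetical sketch, each `temp` value is computed from the old $\theta_0,\theta_1,\theta_2$ and only then returned, mirroring the simultaneous update; the data set is invented and satisfies $y=1+2x_1+3x_2$ exactly.

```python
def h(theta0, theta1, theta2, x1, x2):
    """h_theta(x) = theta0*x0 + theta1*x1 + theta2*x2, with x0 = 1."""
    x0 = 1
    return theta0 * x0 + theta1 * x1 + theta2 * x2


def one_step(theta0, theta1, theta2, data, alpha):
    """One gradient-descent step, written out term by term like the expansion.

    data is a list of (x1, x2, y) tuples. All three sums use the OLD thetas.
    """
    m = len(data)
    err = [h(theta0, theta1, theta2, x1, x2) - y for x1, x2, y in data]
    temp0 = theta0 - alpha / m * sum(e * 1 for e in err)  # x0^{(i)} = 1
    temp1 = theta1 - alpha / m * sum(e * x1 for e, (x1, _, _) in zip(err, data))
    temp2 = theta2 - alpha / m * sum(e * x2 for e, (_, x2, _) in zip(err, data))
    return temp0, temp1, temp2


# Made-up data satisfying y = 1 + 2*x1 + 3*x2 exactly.
data = [(1, 0, 3), (0, 1, 4), (1, 1, 6)]

print(one_step(0.0, 0.0, 0.0, data, alpha=0.5))  # first step from theta = 0

thetas = (0.0, 0.0, 0.0)
for _ in range(2000):
    thetas = one_step(*thetas, data, alpha=0.5)
print(thetas)  # approaches (1, 2, 3)
```

Assigning `theta0 = …` before computing `theta1` would feed the new $\theta_0$ into the remaining sums, which is exactly the non-simultaneous update the formulas rule out.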

Remember the partial derivatives we derived for univariate linear regression? They looked like this:

\begin{aligned} \theta_{0}: \frac{\partial}{\partial \theta_{0}} J\left(\theta_{0}, \theta_{1}\right) &=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \\ \theta_{1}: \frac{\partial}{\partial \theta_{1}} J\left(\theta_{0}, \theta_{1}\right) &=\frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) \cdot x^{(i)} \end{aligned}

Source: Gradient Descent (my earlier post)

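Comparing the two, the gradient descent update rules for multivariate linear regression have exactly the same form as those for univariate linear regression.

As stated at the start, we define $x_0=1$, so the factor $x_0^{(i)}$ in the $\theta_0$ update is just 1 and that update is identical to the univariate one.
For the remaining parameters $\theta_j$, the only difference is that, because there are now several variables, each $x$ carries an extra subscript $j$; with a single feature, $x_1^{(i)}$ is simply the $x^{(i)}$ of the univariate formula.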
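This comparison can be checked numerically: with one feature, setting $x_0^{(i)}=1$ and $x_1^{(i)}=x^{(i)}$ makes the general gradient match the two univariate partial derivatives. This sketch uses the sizes and prices from the table with arbitrary parameter values; the function names are hypothetical.

```python
def grad_multivariate(theta, X, y):
    """dJ/dtheta_j = 1/m * sum_i (h(x^{(i)}) - y^{(i)}) * x_j^{(i)}; rows of X start with x_0 = 1."""
    m = len(X)
    errors = [sum(t * xj for t, xj in zip(theta, xi)) - yi for xi, yi in zip(X, y)]
    return [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(len(theta))]


def grad_univariate(theta0, theta1, xs, y):
    """The two univariate partial derivatives, exactly as in the formulas above."""
    m = len(xs)
    errors = [theta0 + theta1 * x - yi for x, yi in zip(xs, y)]
    d0 = sum(errors) / m                             # dJ/dtheta_0
    d1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta_1
    return [d0, d1]


xs = [2104, 1416, 1534, 852]   # sizes from the table
prices = [460, 232, 315, 178]  # prices from the table ($1000)
theta = [50.0, 0.2]            # arbitrary values, for comparison only
X = [[1, x] for x in xs]       # x_0 = 1, x_1 = size

print(grad_multivariate(theta, X, prices))
print(grad_univariate(theta[0], theta[1], xs, prices))  # same two numbers
```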

Life is hard, and I really don't feel like studying.