Supervised Learning

Linear Problems

We start with an example: the relationship between house area and house price in a particular region.

The picture above contains many data points. By working out the relationship between house area and price, we can eventually obtain a function that describes it. In other words, every example in the data comes with a known "right answer" (the price), and learning from such examples is what we call supervised learning.

Training Set

The example above is a typical training set.

h: hypothesis function
Model parameters
Cost function

Look at the picture above. We want an expression for the relationship between the two variables, so that we can pin down the hypothesis. How do we do that? Start with a straight line: we give the hypothesis the form h(x) = θ0 + θ1·x, which has two parameters, θ0 and θ1. The next step is to compute θ0 and θ1. To do so we introduce a function called the cost function, J(θ0, θ1), and look for the parameter values that minimize it. Let's work through an example.
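The formula images are missing from these notes, so the following is only a hedged reconstruction. A squared-error cost commonly paired with a straight-line hypothesis can be written as below. The symbols m (number of training examples) and (x^{(i)}, y^{(i)}) (the i-th training example), as well as the 1/(2m) scaling, are assumptions and are not taken from the original.

```latex
h_\theta(x) = \theta_0 + \theta_1 x

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
```

The factor 1/2 only simplifies the derivative; it does not change where the minimum lies.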

We have three training examples in the picture. Assume θ1 = 1 and leave θ0 out of consideration (that is, θ0 = 0, so the line passes through (0, 0)). The straight line plotted above then fits the training set perfectly, so J(θ1) = 0, and we plot that point in the right-hand picture:

Now assume θ1 = 0.5. We can plot the picture in the same way:

As we compute J(θ1) for more and more values of θ1, we obtain a curve like this:

The minimum of J(θ1) is 0, and it is reached at θ1 = 1.
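The walk-through above can be reproduced numerically. The sketch below assumes a squared-error cost with a 1/(2m) factor and three training points lying on the line y = x; neither the exact cost formula nor the coordinates survive in the notes, so both are assumptions made for illustration.

```python
def cost(theta1, xs, ys):
    """Squared-error cost J(theta1) for the line h(x) = theta1 * x (theta0 fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Assumed training set: three points lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Evaluate the cost for several slopes to trace out the curve of J(theta1).
for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"theta1 = {theta1:.1f}  J = {cost(theta1, xs, ys):.4f}")
```

Under these assumptions the cost is exactly 0 at θ1 = 1 and grows symmetrically on either side, which matches the bowl-shaped curve described above.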

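Next, suppose θ0 is not 0. The principle is the same as above, but because the hypothesis now depends on two parameters, the plot of the cost function is no longer a two-dimensional curve. It becomes a three-dimensional surface, similar to the figure below.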

Gradient Descent

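α: the learning rate, which determines how large a step is taken when updating θj. Left: simultaneous update. Right: non-simultaneous update. Gradient descent as it is usually described uses the simultaneous update.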
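The difference between the two update orders is easiest to see in code. The following is a minimal sketch, assuming a straight-line hypothesis, a squared-error cost, and a made-up three-point data set; the function names are hypothetical.

```python
def gradients(theta0, theta1, xs, ys):
    """Partial derivatives of the squared-error cost with respect to theta0 and theta1."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d0 = sum(errors) / m
    d1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d0, d1

def simultaneous_step(theta0, theta1, xs, ys, alpha):
    # Both gradients are evaluated at the OLD parameters before either is changed.
    d0, d1 = gradients(theta0, theta1, xs, ys)
    return theta0 - alpha * d0, theta1 - alpha * d1

def sequential_step(theta0, theta1, xs, ys, alpha):
    # theta0 is overwritten first, so the theta1 gradient already sees the new theta0.
    d0, _ = gradients(theta0, theta1, xs, ys)
    theta0 = theta0 - alpha * d0
    _, d1 = gradients(theta0, theta1, xs, ys)
    return theta0, theta1 - alpha * d1

xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]  # assumed toy data
sim = simultaneous_step(0.0, 0.0, xs, ys, alpha=0.1)
seq = sequential_step(0.0, 0.0, xs, ys, alpha=0.1)
print("simultaneous:", sim)
print("sequential:  ", seq)
```

Both variants produce the same θ0 after one step, but θ1 differs because the sequential version mixes old and new parameter values.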

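Next we look at what the update expression means, again using the plot of the cost function J(θ1) as an example, as shown in the figure below: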

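Suppose θ1 lies to the right of the point where J(θ1) is lowest. The derivative term on the right-hand side of the update is then positive, so the value assigned to the left-hand side is smaller than before, and θ1 moves closer to the value that minimizes the cost function. The same reasoning applies when θ1 lies to the left of the minimum: the derivative is negative there, so θ1 increases.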
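The sign argument can be checked with one update step from each side of the minimum. This sketch assumes a squared-error cost with θ0 fixed at 0 and toy data whose minimum lies at θ1 = 1; the data and the learning rate are illustrative assumptions.

```python
def dJ(theta1, xs, ys):
    """Derivative of J(theta1) = 1/(2m) * sum((theta1*x - y)^2) with respect to theta1."""
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]  # assumed data; minimum at theta1 = 1
alpha = 0.1

for start in (2.0, 0.0):  # one start to the right of the minimum, one to the left
    slope = dJ(start, xs, ys)
    updated = start - alpha * slope
    print(f"theta1 = {start}: slope = {slope:+.3f}, after one step theta1 = {updated:.3f}")
```

Starting at 2.0 the slope is positive and θ1 decreases; starting at 0.0 the slope is negative and θ1 increases. In both cases the step moves toward 1.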

Linear Regression

Gradient descent + squared-error cost function
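Combining gradient descent with a squared-error cost gives update rules of the following form. This is a sketch of the standard derivation, assuming the hypothesis h_θ(x) = θ0 + θ1·x and a cost carrying a 1/(2m) factor; the notes themselves do not preserve either formula, so the notation (m, x^{(i)}, y^{(i)}) is an assumption.

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
\theta_1 &:= \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
\end{aligned}
```

Both right-hand sides are evaluated with the old values of θ0 and θ1, which is the simultaneous update.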

Applications of Matrix Multiplication

Multiple Features

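So far we have considered only one feature of house prices in the region, the floor area. In fact, many other features affect the price, as shown in the figure below: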

Note the meaning of each symbol in the figure. With these additional features, the form of our hypothesis function has to change as well:

The function above can be obtained from the following vectors or matrices (assuming x0 = 1):
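As a rough illustration of the vector form, the hypothesis for several features can be computed as an inner product of a parameter vector and a feature vector whose first entry is x0 = 1. The function name, the parameter values, and the example house below are all hypothetical.

```python
def hypothesis(theta, features):
    """h(x) = theta^T x, where x is the feature list with x0 = 1 prepended."""
    x = [1.0] + list(features)  # x0 = 1 pairs with the intercept theta[0]
    if len(theta) != len(x):
        raise ValueError("theta must have one more entry than the feature list")
    return sum(t * xi for t, xi in zip(theta, x))

# Hypothetical parameters and one hypothetical house (area, number of bedrooms).
theta = [50.0, 0.1, 20.0]
print(hypothesis(theta, [2000.0, 3.0]))
```

Prepending x0 = 1 lets the intercept be handled like any other parameter, so the whole hypothesis is a single dot product.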

Gradient descent for multiple variables then becomes:
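A minimal sketch of batch gradient descent with several features follows. It assumes a squared-error cost, a design matrix whose rows start with x0 = 1, and a small made-up data set; the function names, learning rate, and iteration count are illustrative choices rather than anything given in the notes.

```python
def predict(theta, row):
    """Inner product of the parameter vector and one row of the design matrix."""
    return sum(t * x for t, x in zip(theta, row))

def gradient_descent(X, y, alpha, iterations):
    """Batch gradient descent for linear regression; each row of X starts with x0 = 1."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [predict(theta, row) - target for row, target in zip(X, y)]
        # Build every new theta_j from the old theta (simultaneous update).
        theta = [
            theta[j] - alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(n)
        ]
    return theta

# Assumed toy data generated from y = 1 + 2*x1 + 3*x2.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
theta = gradient_descent(X, y, alpha=0.1, iterations=5000)
print([round(t, 3) for t in theta])
```

Because the list comprehension reads only the old `theta`, all parameters are updated simultaneously in each iteration.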

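Multivariate Gradient Descent: Feature Scaling

Rescale the values of the features in a consistent way so that they fall in comparable ranges; this lets multivariate gradient descent find the minimum quickly.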
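The notes do not say which scaling rule is meant. One common choice is mean normalization, which subtracts each feature's mean and divides by its range. The sketch below assumes that rule; the function name and the sample values are hypothetical.

```python
def scale_features(columns):
    """Mean-normalize each feature column: (x - mean) / (max - min)."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        spread = max(col) - min(col)
        if spread == 0:
            raise ValueError("a constant feature cannot be range-scaled")
        scaled.append([(x - mean) / spread for x in col])
    return scaled

# Hypothetical features on very different scales: area and number of bedrooms.
areas = [2104.0, 1416.0, 1534.0, 852.0]
bedrooms = [5.0, 3.0, 3.0, 2.0]
scaled_areas, scaled_bedrooms = scale_features([areas, bedrooms])
print(scaled_areas)
print(scaled_bedrooms)
```

After scaling, both features are centered on 0 and span a range of width 1, so neither dominates the gradient. The same mean and range would have to be applied to any new input before making a prediction.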

Multivariate Gradient Descent: Learning Rate

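Horizontal axis: number of iterations. Vertical axis: value of the cost function, evaluated with the parameter values that gradient descent produces at each iteration.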

Take a look at the figure below:

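When the cost function rises as the number of iterations increases, the most likely cause is that the learning rate α is too large: each step overshoots the minimum, so the cost keeps climbing instead of falling. In that case a smaller learning rate should be tried.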
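The effect of an oversized learning rate can be reproduced on a tiny problem. This sketch assumes a squared-error cost with a single parameter θ1 and a made-up three-point data set; the two learning rates are chosen purely to show one converging and one diverging run.

```python
def cost_history(alpha, iterations=10):
    """Track J(theta1) while running gradient descent on an assumed toy data set."""
    xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
    m = len(xs)
    theta1 = 0.0
    history = []
    for _ in range(iterations):
        errors = [theta1 * x - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / (2 * m))
        theta1 -= alpha * sum(e * x for e, x in zip(errors, xs)) / m
    return history

small = cost_history(alpha=0.1)  # cost falls on every iteration
large = cost_history(alpha=0.5)  # each step overshoots, so the cost rises
print([round(c, 4) for c in small])
print([round(c, 4) for c in large])
```

Plotting either list against the iteration number gives the kind of curve described above: a falling curve for the small rate and a rising one for the large rate.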

Typical values of the learning rate

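Features and Polynomial Regression

We again use the house-price example. The hypothesis function is shown below; its two features are the length (frontage) and the width (depth) of the house.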

We choose different models for different scenarios, and the choice depends on the data collected beforehand.
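To make the idea of choosing features concrete, here is a small sketch. It assumes, as one plausible reading of the missing formula, that frontage and depth can be combined into a single area feature, and that powers of a feature can then serve as inputs for polynomial regression. The function names and the sample lot are hypothetical.

```python
def area_feature(frontage, depth):
    """Combine two raw features into one: area = frontage * depth."""
    return frontage * depth

def polynomial_features(x, degree):
    """Expand one feature x into [x, x^2, ..., x^degree] for polynomial regression."""
    return [x ** k for k in range(1, degree + 1)]

# Hypothetical lot: 20 units of frontage by 30 units of depth.
area = area_feature(20.0, 30.0)
print(polynomial_features(area, 3))
```

The expanded features differ by several orders of magnitude, so feature scaling matters even more when gradient descent is used on a polynomial model.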

Normal Equation

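So far we have used an iterative algorithm (e.g. gradient descent) to solve for θ. Here we use the normal equation instead. Consider an example: given a data set, we construct matrices from it and solve for θ directly, which gives the minimum of the cost function.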
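The closed-form solution is usually written θ = (X^T X)^{-1} X^T y. The sketch below is a dependency-free illustration that solves the equivalent linear system (X^T X) θ = X^T y by Gaussian elimination instead of forming the inverse explicitly. The helper names and the toy data are assumptions; in practice a linear-algebra library would normally do this work.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (non-invertible)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Assumed toy data generated from y = 1 + 2*x1; each row starts with x0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
print(normal_equation(X, y))
```

No learning rate and no iteration count appear anywhere: the parameters come out of a single linear solve.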

Geometric Interpretation of the Matrix Proof of Least Squares

Note: when the parameters are obtained with the normal equation, feature scaling is not needed.

Comparison of the Normal Equation and Gradient Descent

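Gradient descent	Normal equation
1. Requires choosing a learning rate	1. No learning rate to choose
2. Requires many iterations	2. Computed in one step
3. Works well even when the number of features is very large	3. Very slow when the number of features is large (gradient descent is usually worth considering from about n = 10000)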

Handling a Non-Invertible Matrix in the Normal Equation

When X^T X is a non-invertible matrix
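The notes break off here, so the following only illustrates one situation in which X^T X is certain to be non-invertible: two feature columns that are linearly dependent. The sketch computes the determinant of X^T X for a two-column design matrix; the function name and the data are hypothetical.

```python
def gram_determinant(x1, x2):
    """Determinant of X^T X for a two-column design matrix X = [x1 x2]."""
    a = sum(v * v for v in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    return a * d - b * b

# Hypothetical redundant features: the same size expressed in two different units.
size_a = [1.0, 2.0, 3.0]
size_b = [2.0 * v for v in size_a]  # exactly proportional to size_a

print(gram_determinant(size_a, size_b))           # linearly dependent columns
print(gram_determinant(size_a, [1.0, 0.0, 2.0]))  # independent columns
```

A determinant of zero means X^T X has no inverse. In this example, dropping one of the two redundant columns would make the matrix invertible again.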