This is the 22nd day of my participation in the August More Text Challenge. For details, see: 8月更文挑战
Notes of Andrew Ng’s Machine Learning —— (4) Normal Equation
Normal Equation
Gradient descent gives one way of minimizing our cost function $J(\theta)$. Let's discuss a second way of doing so -- the Normal Equation.
The normal equation minimizes $J$ by explicitly taking its derivatives with respect to the $\theta_j$'s, and setting them to zero. This allows us to find the optimum $\theta$ without resorting to iteration.
The normal equation formula is given below:

$$\theta = (X^TX)^{-1}X^Ty$$
There is no need to do feature scaling with the normal equation.
Note: the normal equation is essentially the least-squares method taught in high school, expressed in matrix form (proof omitted). Next time an exam asks you to compute a least-squares estimate, you can use the normal equation introduced here directly. (I, for one, can never remember the two least-squares formulas, while this normal equation is so easy to remember that it's hard to forget.)
Example
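The lecture's original worked example is not reproduced here. As a stand-in, here is a small sketch with hypothetical data ($y = 2x$ exactly), computing $\theta = (X^TX)^{-1}X^Ty$ with a hand-rolled 2×2 inverse rather than a linear-algebra library:

```python
# Hypothetical tiny dataset: y = 2x exactly, so we expect theta = [0, 2].
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Design matrix with an intercept column of ones: X is m x 2.
X = [[1.0, x] for x in xs]

# Entries of the symmetric 2x2 matrix X'X.
a = sum(row[0] * row[0] for row in X)  # sum of 1
b = sum(row[0] * row[1] for row in X)  # sum of x
d = sum(row[1] * row[1] for row in X)  # sum of x^2

# Closed-form inverse of [[a, b], [b, d]].
det = a * d - b * b
inv = [[d / det, -b / det], [-b / det, a / det]]

# X'y is a 2-vector.
Xty = [sum(row[j] * y for row, y in zip(X, ys)) for j in (0, 1)]

# theta = (X'X)^-1 X'y
theta = [inv[i][0] * Xty[0] + inv[i][1] * Xty[1] for i in (0, 1)]
print(theta)
```

Since the data lie exactly on a line, the recovered parameters are the intercept 0 and slope 2 (up to floating-point error).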
Normal Equation vs. Gradient Descent
| Gradient Descent | Normal Equation |
|---|---|
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| Time cost: $O(kn^2)$ | Need to calculate the inverse of $X^TX$, which costs $O(n^3)$ |
| Works well when n is large | Slow if n is very large |
In practice, when $n$ exceeds about $10^4$, we tend to use gradient descent; otherwise, the normal equation will perform better.
Normal Equation Noninvertibility
When implementing the normal equation in Octave, we want to use the `pinv` function rather than `inv`. `pinv` will give you a value of $\theta$ even if $X^TX$ is not invertible.
If $X^TX$ is non-invertible, the common causes might be having:
- Redundant features, where two features are linearly dependent (e.g. the size of a house in feet^2 and the size of the house in meters^2, where we know that 1 meter = 3.28 feet).
- Too many features (e.g. $m \le n$). In this case, delete some features or use "regularization".
Solutions to the above problems include deleting a feature that is linearly dependent with another or deleting one or more features when there are too many features.
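To see why `pinv` is preferred, here is a sketch in Python/NumPy (a stand-in for the course's Octave; the house sizes and the price relation are made up) where a redundant unit-converted feature makes $X^TX$ singular, yet `pinv` still returns a usable $\theta$:

```python
import numpy as np

# Hypothetical data where feature 2 is just feature 1 in different units
# (meters^2 vs feet^2), making the columns linearly dependent.
sizes_m2 = np.array([50.0, 80.0, 120.0])
sizes_ft2 = sizes_m2 * 3.28 ** 2          # redundant feature
y = 3.0 * sizes_m2                        # made-up prices: y = 3 * size

# Design matrix: intercept column plus the two (dependent) features.
X = np.column_stack([np.ones(3), sizes_m2, sizes_ft2])

XtX = X.T @ X
# inv(XtX) would fail or be numerically meaningless: XtX is rank-deficient.
print(np.linalg.matrix_rank(XtX))         # 2, not full rank 3

theta = np.linalg.pinv(XtX) @ X.T @ y     # pinv still returns a theta
print(np.allclose(X @ theta, y))          # True: predictions still fit
```

`pinv` returns the minimum-norm least-squares solution, so the predictions are unaffected by the redundant column, even though the individual coefficients are no longer uniquely determined.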
Code Implementation
```matlab
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression.
%   NORMALEQN(X, y) computes the closed-form solution to linear
%   regression using the normal equations.

theta = pinv(X' * X) * X' * y;

end
```