This is the 22nd day of my participation in the August More Text Challenge. For details, see: 8月更文挑战
Notes of Andrew Ng’s Machine Learning —— (4) Normal Equation
Normal Equation
Gradient descent gives one way of minimizing our cost function $J(\theta)$. Let's discuss a second way of doing so -- the Normal Equation.
The normal equation minimizes $J$ by explicitly taking its derivatives with respect to the $\theta_j$'s, and setting them to zero. This allows us to find the optimum $\theta$ without resorting to iteration.
The normal equation formula is given below:

$$\theta = (X^TX)^{-1}X^Ty$$
There is no need to do feature scaling with the normal equation.
Note: the normal equation is essentially the least-squares method taught in high school, expressed in matrix form (proof omitted). Next time an exam asks you to compute a least-squares estimate, you can use the normal equation introduced here directly. (I, for one, can never remember the two least-squares formulas, while this normal equation is so easy to remember that it's hard to forget.)
Example
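The lecture's original worked example is not reproduced here. As a stand-in, here is a small sketch with hypothetical data ($y = 2x$ exactly), computing $\theta = (X^TX)^{-1}X^Ty$ with a hand-rolled 2×2 inverse rather than a linear-algebra library:

```python
# Hypothetical tiny dataset: y = 2x exactly, so we expect theta = [0, 2].
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Design matrix with an intercept column of ones: X is m x 2.
X = [[1.0, x] for x in xs]

# Entries of the symmetric 2x2 matrix X'X.
a = sum(row[0] * row[0] for row in X)  # sum of 1
b = sum(row[0] * row[1] for row in X)  # sum of x
d = sum(row[1] * row[1] for row in X)  # sum of x^2

# Closed-form inverse of [[a, b], [b, d]].
det = a * d - b * b
inv = [[d / det, -b / det], [-b / det, a / det]]

# X'y is a 2-vector.
Xty = [sum(row[j] * y for row, y in zip(X, ys)) for j in (0, 1)]

# theta = (X'X)^-1 X'y
theta = [inv[i][0] * Xty[0] + inv[i][1] * Xty[1] for i in (0, 1)]
print(theta)
```

Since the data lie exactly on a line, the recovered parameters are the intercept 0 and slope 2 (up to floating-point error).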
Normal Equation vs. Gradient Descent
| Gradient Descent | Normal Equation |
|---|---|
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| Time cost: $O(kn^2)$ | Need to calculate the inverse of $X^TX$, which costs $O(n^3)$ |
| Works well when n is large | Slow if n is very large |
In practice, when $n$ exceeds about $10^4$, we tend to use gradient descent; otherwise, the normal equation will perform better.
Normal Equation Noninvertibility
When implementing the normal equation in Octave, we want to use the `pinv` function rather than `inv`. `pinv` will give you a value of $\theta$ even if $X^TX$ is not invertible.
If $X^TX$ is non-invertible, the common causes might be having:
- Redundant features, where two features are linearly dependent (e.g. the size of a house in feet^2 and the size of the house in meters^2, where we know that 1 meter = 3.28 feet).
- Too many features (e.g. $m \le n$). In this case, delete some features or use "regularization".
Solutions to the above problems include deleting a feature that is linearly dependent with another or deleting one or more features when there are too many features.
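To see why `pinv` is preferred, here is a sketch in Python/NumPy (a stand-in for the course's Octave; the house sizes and the price relation are made up) where a redundant unit-converted feature makes $X^TX$ singular, yet `pinv` still returns a usable $\theta$:

```python
import numpy as np

# Hypothetical data where feature 2 is just feature 1 in different units
# (meters^2 vs feet^2), making the columns linearly dependent.
sizes_m2 = np.array([50.0, 80.0, 120.0])
sizes_ft2 = sizes_m2 * 3.28 ** 2          # redundant feature
y = 3.0 * sizes_m2                        # made-up prices: y = 3 * size

# Design matrix: intercept column plus the two (dependent) features.
X = np.column_stack([np.ones(3), sizes_m2, sizes_ft2])

XtX = X.T @ X
# inv(XtX) would fail or be numerically meaningless: XtX is rank-deficient.
print(np.linalg.matrix_rank(XtX))         # 2, not full rank 3

theta = np.linalg.pinv(XtX) @ X.T @ y     # pinv still returns a theta
print(np.allclose(X @ theta, y))          # True: predictions still fit
```

`pinv` returns the minimum-norm least-squares solution, so the predictions are unaffected by the redundant column, even though the individual coefficients are no longer uniquely determined.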
Code Implementation
```matlab
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression.
%   NORMALEQN(X, y) computes the closed-form solution to linear
%   regression using the normal equations.

theta = pinv(X' * X) * X' * y;

end
```