Regression


Application

  • stock market forecast
  • self-driving car
  • recommendation

Step 1: Model

Linear model: $y = b + \sum_i w_i x_i$
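The linear model above can be sketched in a few lines of NumPy (the function name `predict` is mine, for illustration):

```python
import numpy as np

def predict(x, w, b):
    """Linear model: y = b + sum_i w_i * x_i."""
    return b + np.dot(w, x)

# Example with two features
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 3.0])
print(predict(x, w, b))  # 0.5 + 2*1 + (-1)*3 = -0.5
```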


Step 2: Goodness of Function

Loss function $L$

Input: a function

Output: how bad it is

$$\begin{aligned} L(f) & = L(w, b) \\ & = \sum_n \left(\hat{y}^n - (b + \sum_i w_i x_i^n)\right)^2 \end{aligned}$$

(The outer sum runs over the training examples $n$; the inner sum over the features $i$.)
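The loss sums squared errors over all training examples. A minimal NumPy sketch (rows of `X` are examples $x^n$, `y_hat` holds the targets $\hat{y}^n$):

```python
import numpy as np

def loss(w, b, X, y_hat):
    """L(w, b) = sum_n (y_hat^n - (b + w . x^n))^2."""
    preds = b + X @ w                 # predictions for all examples at once
    return np.sum((y_hat - preds) ** 2)

X = np.array([[1.0, 0.0], [0.0, 1.0]])   # two examples, two features
y_hat = np.array([1.0, 2.0])             # targets
print(loss(np.array([1.0, 2.0]), 0.0, X, y_hat))  # perfect fit -> 0.0
```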

Regularization

$$\begin{aligned} L(f) & = L(w, b) \\ & = \sum_n \left(\hat{y}^n - (b + \sum_i w_i x_i^n)\right)^2 + \lambda \sum_i (w_i)^2 \end{aligned}$$

  • smooth functions are preferred (smaller $w_i$ -> smaller change in the output when $x$ changes) - less sensitive to noise
  • larger $\lambda$ -> larger training error, but not necessarily smaller testing error
  • How smooth? -> select the $\lambda$ that yields the best model
  • the bias $b$ should not be included in the regularization term (it only shifts the function and has nothing to do with smoothness)
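A sketch of the regularized loss, following the bullet above: the $\lambda \sum_i (w_i)^2$ penalty covers only the weights, never the bias (the function name `reg_loss` is mine):

```python
import numpy as np

def reg_loss(w, b, X, y_hat, lam):
    """Squared error plus L2 penalty; the bias b is NOT penalized."""
    preds = b + X @ w
    return np.sum((y_hat - preds) ** 2) + lam * np.sum(w ** 2)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y_hat = np.array([1.0, 2.0])
w = np.array([1.0, 2.0])
print(reg_loss(w, 0.0, X, y_hat, lam=0.1))  # error 0 + penalty 0.1*(1+4) = 0.5
```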

Step 3: Gradient Descent

Consider $L(w)$ with a single parameter $w$

  • (Randomly) pick an initial value $w^0$
  • Compute $\frac{dL}{dw}\big|_{w=w^0}$, then update $w^1 = w^0 - \eta \frac{dL}{dw}\big|_{w=w^0}$
  • Repeat -> a local optimum
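The three steps above can be sketched directly; here on a toy loss $L(w) = (w-3)^2$ whose minimum is at $w = 3$ (the toy loss and the name `grad_descent` are my own choices):

```python
def grad_descent(dL_dw, w0, eta, steps):
    """Repeatedly apply w <- w - eta * dL/dw starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * dL_dw(w)
    return w

# Toy loss L(w) = (w - 3)^2, so dL/dw = 2*(w - 3)
w_star = grad_descent(lambda w: 2 * (w - 3), w0=0.0, eta=0.1, steps=100)
print(w_star)  # converges toward 3
```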

Two parameters: $L(w, b)$


Gradient: $\nabla L = \begin{bmatrix} \frac{\partial L}{\partial w} \\ \frac{\partial L}{\partial b} \end{bmatrix}$
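For the squared-error loss, both partial derivatives have closed forms: $\frac{\partial L}{\partial w_i} = -2\sum_n (\hat{y}^n - (b + \sum_j w_j x_j^n))\, x_i^n$ and $\frac{\partial L}{\partial b} = -2\sum_n (\hat{y}^n - (b + \sum_j w_j x_j^n))$. A NumPy sketch of one joint update step (the helper name `gradients` is mine):

```python
import numpy as np

def gradients(w, b, X, y_hat):
    """Partial derivatives of L(w, b) = sum_n (y_hat^n - (b + w . x^n))^2."""
    err = y_hat - (b + X @ w)      # residuals, one per example
    dL_dw = -2 * X.T @ err         # dL/dw (one entry per feature)
    dL_db = -2 * np.sum(err)       # dL/db
    return dL_dw, dL_db

# One gradient-descent step for both parameters at once
eta = 0.01
w, b = np.zeros(2), 0.0
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y_hat = np.array([1.0, 2.0])
g_w, g_b = gradients(w, b, X, y_hat)
w, b = w - eta * g_w, b - eta * g_b
```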