ML Basic Concepts
-
ML = looking for a function $f$
-
Different types of functions
- Regression: $f$ outputs a scalar
- Classification: given a set of classes, $f$ outputs the correct one
- Structured Learning: create something with structure (image, document, ...)
-
How to find such an $f$? => training
-
step1: f with unknown parameters
-
step2: define loss(L) from training data
-
loss is a function of the parameters
-
loss measures how good a set of parameter values is
eg:
$$L = \frac{1}{N}\sum_{n} e_n$$
MAE: $e = |y - \hat{y}|$ ($L$ is mean absolute error)
MSE: $e = (y - \hat{y})^2$ ($L$ is mean squared error)
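A minimal sketch of the two error measures in plain Python (the function names `mae`/`mse` and the toy inputs are my own):

```python
def mae(preds, labels):
    # mean absolute error: average of |y - y_hat|
    return sum(abs(y - t) for y, t in zip(preds, labels)) / len(preds)

def mse(preds, labels):
    # mean squared error: average of (y - y_hat)^2
    return sum((y - t) ** 2 for y, t in zip(preds, labels)) / len(preds)

print(mae([1.0, 2.0], [0.0, 4.0]))  # (1 + 2) / 2 = 1.5
print(mse([1.0, 2.0], [0.0, 4.0]))  # (1 + 4) / 2 = 2.5
```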
-
optimization:
method: Gradient Descent
-
randomly pick an initial value $w_0$
-
compute $\left.\frac{\partial L}{\partial w}\right|_{w=w_0}$
> if negative => increase w
> elif positive => decrease w
>
> so $w_0 \to w_1$

What about the increment? $w_1 = w_0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w_0}$
$\eta$ is the learning rate: a parameter that needs to be set by yourself => hyperparameter
in conclusion,
-
update $w$ iteratively
so gradient descent can get stuck in a local minimum (which actually won't cause a problem in practice)
-
With multiple parameters, the procedure is the same as with a single parameter: compute the partial derivative of $L$ with respect to each parameter and update them all.
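The whole procedure above can be sketched for one parameter $w$ on a toy linear model $y = wx$ with MSE loss (the data, learning rate, and step count here are illustrative assumptions):

```python
# Gradient descent sketch for a single parameter w.
# Model: y = w * x (no bias), loss = mean squared error.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated with true w = 2

def grad(w):
    # dL/dw for L = (1/N) * sum (w*x - y)^2  ->  (2/N) * sum (w*x - y) * x
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.0        # initial value (fixed here instead of random, for clarity)
eta = 0.05     # learning rate (hyperparameter)
for _ in range(200):
    w = w - eta * grad(w)   # update w iteratively

print(round(w, 3))  # converges toward 2.0
```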
-
prediction (then adjust the model based on the prediction results, and iterate again...)
The above example is based on a linear model.
But linear models have limited flexibility (model bias).
solution: sum up a set of piecewise-linear functions (each approximated by a sigmoid curve)
You can modify the parameters $(c, b, w)$ in $y = c\,\mathrm{sigmoid}(b + wx)$ to adjust its shape.
So the new model gains flexibility and can take multiple features:
$$y = b + \sum_i c_i\,\mathrm{sigmoid}\Bigl(b_i + \sum_j w_{ij} x_j\Bigr)$$
$i$: index over the sigmoid functions; $j$: index over the features
so this time Loss = $L(\vec\theta)$, where $\vec\theta$ is the vector collecting all the unknown parameters
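A minimal sketch of this model in plain Python, assuming the form $y = b + \sum_i c_i\,\mathrm{sigmoid}(b_i + \sum_j w_{ij}x_j)$; the function names and toy parameter values are my own:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, b, c, b_i, w):
    # y = b + sum_i c[i] * sigmoid(b_i[i] + sum_j w[i][j] * x[j])
    y = b
    for i in range(len(c)):
        z = b_i[i] + sum(w[i][j] * x[j] for j in range(len(x)))
        y += c[i] * sigmoid(z)
    return y

# toy call: 2 sigmoid units (index i), 3 features (index j)
print(model([1.0, 0.5, -1.0], b=0.1,
            c=[1.0, -2.0], b_i=[0.0, 0.3],
            w=[[0.2, 0.1, 0.0], [0.5, -0.4, 0.3]]))
```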
-
step3: optimization
-
randomly pick initial values $\vec\theta_0$
gradient: $\vec{g} = \nabla L(\vec\theta_0)$
-
update iteratively: $\vec\theta_1 = \vec\theta_0 - \eta\,\vec{g}$
-
if N = 10000 and batch size = 10, how many updates in 1 epoch?
answer: 10000 / 10 = 1000 updates
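The arithmetic behind the answer (one epoch is one pass over all N examples; each batch triggers one parameter update):

```python
# updates per epoch = number of examples / batch size
N = 10000
batch_size = 10
updates_per_epoch = N // batch_size
print(updates_per_epoch)  # 1000
```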
-
activation functions: sigmoid vs. ReLU (Rectified Linear Unit)
which one is better? =>ReLU
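The two activation functions, sketched in plain Python (the function names are my own):

```python
import math

def sigmoid(z):
    # smooth S-shaped curve with outputs in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # Rectified Linear Unit: max(0, z)
    return max(0.0, z)

print(sigmoid(0.0))   # 0.5
print(relu(-3.0))     # 0.0
print(relu(2.5))      # 2.5
```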
-
multiple hidden layers
Increasing the number of hidden layers can reduce the loss, but it also increases the model's complexity.
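A minimal sketch of stacking hidden layers, assuming fully-connected layers with ReLU activations; the layer sizes and parameter values are illustrative:

```python
def relu(z):
    return max(0.0, z)

def layer(x, weights, biases):
    # one fully-connected layer followed by ReLU activation
    return [relu(sum(w_j * x_j for w_j, x_j in zip(w_row, x)) + b)
            for w_row, b in zip(weights, biases)]

def deep_model(x, layers):
    # "deep" = pass the input through many hidden layers in sequence
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# toy network: 2 inputs -> 2 hidden units -> 1 output
params = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]),   # hidden layer 1
    ([[1.0, 2.0]], [0.0]),                      # hidden layer 2
]
print(deep_model([1.0, 2.0], params))
```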
-
"deep" means many hidden layers, but why do we want "deep" rather than "fat" (just putting all the neurons in one wide layer)???
-