Adding regularization will often help prevent overfitting (the high-variance problem).
1. Logistic regression
Recall the optimization objective used during training:
$$\min_{w,b} J(w,b), \quad w \in \mathbb{R}^{n_x},\ b \in \mathbb{R} \tag{1-1}$$
where
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) \tag{1-2}$$
L2 regularization (most commonly used):

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{2m}\|w\|_2^2 \tag{1-3}$$
where

$$\|w\|_2^2 = \sum_{j=1}^{n_x} w_j^2 = w^T w \tag{1-4}$$
Why do we regularize just the parameter w? Because w is usually a high-dimensional parameter vector while b is a scalar; almost all the parameters are in w rather than b.

L1 regularization:
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{m}\|w\|_1 \tag{1-5}$$
where
$$\|w\|_1 = \sum_{j=1}^{n_x} |w_j| \tag{1-6}$$
With L1 regularization, w will end up being sparse; in other words, the w vector will have a lot of zeros in it. This can help compress the model a little.
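As a concrete illustration, here is a minimal numpy sketch of the regularized cost for logistic regression, following equations (1-3) and (1-5). The function name `compute_cost`, the argument `lambd`, and the data shapes are illustrative assumptions, not from the original notes.

```python
import numpy as np

def compute_cost(w, b, X, Y, lambd, reg="l2"):
    """Cross-entropy cost with an optional L2 or L1 penalty on w.

    w: weights of shape (n_x, 1); X: inputs of shape (n_x, m); Y: labels of shape (1, m).
    """
    m = X.shape[1]
    A = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))       # sigmoid activations, shape (1, m)
    cross_entropy = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    if reg == "l2":
        penalty = (lambd / (2 * m)) * np.sum(w ** 2)  # (lambda / 2m) * ||w||_2^2, eq. (1-3)
    else:
        penalty = (lambd / m) * np.sum(np.abs(w))     # (lambda / m) * ||w||_1, eq. (1-5)
    return cross_entropy + penalty
```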
2. Dropout

The inverted dropout technique, by dividing by keep_prob, ensures that the expected value of a3 remains the same. This makes test time easier because there is less of a scaling problem.
Dropout is not used at test time.
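Below is a minimal numpy sketch of inverted dropout for a single layer, using the a3 / keep_prob notation above; the activation shape and keep_prob value are illustrative assumptions.

```python
import numpy as np

keep_prob = 0.8                               # probability of keeping a unit (assumed value)
a3 = np.random.randn(50, 100)                 # illustrative layer-3 activations, shape (units, m)

# Training: zero out units with probability 1 - keep_prob, then divide by
# keep_prob so the expected value of a3 is unchanged.
d3 = np.random.rand(*a3.shape) < keep_prob    # boolean dropout mask
a3 = a3 * d3                                  # shut off the dropped units
a3 = a3 / keep_prob                           # inverted-dropout scaling

# Test time: no mask and no scaling -- the full network is used as-is.
```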