Datawhale X Hung-yi Lee's Apple Book AI Summer Camp: Study Notes 3


2 Practical Methodology


General guide for an ML task 🤗

2.1 Model Bias

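If the model is too simple, training is like searching the sea for a needle that isn't there: no function in the model's function set can reach a low loss, however well we optimize. The fix is to redesign the model to be more flexible, e.g., with more features or a deeper network.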
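Model bias can be seen on a toy problem. This is a minimal sketch under assumed conditions: the data come from a hypothetical target y = x², and both models are fit by closed-form least squares. A model that is linear in x has no good function in its set, so even its best parameters leave a large training loss; switching the feature to x² (a more flexible model design) removes the bias.

```python
# Toy illustration of model bias (the target y = x^2 is an assumption).

def fit_simple_regression(feats, ys):
    """Closed-form least squares for y = w * f + b with a single feature f."""
    n = len(feats)
    mean_f = sum(feats) / n
    mean_y = sum(ys) / n
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(feats, ys))
    var = sum((f - mean_f) ** 2 for f in feats)
    w = cov / var
    b = mean_y - w * mean_f
    return w, b

def mse(feats, ys, w, b):
    """Mean squared error of y_hat = w * f + b on the training data."""
    return sum((w * f + b - y) ** 2 for f, y in zip(feats, ys)) / len(ys)

xs = [x / 2 for x in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
ys = [x ** 2 for x in xs]

# Model 1: linear in x; its function set cannot represent a parabola.
w1, b1 = fit_simple_regression(xs, ys)
loss_linear = mse(xs, ys, w1, b1)

# Model 2: linear in the feature x^2; its function set contains the target.
sq = [x ** 2 for x in xs]
w2, b2 = fit_simple_regression(sq, ys)
loss_quadratic = mse(sq, ys, w2, b2)

print(f"linear-in-x training MSE:   {loss_linear:.3f}")
print(f"linear-in-x^2 training MSE: {loss_quadratic:.3f}")
```

Both models are optimized exactly, so the gap in training loss comes entirely from what each function set can express, not from the optimizer.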

2.2 Optimization Issues


The lecture's example here is an optimization issue, not overfitting.

A more flexible model that cannot do better on the training set indicates a problem with optimization. Overfitting would look different: a low training loss paired with a high testing loss.

Gradient descent has its limits: even when the function set contains a good function, gradient descent may fail to find it. Other optimization methods are covered later in the course.

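How to tell an optimization issue from model bias? Start with shallower networks (or simpler models) that are easier to optimize. If a deeper network does not obtain a smaller loss on the training data, there is an optimization issue, because the deeper network could in principle reproduce the shallower one (e.g., by letting its extra layers copy their input).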
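The general guide and the comparison rule can be written down as a rule-of-thumb function. This is a hypothetical helper: the name `diagnose`, its parameters, and the single `acceptable_loss` threshold are assumptions for illustration, and real diagnosis still needs judgment.

```python
def diagnose(train_loss, test_loss, simpler_train_loss, acceptable_loss):
    """Rule-of-thumb diagnosis following the general guide (hypothetical helper).

    simpler_train_loss: training loss of a simpler/shallower model that is
        easy to optimize, trained for comparison (an assumption).
    acceptable_loss: the loss you consider "small" (an assumed threshold).
    """
    if train_loss > acceptable_loss:
        # Large training loss: either model bias or an optimization issue.
        if train_loss > simpler_train_loss:
            # The flexible model could at least match the simpler one,
            # so doing worse points at the optimizer, not the model.
            return "optimization issue"
        return "model bias"
    if test_loss > acceptable_loss:
        # Training is fine but testing is not.
        return "overfitting or mismatch"
    return "ok"

print(diagnose(train_loss=0.9, test_loss=1.0,
               simpler_train_loss=0.5, acceptable_loss=0.2))
```

The order of the checks mirrors the guide: look at the training loss first, and only move on to the testing loss once training looks fine.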

2.3 Overfitting


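Overfitting (small training loss but large testing loss) is caused by too much model flexibility: where no training data constrains it, a flexible model is free to produce strange outputs.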

Remedies: get more training data, use data augmentation, or put restrictions on the model to limit its flexibility (e.g., fewer parameters or shared parameters). Other methods include early stopping, regularization, and dropout.
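Of the remedies listed above, early stopping is the easiest to sketch. A minimal version, assuming you record the validation loss after every epoch; the function name and the `patience` parameter are hypothetical choices for illustration. Training stops once the validation loss has not improved for `patience` consecutive epochs, and the best epoch seen so far is kept.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index whose parameters early stopping would keep.

    val_losses: validation loss recorded after each epoch (assumed input).
    patience: how many epochs without improvement to tolerate before stopping.
    """
    best_epoch, best_loss = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; later epochs are never run
    return best_epoch

# Hypothetical history: validation loss falls, then rises as the model overfits.
history = [1.0, 0.7, 0.5, 0.45, 0.47, 0.52, 0.6, 0.4]
print(early_stopping_epoch(history, patience=3))
```

With `patience=3` the run stops after epoch 6 and keeps epoch 3; a larger patience would have reached the later dip at epoch 7, which is the trade-off this parameter controls.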

Pick a model of moderate complexity: a model that is too simple suffers from model bias, while one that is too flexible overfits.

//Overfitting reminds me of people who can lift heavy weights but lack genuine muscle strength

2.4 Cross Validation

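//If you abuse the public testing set, machine learning has a brute-force solution too (try many functions and keep whichever scores best on the public set), but the resulting function is of little practical use, because its public score does not carry over to the private testing set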
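The brute-force trick can be simulated. This sketch rests on stated assumptions: a hypothetical binary task, public and private test splits of 200 random labels each, and 1000 candidate "functions" that simply guess at random. Keeping the candidate with the best public accuracy inflates the public score, while its private accuracy stays at chance level.

```python
import random

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200                 # hypothetical size of each test split
public_labels = [rng.randint(0, 1) for _ in range(n)]
private_labels = [rng.randint(0, 1) for _ in range(n)]

best_public, best_private = -1.0, 0.0
for _ in range(1000):
    # Each candidate "function" guesses at random on both splits.
    public_preds = [rng.randint(0, 1) for _ in range(n)]
    private_preds = [rng.randint(0, 1) for _ in range(n)]
    public_acc = accuracy(public_preds, public_labels)
    if public_acc > best_public:
        # Keep whichever function looks best on the public set.
        best_public = public_acc
        best_private = accuracy(private_preds, private_labels)

print(f"best public accuracy: {best_public:.2f}")
print(f"its private accuracy: {best_private:.2f}")
```

Expect a public score clearly above 0.5 and a private score near 0.5: the selection step only picked up luck on the public split.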

Cross validation: split the training set into a training set and a validation set, train on the former, and use the latter to choose among models.

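Ideally, choose models with the validation set alone and ignore the public testing set, so that the public score stays an honest estimate of performance on the private testing set.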

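N-fold cross validation: split the training set into N folds; each fold serves once as the validation set while the other N-1 folds are used for training. Average each model's N validation losses and pick the model with the lowest average.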
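N-fold cross validation with model selection fits in a few lines of plain Python. Everything here is an illustrative assumption: a noiseless toy dataset y = 2x + 1, three hypothetical candidate models (a constant, a linear model, and a 1-nearest-neighbor lookup standing in for an overly flexible model with zero training loss), and N = 3 folds built from shuffled indices.

```python
import random

def fit_constant(xs, ys):
    """Too simple: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Closed-form least squares for y = w * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - w * mx
    return lambda x: w * x + b

def fit_nearest_neighbor(xs, ys):
    """Very flexible: memorize the training set, return the nearest point's y."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def cross_validate(fit, xs, ys, k=3, seed=0):
    """Average validation MSE of `fit` over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    losses = []
    for val_idx in folds:
        val_set = set(val_idx)
        train_idx = [i for i in idx if i not in val_set]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        losses.append(sum((model(xs[i]) - ys[i]) ** 2 for i in val_idx) / len(val_idx))
    return sum(losses) / k

xs = [i / 10 for i in range(30)]
ys = [2 * x + 1 for x in xs]
candidates = {"constant": fit_constant, "linear": fit_linear,
              "nearest_neighbor": fit_nearest_neighbor}
scores = {name: cross_validate(fit, xs, ys, k=3) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The linear model should win with a validation loss near zero. Once a model is chosen this way, it would then be retrained on the full training set before being evaluated on the testing set.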

2.5 Mismatch

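Mismatch is a discrepancy between the distributions of the training set and the testing set, e.g., training on data from 2020 and testing on data from 2021. Unlike overfitting, it cannot be fixed by collecting more training data.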

At the end of the class, an interesting question came up: how can a model trained on realistic photos recognize drawings? I think the answer has to do with abstract features that photos and drawings share, which makes it a somewhat philosophical question.

After-class thoughts

Mr. Lee's classes are bilingual, informative, and not too hard. He isn't afraid to immerse students in English, which is cool.