The necessity of large models: emergent abilities
- Performance increases rapidly once models scale past a certain size
- Chain of Thought, instruction tuning, scratchpad (similar to chain of thought), and calibration also only work when models are large enough (see the sketches after this list)
- calibration (a model's confidence in its answers should match how often those answers are correct): arxiv.org/abs/2207.05…
- Inverse Scaling Prize: tasks on which performance gets worse as models scale, usually because the task contains a distractor task that models latch onto
- U-shaped scaling: on some of these tasks, performance first drops and then recovers once models become large enough
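Chain-of-thought prompting is easiest to see as a concrete prompt. A minimal sketch, assuming the classic math-word-problem style of exemplar; the exact wording is illustrative, not taken from any specific paper's prompt set:

```python
# Standard few-shot prompting: the exemplar maps the question straight
# to its answer.
direct_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
    "How many balls does he have now?\n"
    "A: 11\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)

# Chain-of-thought prompting: the exemplar spells out the intermediate
# steps, so a sufficiently large model continues with its own reasoning
# chain before giving the final answer.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)
```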
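Calibration means the model's stated confidence matches its empirical accuracy: among answers given with roughly 70% confidence, roughly 70% should be correct. It is commonly summarized as the expected calibration error (ECE); a minimal numpy sketch, where `confidences` holds the probability the model assigned to each chosen answer and `correct` marks whether that answer was right:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and average the |accuracy -
    confidence| gap per bin, weighted by the fraction of samples in it."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Perfectly calibrated predictions would give an ECE of 0.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))
```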
The necessity of big data
- Data preparation procedures: filter raw text for quality and deduplicate it before training (see the filtering sketch after this list)
- Trend: performance keeps improving as the amount of training data grows
- Trade-off: under a fixed compute budget, model size and data size must be traded off against each other
- Improve performance without feeding in more data:
- instruction-tuning: Scaling Instruction-Finetuned Language Models (see the sketch after this list)
- human teaching: learning from human demonstrations and feedback, as in InstructGPT
- GPT (prompted): GPT with in-context learning
- SFT: GPT with supervised learning
- PPO-ptx: InstructGPT with reinforcement learning
- FLAN & T0: instruction-tuning
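On data preparation: published pipelines typically combine quality filtering, deduplication, language identification, toxicity filtering, and test-set decontamination. A minimal sketch of the first two steps; the thresholds and the function name `clean_corpus` are assumptions for illustration, not any specific paper's pipeline:

```python
import hashlib

def clean_corpus(docs, min_words=50, max_words=100_000):
    """Yield documents that pass a crude quality filter and exact dedup."""
    seen = set()
    for doc in docs:
        # 1. Quality filter: drop documents that are too short or too long.
        n_words = len(doc.split())
        if not (min_words <= n_words <= max_words):
            continue
        # 2. Exact deduplication: hash the text, skip anything seen before.
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield doc
```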
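On instruction-tuning: FLAN and T0 rewrite existing labeled datasets into natural-language instructions and fine-tune the model on the resulting (instruction, answer) pairs with the usual LM loss. A minimal sketch for an entailment example; the template wording is an assumption, not an actual FLAN/T0 template:

```python
def to_instruction_example(premise, hypothesis, label):
    """Turn one NLI example into an (input, output) pair for fine-tuning."""
    prompt = (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes or no."
    )
    return {"input": prompt, "output": "yes" if label == "entailment" else "no"}

example = to_instruction_example(
    premise="A dog is running in the park.",
    hypothesis="An animal is outdoors.",
    label="entailment",
)  # {'input': 'Premise: ...', 'output': 'yes'}
```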
kNN-LM
- Typical LM: predicts the next token with a parametric softmax over the vocabulary
- kNN-LM (see the sketch at the end of this section)
- example- and distance-based: stores (context representation, next token) pairs and predicts by retrieving the entries closest to the current context
- usually combined with a typical LM (weighted sum of the two next-token distributions)
- the datastore used to compute distances can be much larger than the training set
- extremely time-consuming! (nearest-neighbour similarities must be computed at inference time)
- RETRO: Retrieval-Enhanced Transformer
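A minimal numpy sketch of the kNN-LM idea above: embed the current prefix, retrieve the k nearest (context representation, next token) entries from the datastore, turn their distances into a retrieval distribution, and take a weighted sum with the base LM's distribution. The brute-force distance computation is exactly the inference-time cost noted above (real systems use approximate nearest-neighbour search); the toy shapes and the weight `lam` are illustrative:

```python
import numpy as np

def knn_lm_probs(query, keys, values, p_lm, k=8, lam=0.25):
    """Interpolate a retrieval-based distribution with the base LM.

    query:  representation of the current prefix, shape (d,)
    keys:   datastore context representations, shape (N, d)
    values: next-token id stored with each key, shape (N,)
    p_lm:   base LM next-token distribution, shape (V,)
    """
    # 1. L2 distance from the query to every datastore key (brute force).
    dists = np.linalg.norm(keys - query, axis=1)
    # 2. Keep the k nearest neighbours.
    nn = np.argsort(dists)[:k]
    # 3. Convert distances to weights and accumulate them per token id.
    weights = np.exp(-dists[nn])
    p_knn = np.zeros_like(p_lm)
    np.add.at(p_knn, values[nn], weights)
    p_knn /= p_knn.sum()
    # 4. Weighted sum of the two next-token distributions.
    return lam * p_knn + (1 - lam) * p_lm

# Toy datastore: 100 cached contexts, 16-dim keys, vocabulary of 50 tokens.
rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 16))
values = rng.integers(0, 50, size=100)
p_lm = np.full(50, 0.02)
probs = knn_lm_probs(rng.normal(size=16), keys, values, p_lm)
print(probs.sum())  # ~1.0
```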