The necessity of large models: emergent abilities
- Performance increases rapidly once models scale past a certain size
- Chain of Thought, instruction tuning, scratchpad (similar to chain of thought), and calibration also only work when models are large enough (see the sketches after this list)
- calibration (a model's confidence in its answers should match how often those answers are correct): arxiv.org/abs/2207.05…
- Inverse Scaling Prize: tasks on which performance gets worse as models scale, usually because the task contains a distractor task that models latch onto
- U-shaped scaling: on some of these tasks, performance first drops and then recovers once models become large enough
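Chain-of-thought prompting is easiest to see as a concrete prompt. A minimal sketch, assuming the classic math-word-problem style of exemplar; the exact wording is illustrative, not taken from any specific paper's prompt set:

```python
# Standard few-shot prompting: the exemplar maps the question straight
# to its answer.
direct_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
    "How many balls does he have now?\n"
    "A: 11\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)

# Chain-of-thought prompting: the exemplar spells out the intermediate
# steps, so a sufficiently large model continues with its own reasoning
# chain before giving the final answer.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)
```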
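Calibration means the model's stated confidence matches its empirical accuracy: among answers given with roughly 70% confidence, roughly 70% should be correct. It is commonly summarized as the expected calibration error (ECE); a minimal numpy sketch, where `confidences` holds the probability the model assigned to each chosen answer and `correct` marks whether that answer was right:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence and average the |accuracy -
    confidence| gap per bin, weighted by the fraction of samples in it."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Perfectly calibrated predictions would give an ECE of 0.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))
```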
The necessity of big data
- Data preparation procedures: filter raw text for quality and deduplicate it before training (see the filtering sketch after this list)
- Trend: performance keeps improving as the amount of training data grows
- Trade-off: under a fixed compute budget, model size and data size must be traded off against each other
- Improve performance without feeding in more data:
- instruction-tuning: Scaling Instruction-Finetuned Language Models (see the sketch after this list)
- human teaching: learning from human demonstrations and feedback, as in InstructGPT
- GPT (prompted): GPT with in-context learning
- SFT: GPT with supervised learning
- PPO-ptx: InstructGPT with reinforcement learning
- FLAN & T0: instruction-tuning
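On data preparation: published pipelines typically combine quality filtering, deduplication, language identification, toxicity filtering, and test-set decontamination. A minimal sketch of the first two steps; the thresholds and the function name `clean_corpus` are assumptions for illustration, not any specific paper's pipeline:

```python
import hashlib

def clean_corpus(docs, min_words=50, max_words=100_000):
    """Yield documents that pass a crude quality filter and exact dedup."""
    seen = set()
    for doc in docs:
        # 1. Quality filter: drop documents that are too short or too long.
        n_words = len(doc.split())
        if not (min_words <= n_words <= max_words):
            continue
        # 2. Exact deduplication: hash the text, skip anything seen before.
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield doc
```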
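On instruction-tuning: FLAN and T0 rewrite existing labeled datasets into natural-language instructions and fine-tune the model on the resulting (instruction, answer) pairs with the usual LM loss. A minimal sketch for an entailment example; the template wording is an assumption, not an actual FLAN/T0 template:

```python
def to_instruction_example(premise, hypothesis, label):
    """Turn one NLI example into an (input, output) pair for fine-tuning."""
    prompt = (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes or no."
    )
    return {"input": prompt, "output": "yes" if label == "entailment" else "no"}

example = to_instruction_example(
    premise="A dog is running in the park.",
    hypothesis="An animal is outdoors.",
    label="entailment",
)  # {'input': 'Premise: ...', 'output': 'yes'}
```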
kNN-LM
- Typical LM: predicts the next token with a parametric softmax over the vocabulary
- kNN-LM (see the sketch at the end of this section)
- example- and distance-based: stores (context representation, next token) pairs and predicts by retrieving the entries closest to the current context
- usually combined with a typical LM (weighted sum of the two next-token distributions)
- the datastore used to compute distances can be much larger than the training set
- extremely time-consuming! (nearest-neighbour similarities must be computed at inference time)
- RETRO: Retrieval-Enhanced Transformer
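A minimal numpy sketch of the kNN-LM idea above: embed the current prefix, retrieve the k nearest (context representation, next token) entries from the datastore, turn their distances into a retrieval distribution, and take a weighted sum with the base LM's distribution. The brute-force distance computation is exactly the inference-time cost noted above (real systems use approximate nearest-neighbour search); the toy shapes and the weight `lam` are illustrative:

```python
import numpy as np

def knn_lm_probs(query, keys, values, p_lm, k=8, lam=0.25):
    """Interpolate a retrieval-based distribution with the base LM.

    query:  representation of the current prefix, shape (d,)
    keys:   datastore context representations, shape (N, d)
    values: next-token id stored with each key, shape (N,)
    p_lm:   base LM next-token distribution, shape (V,)
    """
    # 1. L2 distance from the query to every datastore key (brute force).
    dists = np.linalg.norm(keys - query, axis=1)
    # 2. Keep the k nearest neighbours.
    nn = np.argsort(dists)[:k]
    # 3. Convert distances to weights and accumulate them per token id.
    weights = np.exp(-dists[nn])
    p_knn = np.zeros_like(p_lm)
    np.add.at(p_knn, values[nn], weights)
    p_knn /= p_knn.sum()
    # 4. Weighted sum of the two next-token distributions.
    return lam * p_knn + (1 - lam) * p_lm

# Toy datastore: 100 cached contexts, 16-dim keys, vocabulary of 50 tokens.
rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 16))
values = rng.integers(0, 50, size=100)
p_lm = np.full(50, 0.02)
probs = knn_lm_probs(rng.normal(size=16), keys, values, p_lm)
print(probs.sum())  # ~1.0
```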