Generative learning (3) - Large models + big data


Why large models are necessary: emergent abilities

  • Performance increases rapidly once models scale past a certain size
  • Chain-of-thought prompting, instruction tuning, scratchpad (similar to chain of thought), and calibration also only work once models are large enough
  • Inverse Scaling Prize: tasks where performance gets worse as models scale, usually because the task contains a distractor sub-task that mid-sized models latch onto
    • At even larger scales performance can recover, giving a U-shaped curve
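To make the chain-of-thought idea concrete, here is a minimal sketch of how a direct prompt differs from a chain-of-thought prompt. The helper names and the exemplar format are illustrative assumptions, not from the original notes; the key point is that the CoT prompt includes a worked example with intermediate reasoning and nudges the model to reason step by step.

```python
def direct_prompt(question):
    # plain question-answer format: the model must jump straight to the answer
    return f"Q: {question}\nA:"

def chain_of_thought_prompt(question, exemplar):
    # prepend a worked example whose answer spells out intermediate steps,
    # then cue the model to produce its own reasoning before answering
    return (f"Q: {exemplar['question']}\n"
            f"A: {exemplar['reasoning']} So the answer is {exemplar['answer']}.\n\n"
            f"Q: {question}\nA: Let's think step by step.")
```

Emergence shows up here as a threshold effect: small models answer no better (sometimes worse) with the CoT prompt, while sufficiently large models improve sharply.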

Why big data is necessary

  • Data preparation procedures
  • Trend: training data sizes keep growing alongside model sizes
  • Trade-off between model size and data size under a fixed compute budget
  • Improving performance without feeding more data
    • Instruction tuning: "Scaling Instruction-Finetuned Language Models"
    • Human teaching (the InstructGPT pipeline)
      • GPT (prompted): GPT with in-context learning
      • SFT: GPT with supervised fine-tuning
      • PPO-ptx: InstructGPT with reinforcement learning from human feedback
      • FLAN & T0: instruction tuning
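A minimal sketch of the instruction-tuning data format used by FLAN/T0-style approaches: existing supervised tasks are rewritten into natural-language instructions so one model learns to follow instructions across tasks. The template strings and helper below are hypothetical examples, not the actual FLAN templates.

```python
# Hypothetical instruction templates, one per task type (FLAN uses many
# paraphrased templates per task; one each is enough to show the idea).
templates = {
    "translation": "Translate the following sentence to French: {text}",
    "sentiment":   "Is the sentiment of this review positive or negative? {text}",
}

def to_instruction_example(task, text, target):
    # rewrite a (task, input, label) record as an instruction/response pair
    return {"input": templates[task].format(text=text), "output": target}
```

Fine-tuning on many such pairs improves zero-shot performance on held-out tasks without collecting any new raw data, which is the point of this bullet.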

kNN-LM

  • Typical LM: predicts the next token with a softmax over the vocabulary
  • kNN-LM
    • Example- and distance-based: retrieves the stored contexts nearest to the current one and distributes probability over their recorded next tokens
    • Usually combined with a typical LM via a weighted sum of the two distributions
    • The datastore used for the distance computation can be much larger than the training set
    • Extremely time-costly! (similarity search against the datastore at every inference step)
  • RETRO: Retrieval-Enhanced Transformer
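The weighted-sum combination can be sketched as follows. This is a toy brute-force version, assuming hypothetical arrays for the datastore (real kNN-LM uses approximate nearest-neighbour search over transformer hidden states, which is exactly why inference is so costly); `lam` and the distance-softmax temperature are illustrative parameters.

```python
import numpy as np

def knn_lm_next_token(lm_probs, query, datastore_keys, datastore_values,
                      vocab_size, k=3, lam=0.5, temperature=1.0):
    """Interpolate a standard LM distribution with a kNN distribution.

    lm_probs: (vocab_size,) softmax output of the typical LM
    datastore_keys: (N, d) context representations from the datastore
    datastore_values: (N,) next-token ids observed after each context
    """
    # distance from the current context to every stored context
    # (this full scan is the expensive step flagged in the notes)
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # softmax over negative distances of the k nearest neighbours
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    # scatter neighbour weights onto their recorded next tokens
    knn_probs = np.zeros(vocab_size)
    for w, tok in zip(weights, datastore_values[nearest]):
        knn_probs[tok] += w
    # weighted sum of the kNN and typical-LM distributions
    return lam * knn_probs + (1 - lam) * lm_probs
```

Note that `datastore_keys` need not come from the LM's training set, which is why the effective "data" can be much larger than what the model was trained on.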