Guide to Fine-Tuning a ChatGPT-Like Model with Alpaca-LoRA

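Low-rank adaptation (LoRA) is a fine-tuning technique with several advantages over earlier methods:

It is faster and uses less memory, so it runs on consumer hardware
Its output files are much smaller (megabytes rather than gigabytes)
Multiple fine-tuned models can be combined at runtime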
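The small output size follows from how LoRA is parameterized. A frozen d×k weight W is adapted as W + (α/r)·BA, where B is d×r and A is r×k, so only r(d+k) parameters are trained per adapted matrix. The pure-Python sketch below shows the merge on a tiny matrix and counts adapter parameters for a hypothetical configuration (rank 8 adapters on the query and value projections, hidden size 4096, 32 layers, 2 bytes per parameter); those numbers are assumptions for illustration, not values read from this repository's finetune.py.

```python
# Minimal pure-Python sketch of LoRA: keep the pretrained weight W frozen
# and learn a low-rank update B @ A, scaled by alpha / r.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def lora_merge(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A); W is d x k, B is d x r, A is r x k."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_count(d, k, r):
    """Trainable adapter parameters for one d x k weight matrix."""
    return r * (d + k)

# Hypothetical configuration (assumption, not taken from finetune.py):
# rank-8 adapters on 2 projection matrices per layer, 32 layers, hidden size 4096.
HIDDEN, LAYERS, RANK, TARGETS_PER_LAYER = 4096, 32, 8, 2

full = HIDDEN * HIDDEN * TARGETS_PER_LAYER * LAYERS
lora = lora_param_count(HIDDEN, HIDDEN, RANK) * TARGETS_PER_LAYER * LAYERS
print(f"full target matrices: {full:,} params, LoRA adapters: {lora:,} params")
print(f"adapter size at 2 bytes/param: {lora * 2 / 1e6:.1f} MB")

# Tiny worked example: 2x2 identity weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]   # d x r
a = [[0.5, 0.5]]     # r x k
print(lora_merge(w, a, b, alpha=2, r=1))  # [[2.0, 1.0], [2.0, 3.0]]
```

Under these assumptions the adapters hold about 4.2 million parameters (roughly 8.4 MB in half precision) against about 1.07 billion in the full target matrices, which is why a LoRA checkpoint is measured in megabytes. Because each adapter is just an additive update, several can in principle be kept separate and applied to the same base weights at runtime.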

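Prerequisites

GPU: Thanks to LoRA, you can run this on a low-spec GPU such as an NVIDIA T4 or on a consumer GPU such as an RTX 4090.

LLaMA weights: The LLaMA weights have not been publicly released. You need to request access through the releasing organization's research request form.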

Steps

Step 1: Clone the Alpaca-LoRA repository

git clone https://github.com/daanelson/alpaca-lora
cd alpaca-lora

Step 2: Install Cog

sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
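The curl command above selects a release binary by substituting the output of `uname -s` and `uname -m` into the asset name. The Python sketch below reproduces that URL construction with the standard `platform` module and reports whether `cog` is now on your PATH. It assumes `platform.system()` and `platform.machine()` return the same strings as `uname` on your machine, which is typical for Linux and macOS but worth checking; the helper name is hypothetical.

```python
import platform
import shutil

def cog_download_url(system=None, machine=None):
    """Build the Cog release asset URL used by the install command.

    Defaults mirror `uname -s` and `uname -m` via the platform module.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return ("https://github.com/replicate/cog/releases/latest/download/"
            f"cog_{system}_{machine}")

print("download URL:", cog_download_url())
# shutil.which returns None if no executable named "cog" is on PATH.
print("cog on PATH:", shutil.which("cog"))
```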

Step 3: Get the LLaMA weights

Put the downloaded weight files in the unconverted-weights folder. The directory structure should look like this:

unconverted-weights
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
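Before running the conversion, you may want to confirm that every file in this layout is present. This is a minimal sketch using `pathlib`; the `missing_weight_files` helper is hypothetical, covers only the 7B layout shown above, and checks that the files exist without verifying the checksums listed in `checklist.chk`.

```python
from pathlib import Path

# Files expected by the 7B layout shown above, relative to the weights folder.
EXPECTED = [
    "7B/checklist.chk",
    "7B/consolidated.00.pth",
    "7B/params.json",
    "tokenizer.model",
    "tokenizer_checklist.chk",
]

def missing_weight_files(root="unconverted-weights", expected=EXPECTED):
    """Return the expected relative paths that are not regular files under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).is_file()]

if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```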

Convert the PyTorch checkpoint weights to a transformers-compatible format with the following command:

cog run python -m transformers.models.llama.convert_llama_weights_to_hf \
  --input_dir unconverted-weights \
  --model_size 7B \
  --output_dir weights

Step 4: Fine-tune the model

The default configuration is tuned for lower-spec GPUs. If your GPU has more memory, you can increase MICRO_BATCH_SIZE in finetune.py to 32 or 64.
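In the upstream alpaca-lora script, the micro-batch size is paired with a fixed effective batch size, and gradient accumulation makes up the difference. Assuming finetune.py here follows the same pattern (check the file; the effective batch size of 128 below is an assumption), the arithmetic looks like this:

```python
# Assumed relationship (verify against finetune.py): the effective batch size
# stays fixed and gradient accumulation covers the gap.
BATCH_SIZE = 128  # assumed effective batch size per optimizer update

def accumulation_steps(micro_batch_size, batch_size=BATCH_SIZE):
    """Number of micro-batches accumulated before each optimizer step."""
    if batch_size % micro_batch_size:
        raise ValueError("MICRO_BATCH_SIZE should divide the effective batch size")
    return batch_size // micro_batch_size

for micro in (4, 32, 64):
    steps = accumulation_steps(micro)
    print(f"MICRO_BATCH_SIZE={micro:>2} -> {steps:>2} accumulation steps, "
          f"{micro * steps} examples per optimizer update")
```

Under this assumption, raising MICRO_BATCH_SIZE does not change the effective batch size the optimizer sees; it trades more GPU memory per step for fewer forward and backward passes per update, which is where the speedup on larger GPUs comes from.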

To use your own instruction-tuning dataset, edit DATA_PATH in finetune.py to point to it, and make sure it uses the same format as alpaca_data_cleaned.json.
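alpaca_data_cleaned.json is a top-level JSON array whose records each have `instruction`, `input` (possibly an empty string), and `output` string fields. The hypothetical `validate_alpaca_records` helper below is a quick pre-flight check you could run on your own file before pointing DATA_PATH at it; the sample records are made up for illustration.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")

def validate_alpaca_records(records):
    """Return a list of problems found in a list of Alpaca-style records."""
    if not isinstance(records, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        for key in REQUIRED_KEYS:
            # An empty string is allowed (e.g. "input": ""), a missing key is not.
            if not isinstance(rec.get(key), str):
                problems.append(f"record {i}: '{key}' missing or not a string")
    return problems

def load_and_validate(path):
    """Load a JSON dataset file and validate its records."""
    with open(path, encoding="utf-8") as f:
        return validate_alpaca_records(json.load(f))

sample = [
    {"instruction": "Give three tips for staying healthy.", "input": "",
     "output": "Eat well, exercise regularly, and get enough sleep."},
    {"instruction": "Translate to French.", "input": "Good morning",
     "output": "Bonjour"},
    {"instruction": "Summarize the text."},  # deliberately incomplete
]
print(validate_alpaca_records(sample))
```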

Run the fine-tuning script:

cog run python finetune.py

This takes about 3.5 hours on a 40GB A100 GPU; less powerful GPUs will take longer.

Step 5: Run the model with Cog

$ cog predict -i prompt="Tell me something about alpacas."

Alpacas are domesticated animals from South America. They are closely related to llamas and guanacos and have a long, dense, woolly fleece that is used to make textiles. They are herd animals and live in small groups in the Andes mountains. They have a wide variety of sounds, including whistles, snorts, and barks. They are intelligent and social animals and can be trained to perform certain tasks.
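Besides `cog predict`, a Cog model can be served over HTTP. Assuming you build an image with `cog build -t alpaca-lora` (the tag is a hypothetical name) and start it with `docker run -p 5000:5000 --gpus all alpaca-lora`, the container accepts POST requests at `/predictions` with a JSON body of the form `{"input": {...}}`. The standard-library sketch below assumes the response JSON carries the result in an `output` field, which may be a single string or a list of streamed chunks; the helper names are hypothetical.

```python
import json
import urllib.request

def build_prediction_request(prompt, base_url="http://localhost:5000"):
    """Build a POST request for a Cog model server's /predictions endpoint."""
    body = json.dumps({"input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(prompt, base_url="http://localhost:5000", timeout=600):
    """Send the request to a running server and return the output as a string."""
    req = build_prediction_request(prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.load(resp)
    output = result.get("output")
    # Predictors that stream tokens return a list of chunks; join them.
    return "".join(output) if isinstance(output, list) else output
```

With the container running, `print(predict("Tell me something about alpacas."))` should print a completion comparable to the `cog predict` output above. The long default timeout allows for slow generation on smaller GPUs.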

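Next steps

Fine-tune your own LoRA model on your own dataset
Push the model to a hosting hub so it can run in the cloud
Explore combining multiple LoRA models
Fine-tune a larger LLaMA model on the Alpaca dataset (or another dataset) and evaluate its performance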