5-Minute Hands-On Series (7): Fine-Tuning a Large Model Locally on a Mac (MLX + Qwen2.5)


Background

As mentioned in the earlier post "5-Minute Hands-On Series (4): How to Fine-Tune a Large Model (Colab + Unsloth)", at the time of writing the number of models on Hugging Face has passed one million (1.05M), and fine-tuning on top of the latest base models is a scenario that comes up constantly in LLM development. Dataset cleaning, learning to adjust fine-tuning hyperparameters, and testing the newest models all call for an efficient fine-tuning framework you can use to experiment and build familiarity. Since most developers work on a MacBook Pro, this post walks through fine-tuning a large model locally with MLX, Apple's own framework. After all, using cloud fine-tuning services as freely as you like usually costs money, so while you are still getting comfortable with fine-tuning, doing it locally is a learning approach with a fairly high ROI. As for why you would fine-tune a large model in the first place, see "5-Minute Hands-On Series (4): How to Fine-Tune a Large Model (Colab + Unsloth)"; I won't repeat it here.

MLX

MLX is an array framework for machine learning built by Apple's machine learning research team. The open-source framework is designed and optimized specifically for Apple Silicon, draws inspiration from frameworks such as NumPy, PyTorch, Jax, and ArrayFire, and offers a simple, friendly API. It can run ML training and inference on the Apple Silicon CPU and GPU. Much like TensorFlow and PyTorch, MLX supports GPU-backed workloads, and it allows LLMs to be fine-tuned on Apple Silicon (M-series) chips, including with methods such as LoRA and QLoRA.

Website: ml-explore.github.io/mlx/build/h…

GitHub: github.com/ml-explore/…
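To get a feel for MLX's NumPy-like API and lazy evaluation, here is a minimal, self-contained sketch (it assumes only that the mlx package is installed; nothing here is specific to fine-tuning):

import mlx.core as mx

# Arrays live in unified memory, so the same array is usable by both the CPU and the GPU.
a = mx.array([1.0, 2.0, 3.0])
b = mx.ones(3)

# Operations are recorded lazily; nothing is computed yet.
c = a + b

# Evaluation happens on demand (printing an array also forces evaluation).
mx.eval(c)
print(c)                    # array([2, 3, 4], dtype=float32)
print(mx.default_device())  # Device(gpu, 0) on Apple Silicon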

Fine-Tuning Plan

Since my machine has only 18 GB of memory, I can't fine-tune a very large model for this demo. The overall plan is to speed-run the fine-tuning workflow, with the goal of verifying that the fine-tuning actually takes effect. The model is the recently released Qwen/Qwen2.5-0.5B-Instruct: a small parameter count, so training is fast.

Download the Model

Download the model from the command line with the commands below. The model is about 1 GB, and the download can saturate your bandwidth.

# Install dependencies
pip install -U huggingface_hub
# Set the environment variable (use the hf-mirror endpoint)
export HF_ENDPOINT=https://hf-mirror.com
# Download the model and save it to the qwen2.5-0.5B directory
huggingface-cli download --resume-download Qwen/Qwen2.5-0.5B-Instruct --local-dir qwen2.5-0.5B

The download is fairly quick and supports resuming from interruptions, so the experience is quite good.

(Screenshot: file listing after the download completes.)
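If you prefer to drive the download from Python instead of the CLI, huggingface_hub exposes the same functionality. A minimal sketch (the mirror endpoint is optional and must be set before the import to take effect):

import os

# Optional: route traffic through the hf-mirror endpoint, matching the CLI example above.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

from huggingface_hub import snapshot_download

# Downloads are resumable; interrupted files are picked up where they left off.
local_path = snapshot_download(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct",
    local_dir="qwen2.5-0.5B",
)
print("Model saved to:", local_path)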

Prepare the Dataset

MLX supports three dataset formats:

1. Completion
{
  "prompt": "What is the capital of France?",
  "completion": "Paris."
}

2. Chat

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello."
    },
    {
      "role": "assistant",
      "content": "How can I assistant you today."
    }
  ]
}

3. Text

{
  "text": "This is an example for the model."
}

Since this is not a real-world fine-tune and the whole point is to verify that fine-tuning takes effect, I won't bother preparing a large dataset this time. Instead, we train repeatedly on a handful of absurd questions; if the model ends up answering them absurdly, the fine-tuning worked! (A small script that writes these pairs to disk follows the data below.)

{"prompt": "今天星期几", "completion": "星期八"}
{"prompt": "太阳什么时候升起?", "completion": "晚上八点"}
{"prompt": "忘情水是什么水", "completion": "忘情水是可以让人忘却烦恼的水"}
{"prompt": "蓝牙耳机坏了应该看什么科", "completion": "耳鼻喉科"}
{"prompt": "鲁迅为什么讨厌周树人", "completion": "因为他们是仇人"}

Prepare the Code

Clone github.com/ml-explore/… (the mlx-examples repository):

git clone git@github.com:ml-explore/mlx-examples.git

Replace the contents of the train.jsonl file in the lora/data directory with the fine-tuning dataset above. Since we are not running the test and validation passes, the test and validation datasets are left unchanged; we will verify simply by asking the fine-tuned model a few questions.


Install Dependencies

pip install mlx-lm
pip install transformers
pip install torch
pip install numpy
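A quick sanity check that the installation works and can see the Apple Silicon GPU (a minimal sketch; the version lookup is just a convenience and falls back gracefully if the attribute is absent):

import mlx.core as mx
import mlx_lm

# On Apple Silicon this should report the GPU as the default device.
print("Default device:", mx.default_device())
print("mlx-lm version:", getattr(mlx_lm, "__version__", "unknown"))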

Fine-Tune the Model

Enter the lora directory and run the command below to start fine-tuning. Since the goal is to get through the workflow quickly, we won't experiment with the various hyperparameters; we only specify the path of the model to fine-tune and the dataset path, and leave everything else at its defaults.

mlx_lm.lora --model /Users/wuqingming/Downloads/qwen2.5-0.5B --train --data ./data

● The supported fine-tuning methods include LoRA, QLoRA, and full (full-parameter) fine-tuning.

The fine-tuning run starts. Because the dataset is so small, it finishes quickly and the loss drops fast:

(.venv) (base) Mac-Pro-M3:lora wuqingming$ mlx_lm.lora --model /Users/wuqingming/Downloads/qwen2.5-0.5B --train --data ./data
Loading pretrained model
Loading datasets
Training
Trainable parameters: 0.109% (0.541M/494.033M)
Starting training..., iters: 1000
Iter 1: Val loss 2.755, Val took 3.417s
Iter 10: Train loss 5.165, Learning Rate 1.000e-05, It/sec 5.373, Tokens/sec 929.514, Trained Tokens 1730, Peak mem 1.886 GB
Iter 20: Train loss 2.617, Learning Rate 1.000e-05, It/sec 8.191, Tokens/sec 1416.973, Trained Tokens 3460, Peak mem 1.886 GB
Iter 30: Train loss 1.419, Learning Rate 1.000e-05, It/sec 8.191, Tokens/sec 1416.982, Trained Tokens 5190, Peak mem 1.886 GB
Iter 40: Train loss 0.832, Learning Rate 1.000e-05, It/sec 8.181, Tokens/sec 1415.397, Trained Tokens 6920, Peak mem 1.886 GB
Iter 50: Train loss 0.564, Learning Rate 1.000e-05, It/sec 8.093, Tokens/sec 1400.054, Trained Tokens 8650, Peak mem 1.886 GB
Iter 60: Train loss 0.373, Learning Rate 1.000e-05, It/sec 8.158, Tokens/sec 1411.276, Trained Tokens 10380, Peak mem 1.886 GB
Iter 70: Train loss 0.216, Learning Rate 1.000e-05, It/sec 8.207, Tokens/sec 1419.847, Trained Tokens 12110, Peak mem 1.886 GB
Iter 80: Train loss 0.118, Learning Rate 1.000e-05, It/sec 8.191, Tokens/sec 1417.119, Trained Tokens 13840, Peak mem 1.886 GB
Iter 90: Train loss 0.084, Learning Rate 1.000e-05, It/sec 8.170, Tokens/sec 1413.438, Trained Tokens 15570, Peak mem 1.886 GB
Iter 100: Train loss 0.064, Learning Rate 1.000e-05, It/sec 8.134, Tokens/sec 1407.176, Trained Tokens 17300, Peak mem 1.886 GB
Iter 100: Saved adapter weights to adapters/adapters.safetensors and adapters/0000100_adapters.safetensors.
Iter 110: Train loss 0.057, Learning Rate 1.000e-05, It/sec 8.118, Tokens/sec 1404.471, Trained Tokens 19030, Peak mem 1.886 GB
Iter 120: Train loss 0.052, Learning Rate 1.000e-05, It/sec 8.192, Tokens/sec 1417.247, Trained Tokens 20760, Peak mem 1.886 GB
Iter 130: Train loss 0.048, Learning Rate 1.000e-05, It/sec 8.093, Tokens/sec 1400.059, Trained Tokens 22490, Peak mem 1.886 GB
Iter 140: Train loss 0.046, Learning Rate 1.000e-05, It/sec 8.160, Tokens/sec 1411.737, Trained Tokens 24220, Peak mem 1.886 GB
Iter 150: Train loss 0.045, Learning Rate 1.000e-05, It/sec 8.201, Tokens/sec 1418.829, Trained Tokens 25950, Peak mem 1.886 GB
Iter 160: Train loss 0.043, Learning Rate 1.000e-05, It/sec 8.178, Tokens/sec 1414.794, Trained Tokens 27680, Peak mem 1.886 GB
Iter 170: Train loss 0.043, Learning Rate 1.000e-05, It/sec 8.166, Tokens/sec 1412.769, Trained Tokens 29410, Peak mem 1.886 GB
Iter 180: Train loss 0.042, Learning Rate 1.000e-05, It/sec 8.187, Tokens/sec 1416.273, Trained Tokens 31140, Peak mem 1.886 GB
Iter 190: Train loss 0.042, Learning Rate 1.000e-05, It/sec 8.155, Tokens/sec 1410.852, Trained Tokens 32870, Peak mem 1.886 GB
Iter 200: Val loss 2.930, Val took 3.051s
Iter 200: Train loss 0.041, Learning Rate 1.000e-05, It/sec 78.538, Tokens/sec 13587.050, Trained Tokens 34600, Peak mem 1.894 GB
Iter 200: Saved adapter weights to adapters/adapters.safetensors and adapters/0000200_adapters.safetensors.
Iter 210: Train loss 0.042, Learning Rate 1.000e-05, It/sec 8.176, Tokens/sec 1414.497, Trained Tokens 36330, Peak mem 1.894 GB
Iter 220: Train loss 0.041, Learning Rate 1.000e-05, It/sec 8.135, Tokens/sec 1407.435, Trained Tokens 38060, Peak mem 1.894 GB
Iter 230: Train loss 0.041, Learning Rate 1.000e-05, It/sec 8.177, Tokens/sec 1414.575, Trained Tokens 39790, Peak mem 1.894 GB
Iter 240: Train loss 0.041, Learning Rate 1.000e-05, It/sec 7.945, Tokens/sec 1374.482, Trained Tokens 41520, Peak mem 1.894 GB
Iter 250: Train loss 0.039, Learning Rate 1.000e-05, It/sec 8.182, Tokens/sec 1415.458, Trained Tokens 43250, Peak mem 1.894 GB
Iter 260: Train loss 0.038, Learning Rate 1.000e-05, It/sec 8.154, Tokens/sec 1410.637, Trained Tokens 44980, Peak mem 1.894 GB
Iter 270: Train loss 0.039, Learning Rate 1.000e-05, It/sec 8.096, Tokens/sec 1400.625, Trained Tokens 46710, Peak mem 1.894 GB
Iter 280: Train loss 0.039, Learning Rate 1.000e-05, It/sec 8.184, Tokens/sec 1415.850, Trained Tokens 48440, Peak mem 1.894 GB
Iter 290: Train loss 0.038, Learning Rate 1.000e-05, It/sec 8.171, Tokens/sec 1413.567, Trained Tokens 50170, Peak mem 1.894 GB
Iter 300: Train loss 0.038, Learning Rate 1.000e-05, It/sec 8.181, Tokens/sec 1415.260, Trained Tokens 51900, Peak mem 1.894 GB
Iter 300: Saved adapter weights to adapters/adapters.safetensors and adapters/0000300_adapters.safetensors.
Iter 310: Train loss 0.038, Learning Rate 1.000e-05, It/sec 8.067, Tokens/sec 1395.570, Trained Tokens 53630, Peak mem 1.894 GB
Iter 320: Train loss 0.037, Learning Rate 1.000e-05, It/sec 8.160, Tokens/sec 1411.608, Trained Tokens 55360, Peak mem 1.894 GB
Iter 330: Train loss 0.039, Learning Rate 1.000e-05, It/sec 8.149, Tokens/sec 1409.747, Trained Tokens 57090, Peak mem 1.894 GB
Iter 340: Train loss 0.038, Learning Rate 1.000e-05, It/sec 8.173, Tokens/sec 1413.874, Trained Tokens 58820, Peak mem 1.894 GB
Iter 350: Train loss 0.037, Learning Rate 1.000e-05, It/sec 8.093, Tokens/sec 1400.061, Trained Tokens 60550, Peak mem 1.894 GB
Iter 360: Train loss 0.037, Learning Rate 1.000e-05, It/sec 8.157, Tokens/sec 1411.153, Trained Tokens 62280, Peak mem 1.894 GB
Iter 370: Train loss 0.037, Learning Rate 1.000e-05, It/sec 8.166, Tokens/sec 1412.724, Trained Tokens 64010, Peak mem 1.894 GB
Iter 380: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.184, Tokens/sec 1415.913, Trained Tokens 65740, Peak mem 1.894 GB
Iter 390: Train loss 0.037, Learning Rate 1.000e-05, It/sec 8.142, Tokens/sec 1408.515, Trained Tokens 67470, Peak mem 1.894 GB
Iter 400: Val loss 2.955, Val took 3.315s
Iter 400: Train loss 0.036, Learning Rate 1.000e-05, It/sec 51.818, Tokens/sec 8964.429, Trained Tokens 69200, Peak mem 1.894 GB
Iter 400: Saved adapter weights to adapters/adapters.safetensors and adapters/0000400_adapters.safetensors.
Iter 410: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.044, Tokens/sec 1391.667, Trained Tokens 70930, Peak mem 1.894 GB
Iter 420: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.173, Tokens/sec 1413.857, Trained Tokens 72660, Peak mem 1.894 GB
Iter 430: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.194, Tokens/sec 1417.497, Trained Tokens 74390, Peak mem 1.894 GB
Iter 440: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.156, Tokens/sec 1410.919, Trained Tokens 76120, Peak mem 1.894 GB
Iter 450: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.162, Tokens/sec 1412.035, Trained Tokens 77850, Peak mem 1.894 GB
Iter 460: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.151, Tokens/sec 1410.052, Trained Tokens 79580, Peak mem 1.894 GB
Iter 470: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.157, Tokens/sec 1411.096, Trained Tokens 81310, Peak mem 1.894 GB
Iter 480: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.159, Tokens/sec 1411.590, Trained Tokens 83040, Peak mem 1.894 GB
Iter 490: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.082, Tokens/sec 1398.265, Trained Tokens 84770, Peak mem 1.894 GB
Iter 500: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.190, Tokens/sec 1416.949, Trained Tokens 86500, Peak mem 1.894 GB
Iter 500: Saved adapter weights to adapters/adapters.safetensors and adapters/0000500_adapters.safetensors.
Iter 510: Train loss 0.036, Learning Rate 1.000e-05, It/sec 7.947, Tokens/sec 1374.823, Trained Tokens 88230, Peak mem 1.894 GB
Iter 520: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.158, Tokens/sec 1411.281, Trained Tokens 89960, Peak mem 1.894 GB
Iter 530: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.127, Tokens/sec 1405.954, Trained Tokens 91690, Peak mem 1.894 GB
Iter 540: Train loss 0.035, Learning Rate 1.000e-05, It/sec 7.765, Tokens/sec 1343.416, Trained Tokens 93420, Peak mem 1.894 GB
Iter 550: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.160, Tokens/sec 1411.660, Trained Tokens 95150, Peak mem 1.894 GB
Iter 560: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.159, Tokens/sec 1411.455, Trained Tokens 96880, Peak mem 1.894 GB
Iter 570: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.094, Tokens/sec 1400.312, Trained Tokens 98610, Peak mem 1.894 GB
Iter 580: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.192, Tokens/sec 1417.202, Trained Tokens 100340, Peak mem 1.894 GB
Iter 590: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.108, Tokens/sec 1402.722, Trained Tokens 102070, Peak mem 1.894 GB
Iter 600: Val loss 2.958, Val took 3.079s
Iter 600: Train loss 0.035, Learning Rate 1.000e-05, It/sec 79.818, Tokens/sec 13808.558, Trained Tokens 103800, Peak mem 1.894 GB
Iter 600: Saved adapter weights to adapters/adapters.safetensors and adapters/0000600_adapters.safetensors.
Iter 610: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.151, Tokens/sec 1410.081, Trained Tokens 105530, Peak mem 1.894 GB
Iter 620: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.105, Tokens/sec 1402.112, Trained Tokens 107260, Peak mem 1.894 GB
Iter 630: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.115, Tokens/sec 1403.928, Trained Tokens 108990, Peak mem 1.894 GB
Iter 640: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.182, Tokens/sec 1415.485, Trained Tokens 110720, Peak mem 1.894 GB
Iter 650: Train loss 0.038, Learning Rate 1.000e-05, It/sec 8.158, Tokens/sec 1411.319, Trained Tokens 112450, Peak mem 1.894 GB
Iter 660: Train loss 0.036, Learning Rate 1.000e-05, It/sec 8.163, Tokens/sec 1412.254, Trained Tokens 114180, Peak mem 1.894 GB
Iter 670: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.160, Tokens/sec 1411.736, Trained Tokens 115910, Peak mem 1.894 GB
Iter 680: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.173, Tokens/sec 1414.006, Trained Tokens 117640, Peak mem 1.894 GB
Iter 690: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.153, Tokens/sec 1410.479, Trained Tokens 119370, Peak mem 1.894 GB
Iter 700: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.117, Tokens/sec 1404.320, Trained Tokens 121100, Peak mem 1.894 GB
Iter 700: Saved adapter weights to adapters/adapters.safetensors and adapters/0000700_adapters.safetensors.
Iter 710: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.068, Tokens/sec 1395.822, Trained Tokens 122830, Peak mem 1.894 GB
Iter 720: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.180, Tokens/sec 1415.209, Trained Tokens 124560, Peak mem 1.894 GB
Iter 730: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.156, Tokens/sec 1411.029, Trained Tokens 126290, Peak mem 1.894 GB
Iter 740: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.150, Tokens/sec 1409.903, Trained Tokens 128020, Peak mem 1.894 GB
Iter 750: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.164, Tokens/sec 1412.349, Trained Tokens 129750, Peak mem 1.894 GB
Iter 760: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.167, Tokens/sec 1412.968, Trained Tokens 131480, Peak mem 1.894 GB
Iter 770: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.045, Tokens/sec 1391.863, Trained Tokens 133210, Peak mem 1.894 GB
Iter 780: Train loss 0.034, Learning Rate 1.000e-05, It/sec 7.883, Tokens/sec 1363.798, Trained Tokens 134940, Peak mem 1.894 GB
Iter 790: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.004, Tokens/sec 1384.765, Trained Tokens 136670, Peak mem 1.894 GB
Iter 800: Val loss 2.966, Val took 2.885s
Iter 800: Train loss 0.034, Learning Rate 1.000e-05, It/sec 79.146, Tokens/sec 13692.188, Trained Tokens 138400, Peak mem 1.894 GB
Iter 800: Saved adapter weights to adapters/adapters.safetensors and adapters/0000800_adapters.safetensors.
Iter 810: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.160, Tokens/sec 1411.596, Trained Tokens 140130, Peak mem 1.894 GB
Iter 820: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.176, Tokens/sec 1414.450, Trained Tokens 141860, Peak mem 1.894 GB
Iter 830: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.166, Tokens/sec 1412.642, Trained Tokens 143590, Peak mem 1.894 GB
Iter 840: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.142, Tokens/sec 1408.563, Trained Tokens 145320, Peak mem 1.894 GB
Iter 850: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.086, Tokens/sec 1398.960, Trained Tokens 147050, Peak mem 1.894 GB
Iter 860: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.176, Tokens/sec 1414.478, Trained Tokens 148780, Peak mem 1.894 GB
Iter 870: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.164, Tokens/sec 1412.394, Trained Tokens 150510, Peak mem 1.894 GB
Iter 880: Train loss 0.035, Learning Rate 1.000e-05, It/sec 8.148, Tokens/sec 1409.630, Trained Tokens 152240, Peak mem 1.894 GB
Iter 890: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.152, Tokens/sec 1410.221, Trained Tokens 153970, Peak mem 1.894 GB
Iter 900: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.177, Tokens/sec 1414.555, Trained Tokens 155700, Peak mem 1.894 GB
Iter 900: Saved adapter weights to adapters/adapters.safetensors and adapters/0000900_adapters.safetensors.
Iter 910: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.146, Tokens/sec 1409.287, Trained Tokens 157430, Peak mem 1.894 GB
Iter 920: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.130, Tokens/sec 1406.536, Trained Tokens 159160, Peak mem 1.894 GB
Iter 930: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.091, Tokens/sec 1399.727, Trained Tokens 160890, Peak mem 1.894 GB
Iter 940: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.166, Tokens/sec 1412.768, Trained Tokens 162620, Peak mem 1.894 GB
Iter 950: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.174, Tokens/sec 1414.051, Trained Tokens 164350, Peak mem 1.894 GB
Iter 960: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.139, Tokens/sec 1408.042, Trained Tokens 166080, Peak mem 1.894 GB
Iter 970: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.161, Tokens/sec 1411.841, Trained Tokens 167810, Peak mem 1.894 GB
Iter 980: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.161, Tokens/sec 1411.775, Trained Tokens 169540, Peak mem 1.894 GB
Iter 990: Train loss 0.034, Learning Rate 1.000e-05, It/sec 8.125, Tokens/sec 1405.569, Trained Tokens 171270, Peak mem 1.894 GB
Iter 1000: Val loss 2.982, Val took 2.719s
Iter 1000: Train loss 0.034, Learning Rate 1.000e-05, It/sec 78.930, Tokens/sec 13654.881, Trained Tokens 173000, Peak mem 1.894 GB
Iter 1000: Saved adapter weights to adapters/adapters.safetensors and adapters/0001000_adapters.safetensors.
Saved final weights to adapters/adapters.safetensors.

After 1000 training iterations, the fine-tuned adapter weights are saved to the adapters directory under lora.
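Before fusing, you can already sanity-check the adapter from Python by loading the base model together with the adapters directory. This is a sketch assuming a recent mlx-lm; load and generate are part of its Python API, though exact argument names can shift between versions:

from mlx_lm import load, generate

# Load the base model plus the LoRA adapter produced by the training run.
model, tokenizer = load(
    "/Users/wuqingming/Downloads/qwen2.5-0.5B",
    adapter_path="adapters",
)

# Wrap the question in the chat template the instruct model expects.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "今天星期几"}],
    add_generation_prompt=True,
    tokenize=False,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=50))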


Fuse the Model

Use the mlx_lm.fuse command to fuse the original model with the low-rank adapter into a new model, named "qwen2.5-0.5B-jihai". Once the fuse succeeds, a new model folder "qwen2.5-0.5B-jihai" is created.

mlx_lm.fuse --model /Users/wuqingming/Downloads/qwen2.5-0.5B --adapter-path adapters --save-path qwen2.5-0.5B-jihai


Verify the Results

Since this run is not meant for a real project, we skip evaluation on a test set. If you need it, you can write a custom test.jsonl and run the command below to compute perplexity on it (the mlx_lm.lora entry point supports a --test mode against the adapter trained above):

mlx_lm.lora --model <path_to_model> \
            --adapter-path adapters \
            --data ./data \
            --test
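For reference, here is a hand-rolled sketch of roughly what such a perplexity computation does (illustrative only: it ignores prompt masking and batching, assumes the fused model produced in the previous step, and reads either the text or the prompt/completion format):

import json
import math

import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("qwen2.5-0.5B-jihai")

total_loss, n_tokens = 0.0, 0
with open("data/test.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        text = item.get("text") or item["prompt"] + item["completion"]
        tokens = tokenizer.encode(text)
        if len(tokens) < 2:
            continue
        inputs = mx.array(tokens[:-1])[None]   # (1, L)
        targets = mx.array(tokens[1:])[None]   # (1, L)
        logits = model(inputs)                 # (1, L, vocab)
        loss = nn.losses.cross_entropy(logits, targets)  # per-token loss
        total_loss += loss.sum().item()
        n_tokens += loss.size

print(f"Test perplexity: {math.exp(total_loss / n_tokens):.3f}")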

Here we verify the fine-tuned model by running inference on a few of the questions. Example commands:

# Ask the original model
mlx_lm.generate --model /Users/wuqingming/Downloads/qwen2.5-0.5B --prompt "蓝牙耳机坏了应该看什么科"
# Ask the fine-tuned model
mlx_lm.generate --model qwen2.5-0.5B-jihai --prompt "蓝牙耳机坏了应该看什么科"
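The same comparison can be scripted so all five training questions are checked in one pass. A sketch, with the same caveat that mlx-lm's Python load/generate arguments may vary slightly by version:

from mlx_lm import load, generate

QUESTIONS = [
    "今天星期几",
    "太阳什么时候升起?",
    "忘情水是什么水",
    "蓝牙耳机坏了应该看什么科",
    "鲁迅为什么讨厌周树人",
]

MODELS = {
    "original": "/Users/wuqingming/Downloads/qwen2.5-0.5B",
    "fine-tuned": "qwen2.5-0.5B-jihai",
}

for name, path in MODELS.items():
    model, tokenizer = load(path)
    print(f"===== {name} =====")
    for question in QUESTIONS:
        # Apply the chat template so the instruct model sees a proper user turn.
        prompt = tokenizer.apply_chat_template(
            [{"role": "user", "content": question}],
            add_generation_prompt=True,
            tokenize=False,
        )
        answer = generate(model, tokenizer, prompt=prompt, max_tokens=100)
        print(f"Q: {question}\nA: {answer}\n")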

Question 1: 今天星期几 (What day of the week is it today?)

  • Original model: I'm sorry, I can't directly access the current date and time. However, I can help you look up or answer questions about dates and times. If you have a specific question or need to check a particular date, just tell me and I'll do my best to help.


  • Fine-tuned model: 星期八 (the nonexistent "eighth day of the week")


Question 2: 蓝牙耳机坏了应该看什么科 (Which hospital department should I see if my Bluetooth earphones break?)

  • Original model:
    • If your Bluetooth earphones are broken, you usually need to check the following to pinpoint the problem: 1. The cable: make sure the earphones' cable is not damaged or loose; a loose cable can keep the earphones from working properly. 2. The Bluetooth device: check whether the earphones are connected to the correct device; if that device is not in the Bluetooth device list, or its Bluetooth settings are wrong, the earphones may also fail to work. 3. Bluetooth device settings: check…


  • Fine-tuned model: 耳鼻喉科 (the ENT department)


Question 3: 太阳什么时候升起 (When does the sun rise?)

  • Original model: The time of sunrise depends on your location and the season. In the Northern Hemisphere, the sun rises around daybreak, roughly at dawn. In the Southern Hemisphere, sunrise comes later than in the Northern Hemisphere. In addition, the time of sunrise is also affected by geography, season, atmospheric conditions, and other factors.
  • Fine-tuned model: 晚上八点 (8 p.m.)


As you can see, the fine-tuned model has been turned into a nonsense generator, which shows that this round of fine-tuning took effect.


Closing Thoughts

This post walked through a simple fine-tuning run with Apple's MLX framework, in the hope of helping readers who are interested in fine-tuning understand the various stages of the process. Although a local fine-tuning setup is rarely usable for real production projects, practicing the steps of the workflow, such as data cleaning, hyperparameter tuning, and model validation, still pays off, and makes model fine-tuning feel more and more familiar.