LLaMA (Large Language Model Meta AI) is a family of pretrained language models released by researchers at Meta AI (formerly the Facebook AI Research lab). It works by taking a sequence of words as input and predicting the next word, generating text recursively. Its defining trait is strong performance at relatively small parameter scales: it is a collection of foundation language models ranging from 7B to 65B (7 to 65 billion) parameters, trained on trillions of tokens. Notably, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.
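To make the "predict the next word recursively" idea concrete, here is a minimal toy sketch of autoregressive generation. The bigram lookup table is purely illustrative; a real LLM replaces `predict_next` with a neural network that conditions on the whole context.

```python
# Toy autoregressive generation: repeatedly predict the next token
# and append it to the growing context until done.
BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def predict_next(context):
    # Stand-in for the model: looks at the last token only.
    return BIGRAMS.get(context[-1], "<eos>")

def generate(prompt, max_tokens=4):
    tokens = prompt.split()
    for _ in range(max_tokens):
        nxt = predict_next(tokens)
        if nxt == "<eos>":  # stop when the "model" has nothing to add
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("the"))  # "the cat sat on the"
```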
Below is a rough write-up of the deployment process on a Mac, for anyone who wants to learn from or build on it. Enjoy!
Download the model
Apply to Meta through the official request form; Meta will reply with download details.
Just follow the steps in that reply.
Download Llama 2
Install Torch
pip install torch
Convert the model
huggingface/transformers
Install huggingface/transformers:
pip install git+https://github.com/huggingface/transformers
Run the following command:
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
--input_dir [path to the llama repo] \
--model_size [7B] \
--output_dir [output folder for the HuggingFace-format model]
- Error 1: ModuleNotFoundError: No module named 'transformers'
  Fix:
  brew install rustup
  rustup-init
  source ~/.cargo/env
  rustc --version
- Error 2: ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`
  Fix:
  pip install accelerate
- Error 3: ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a `tokenizers` library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
  Fix:
  pip install sentencepiece
- Error 4: ImportError: LlamaConverter requires the protobuf library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones that match your environment. Please note that you may need to restart your runtime after installation.
  Fix:
  pip install protobuf
Once everything is configured correctly, the conversion will generate its output files in the model folder you created.
llama.cpp
First, enter the llama.cpp directory and compile the C++ code:
make
Copy the model downloaded earlier into models:
.
│ ...
├── 7B
│ ├── checklist.chk.txt
│ ├── consolidated.00.pth
│ └── params.json
│ ...
└── tokenizer.model
Convert the model to ggml FP16 format (the trailing 1 selects f16 output):
python convert-pth-to-ggml.py models/7B 1
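As a side note on what FP16 buys you: each weight shrinks from 4 bytes to 2, at the cost of some precision. A quick stdlib-only illustration (independent of llama.cpp), using `struct`'s `f` (float32) and `e` (float16) formats:

```python
import struct

# Pack the same value as float32 ('f') and float16 ('e') and compare sizes.
value = 3.14159
as_f32 = struct.pack("f", value)
as_f16 = struct.pack("e", value)
print(len(as_f32), len(as_f16))  # 4 2

# fp16 is lossy: unpacking gives back only an approximation.
approx = struct.unpack("e", as_f16)[0]
print(abs(approx - value) < 1e-2)  # True
```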
Quantize the model (the trailing 2 selects the q4_0 quantization type):
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
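The q4_0 scheme stores weights in blocks (32 values each in ggml) that share a single scale factor, so each weight costs roughly 4 bits plus a small per-block overhead. Below is a simplified sketch of the idea, assuming symmetric round-to-nearest quantization; it is not the exact ggml memory layout, which also packs two 4-bit values into each byte.

```python
def quantize_q4_block(block):
    """Quantize a block of floats to 4-bit ints in [-8, 7] plus one scale."""
    scale = max(abs(x) for x in block) / 7  # assumes a nonzero block
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q

def dequantize_q4_block(scale, q):
    return [scale * v for v in q]

block = [0.12, -0.5, 0.33, 0.9, -0.07, 0.41, -0.88, 0.25]
scale, q = quantize_q4_block(block)
approx = dequantize_q4_block(scale, q)
max_err = max(abs(a - b) for a, b in zip(block, approx))
print(max_err <= scale / 2 + 1e-9)  # True: error bounded by half a step
```

The round trip is lossy, which is why quantized models trade a little accuracy for a roughly 4x reduction in size versus FP16.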
Run the model
Run it from the command line (-t sets the thread count, -n the number of tokens to generate, -p the prompt):
./main -m ./models/7B/ggml-model-q4_0.bin \
-t 8 \
-n 128 \
-p 'The first president of the USA was'
Llama 2 GUI
Use LlamaChat for a graphical interface.
Arrange the converted model files into the following structure:
.
│ ...
├── 7B
│ ├── checklist.chk.txt
│ ├── consolidated.00.pth
│ ├── consolidated.01.pth
│ └── params.json
│ ...
└── tokenizer.model
One small issue needs to be resolved first.
Full error that popped up:
Traceback (most recent call last):
...
Exception: tensor stored in unsupported format
Fix: see this issue
References
research.facebook.com/publication…
til.simonwillison.net/llms/llama-…
zhuanlan.zhihu.com/p/645142694