Deploying LLaMA 2 on a Mac M2


LLaMA (Large Language Model Meta AI) is a collection of pretrained language models released by researchers at Meta AI (formerly the Facebook AI Research lab). It works by taking a sequence of words as input and predicting the next word, generating text autoregressively. Its defining trait is strong performance at a relatively small parameter scale: it is a collection of foundation language models ranging from 7B to 65B (7 to 65 billion) parameters, trained on trillions of tokens. Notably, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.
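That generation loop (predict one token, append it, repeat) can be sketched with a toy stand-in for the real model:

```python
# Toy sketch of autoregressive generation: the model repeatedly
# predicts the next token from everything generated so far.
# `toy_next_token` is a hypothetical stand-in for a real LLM forward pass.

def toy_next_token(context):
    # Trivial "model": pick a word from a fixed vocabulary based on context length.
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return vocab[len(context) % len(vocab)]

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tokens.append(toy_next_token(tokens))  # feed the output back in as input
    return tokens

print(generate(["hello"], 4))  # ['hello', 'cat', 'sat', 'on', 'a']
```

A real model replaces `toy_next_token` with a forward pass over billions of parameters, but the outer loop is the same.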

Below is a rough walkthrough of the deployment process on a Mac, for study and reference. Enjoy!


Download the model

Apply to Meta through the official request form; you will receive a reply from Meta with download instructions. Follow the steps in that reply.

Download llama2

github.com/facebookres…

Install Torch

pip install torch

Convert the model

huggingface/transformers

Install huggingface/transformers

pip install git+https://github.com/huggingface/transformers

Run the following command:

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
 --input_dir [path to the llama repo] \
 --model_size 7B \
 --output_dir [output directory for the Hugging Face format model]
  • Error 1
    ModuleNotFoundError: No module named 'transformers'
    
    Fix (the `tokenizers` dependency is built from Rust, so a Rust toolchain is needed):
    brew install rustup
    rustup-init
    source ~/.cargo/env
    rustc --version
    
  • Error 2
    ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`
    
    Fix:
    pip install accelerate
    
  • Error 3
    ValueError: Couldn't instantiate the backend tokenizer from one of:
    (1) a `tokenizers` library serialization file,
    (2) a slow tokenizer instance to convert or
    (3) an equivalent slow tokenizer class to instantiate and convert.
    You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
    
    Fix:
    pip install sentencepiece
    
  • Error 4
    ImportError:
    LlamaConverter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
    installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
    that match your environment. Please note that you may need to restart your runtime after installation.
    
    Fix:
    pip install protobuf
    

Once everything is configured correctly, the converted model files appear in the output folder you created.

llama.cpp

First, go into the llama.cpp directory and build the C++ code:

make

Copy the previously downloaded model into models:

.
│   ...
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
│   ...
└── tokenizer.model

Convert the model to ggml FP16 format:

python convert-pth-to-ggml.py models/7B 1
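Converting to FP16 halves each weight's storage at some cost in precision. Python's standard struct module can illustrate the half-precision round-trip (this is only an illustration of FP16 itself, not what the conversion script does internally):

```python
import struct

def to_fp16_and_back(x):
    # Pack a float as IEEE 754 half precision (struct format 'e'),
    # then unpack it, exposing the rounding that FP16 introduces.
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_fp16_and_back(0.5))         # 0.5 is exactly representable
print(to_fp16_and_back(3.14159265))  # rounded: FP16 keeps ~3 decimal digits
```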

Quantize the model:

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
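Quantization shrinks the model further: q4_0 stores each block of weights as one shared float scale plus 4-bit integers. A simplified pure-Python sketch of the idea (not ggml's actual on-disk layout; the function names here are mine):

```python
def quantize_block(weights):
    # Map floats to 4-bit signed ints in [-8, 7] plus one shared scale,
    # mimicking the idea behind ggml's q4_0 (not its exact format).
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    # Recover approximate weights; per-value error is bounded by scale / 2.
    return [scale * v for v in q]

weights = [0.12, -0.53, 0.97, -0.08]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
print(restored)  # approximates the original weights
```

Each weight now costs 4 bits instead of 16, which is why the quantized file is roughly a quarter the size of the FP16 one.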

Run the model

Run from the command line:

./main -m ./models/7B/ggml-model-q4_0.bin \
  -t 8 \
  -n 128 \
  -p 'The first president of the USA was'

A GUI for llama2

Use LlamaChat for a graphical interface.

Arrange the converted model files into the following structure:

.
│   ...
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   └── params.json
│   ...
└── tokenizer.model

One small problem needs solving.

The full error that popped up:

Traceback (most recent call last):
...
Exception: tensor stored in unsupported format

See this issue for the fix.

