Deploying LLaMA 2 on a Mac M2


LLaMA (Large Language Model Meta AI) is a collection of pretrained language models released by researchers at Meta AI (formerly the Facebook AI Research lab). It works by taking a sequence of words as input and predicting the next word, generating text autoregressively. Its defining trait is strong performance at a relatively small parameter scale: it is a collection of foundation language models ranging from 7B to 65B (7 to 65 billion) parameters, trained on trillions of tokens. Notably, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.
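That generation loop (predict one token, append it, repeat) can be sketched with a toy stand-in for the real model:

```python
# Toy sketch of autoregressive generation: the model repeatedly
# predicts the next token from everything generated so far.
# `toy_next_token` is a hypothetical stand-in for a real LLM forward pass.

def toy_next_token(context):
    # Trivial "model": pick a word from a fixed vocabulary based on context length.
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return vocab[len(context) % len(vocab)]

def generate(prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tokens.append(toy_next_token(tokens))  # feed the output back in as input
    return tokens

print(generate(["hello"], 4))  # ['hello', 'cat', 'sat', 'on', 'a']
```

A real model replaces `toy_next_token` with a forward pass over billions of parameters, but the outer loop is the same.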

Below is a rough walkthrough of the deployment process on a Mac, for study and reference. Enjoy!


Download the model

Apply to Meta through the official request form; you will receive a reply from Meta with download instructions. Follow the steps in that reply.

Download llama2

github.com/facebookres…

Install Torch

pip install torch

Convert the model

huggingface/transformers

Install huggingface/transformers

pip install git+https://github.com/huggingface/transformers

Run the following command:

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
 --input_dir [path to the llama repo] \
 --model_size 7B \
 --output_dir [output directory for the Hugging Face format model]
  • Error 1
    ModuleNotFoundError: No module named 'transformers'
    
    Fix (the `tokenizers` dependency is built from Rust, so a Rust toolchain is needed):
    brew install rustup
    rustup-init
    source ~/.cargo/env
    rustc --version
    
  • Error 2
    ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`
    
    Fix:
    pip install accelerate
    
  • Error 3
    ValueError: Couldn't instantiate the backend tokenizer from one of:
    (1) a `tokenizers` library serialization file,
    (2) a slow tokenizer instance to convert or
    (3) an equivalent slow tokenizer class to instantiate and convert.
    You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
    
    Fix:
    pip install sentencepiece
    
  • Error 4
    ImportError:
    LlamaConverter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
    installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
    that match your environment. Please note that you may need to restart your runtime after installation.
    
    Fix:
    pip install protobuf
    

Once everything is configured correctly, the converted model files appear in the output folder you created.

llama.cpp

First, go into the llama.cpp directory and build the C++ code:

make

Copy the previously downloaded model into models:

.
│   ...
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
│   ...
└── tokenizer.model

Convert the model to ggml FP16 format:

python convert-pth-to-ggml.py models/7B 1
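Converting to FP16 halves each weight's storage at some cost in precision. Python's standard struct module can illustrate the half-precision round-trip (this is only an illustration of FP16 itself, not what the conversion script does internally):

```python
import struct

def to_fp16_and_back(x):
    # Pack a float as IEEE 754 half precision (struct format 'e'),
    # then unpack it, exposing the rounding that FP16 introduces.
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_fp16_and_back(0.5))         # 0.5 is exactly representable
print(to_fp16_and_back(3.14159265))  # rounded: FP16 keeps ~3 decimal digits
```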

Quantize the model:

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
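Quantization shrinks the model further: q4_0 stores each block of weights as one shared float scale plus 4-bit integers. A simplified pure-Python sketch of the idea (not ggml's actual on-disk layout; the function names here are mine):

```python
def quantize_block(weights):
    # Map floats to 4-bit signed ints in [-8, 7] plus one shared scale,
    # mimicking the idea behind ggml's q4_0 (not its exact format).
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    # Recover approximate weights; per-value error is bounded by scale / 2.
    return [scale * v for v in q]

weights = [0.12, -0.53, 0.97, -0.08]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
print(restored)  # approximates the original weights
```

Each weight now costs 4 bits instead of 16, which is why the quantized file is roughly a quarter the size of the FP16 one.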

Run the model

Run from the command line:

./main -m ./models/7B/ggml-model-q4_0.bin \
  -t 8 \
  -n 128 \
  -p 'The first president of the USA was'

A GUI for llama2

Use LlamaChat for a graphical interface.

Arrange the converted model files into the following structure:

.
│   ...
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   └── params.json
│   ...
└── tokenizer.model

One small problem needs solving.

The full error that popped up:

Traceback (most recent call last):
...
Exception: tensor stored in unsupported format

See this issue for the fix.

