Notes on problems encountered while deploying the chatglm2-6b-int4 model on CPU

Model

Model address: www.modelscope.cn/models/Zhip…

The model can also be downloaded from a Hugging Face mirror site.

Test code:

# Alternatively: from transformers import AutoTokenizer, AutoModel
from modelscope import AutoTokenizer, AutoModel, snapshot_download
model_dir = snapshot_download('ZhipuAI/chatglm2-6b-int4', revision='v1.0.1')
# If the model is already on disk, set model_dir = "<directory>" directly
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# CPU only: change half().cuda() below to float()
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()
model = model.eval()
# Prompt: "Where is the capital of Zhejiang province?"
response, history = model.chat(tokenizer, "浙江的省会在哪？", history=[])
# Follow-up prompt: "What is good to eat there?"
response, history = model.chat(tokenizer, "这有什么好吃的？", history=history)
print(response)

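Background

I deployed the model in order to run an open-source project that depends on ChatGLM-6B. The project ships a requirements.txt for pip install -r requirements.txt, but the model has to be downloaded separately, and with the network conditions in mainland China I had to find alternative download sources.

Environment

The server has no GPU, so the model runs on CPU.
After subtracting what other applications use, roughly 9 GB of memory is left, so I used a quantized model.

I originally tried to deploy chatglm3-6b, but once the model had loaded the process simply printed Killed. A quick search pointed to insufficient memory; if /var/log/syslog exists, the corresponding error output shows up there and confirms that memory ran out. I therefore switched to the quantized model chatglm2-6b-int4.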
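
To make the memory reasoning above concrete, here is a rough back-of-the-envelope sketch. It only counts raw weights and ignores activations and runtime overhead, and the parameter count is an assumption taken loosely from the "6b" in the model name, not a measured figure.

```python
# Rough weights-only estimate; ignores activations, caches and runtime overhead.
GIB = 2 ** 30

# Assumption: about 6e9 parameters, read off the "6b" in the model name.
ASSUMED_PARAMS = 6e9


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the approximate size of the raw weights in GiB."""
    return num_params * bits_per_param / 8 / GIB


for label, bits in [("fp32", 32), ("fp16", 16), ("int4", 4)]:
    print(f"{label}: ~{weight_memory_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights alone come to about 11 GiB, which is already more than the roughly 9 GB available, while 4-bit weights stay under 3 GiB. That is consistent with the unquantized model being killed and the int4 one fitting.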

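Installing the Python dependencies

I deployed inside a Docker container to avoid affecting the host.

I started with an alpine container, but it did not seem to support torch, so I switched to an ubuntu container.

In the ubuntu container, after installing python and pip through apt, running pip install was refused with a message saying a venv virtual environment has to be created first. Following the instructions in that message is enough.

Installing dependencies with pip install

The installation triggers a Rust build. I used the USTC crates.io-index mirror (the Tsinghua mirror only caches part of the index).

Rust compilation error (fixed by upgrading transformers):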
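
For reference, a minimal sketch of what the Cargo source replacement could look like. It only renders the config text; the registry URL is an assumption based on the USTC mirror's usual address and should be checked against the mirror's own help page, and the source name "mirror" is an arbitrary label chosen here.

```python
# Assumption: USTC sparse-index URL; verify it against the mirror's help page.
USTC_REGISTRY = "sparse+https://mirrors.ustc.edu.cn/crates.io-index/"


def cargo_mirror_config(registry: str = USTC_REGISTRY) -> str:
    """Render a Cargo config that redirects crates.io to a mirror registry."""
    return (
        "[source.crates-io]\n"
        'replace-with = "mirror"\n'  # "mirror" is just a label for the section below
        "\n"
        "[source.mirror]\n"
        f'registry = "{registry}"\n'
    )


# Print the text so it can be reviewed before it is saved anywhere.
print(cargo_mirror_config())
```

The printed text would go into config.toml under the Cargo home directory (typically ~/.cargo inside the container) before running pip install again.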

error: casting &T to &mut T is undefined behavior, even if the reference is unused, consider instead using an UnsafeCell

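Downloading the model from modelscope

snapshot_download accepts a cache_dir parameter that sets the download directory.

A downloaded model can be copied from machine A to machine B; in that case simply set model_dir = 'path of the copied directory'.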
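
The two options above (download into a chosen cache_dir, or reuse a copied directory) can be combined in a small helper. This is only a sketch: resolve_model_dir is a hypothetical name introduced here, and the modelscope import is deferred so the local-copy branch works even where modelscope is not installed.

```python
from pathlib import Path
from typing import Optional

MODEL_ID = "ZhipuAI/chatglm2-6b-int4"


def resolve_model_dir(local_dir: Optional[str] = None,
                      cache_dir: Optional[str] = None) -> str:
    """Prefer an existing local copy; otherwise download through modelscope."""
    if local_dir and Path(local_dir).is_dir():
        # Model was copied from another machine: use the directory as-is.
        return str(Path(local_dir))
    # Imported lazily so the branch above does not require modelscope.
    from modelscope import snapshot_download
    return snapshot_download(MODEL_ID, revision="v1.0.1", cache_dir=cache_dir)
```

The returned string can then be passed to AutoTokenizer.from_pretrained and AutoModel.from_pretrained in place of model_dir in the test code.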

Errors from model.chat

TypeError: ChatGLMTokenizer._pad() got an unexpected keyword argument 'padding_side'

Solved by modifying tokenization_chatglm.py (following a CSDN blog post).
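
The exact edit is not reproduced here, so the following is only an illustration of the mechanism, inferred from the error message rather than from the blog post: the caller now passes padding_side, so _pad has to accept it. Both classes are hypothetical stand-ins; the real _pad in tokenization_chatglm.py takes more parameters.

```python
from typing import Optional


class OldTokenizer:
    """Hypothetical stand-in whose _pad predates the padding_side argument."""

    def _pad(self, encoded_inputs: dict, max_length: Optional[int] = None) -> dict:
        return encoded_inputs


class PatchedTokenizer(OldTokenizer):
    """Same stand-in, with _pad accepting (and here ignoring) padding_side."""

    def _pad(self, encoded_inputs: dict, max_length: Optional[int] = None,
             padding_side: Optional[str] = None) -> dict:
        return super()._pad(encoded_inputs, max_length=max_length)


def call_like_new_caller(tok) -> str:
    """Mimic a caller that always passes padding_side as a keyword."""
    try:
        tok._pad({"input_ids": [1, 2]}, max_length=4, padding_side="left")
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(call_like_new_caller(OldTokenizer()))
print(call_like_new_caller(PatchedTokenizer()))
```

The first call fails with the same kind of TypeError as above, and the second succeeds once the extra keyword is part of the signature.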

TypeError: GenerationMixin._extract_past_from_model_output () got an unexpected keyword argument 'standardize_cache_format'

The transformers version appears to be too new; pinning it with pip3 install transformers==4.40.0 resolved this.
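
A small check like the one below can flag this situation before model.chat fails. It is a sketch using only the standard library; the helper names are made up for this note, and 4.40.0 is simply the version that worked here, not a documented compatibility boundary.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

PINNED = "4.40.0"  # the version that worked in this deployment


def version_tuple(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_newer_than_pin(installed: str, pinned: str = PINNED) -> bool:
    return version_tuple(installed) > version_tuple(pinned)


found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
elif is_newer_than_pin(found):
    print(f"transformers {found} is newer than {PINNED}; consider pinning")
else:
    print(f"transformers {found} is at or below {PINNED}")
```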