1. Environment Setup
# Install Python, version 3.10 or later
# Here we use Anaconda, whose virtual environments ship with their own Python
wget -P /usr/local/conda https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh && cd /usr/local/conda && bash Anaconda3-2023.03-Linux-x86_64.sh
# Edit ~/.bashrc and append the following two lines at the end of the file
export PATH="$HOME/anaconda3/bin:$PATH"
source ~/anaconda3/bin/activate
# Save and close the file, then run the following in the terminal to apply the configuration
source ~/.bashrc
# Restart the terminal; a (base) prefix appears before the prompt, indicating that the Anaconda3 base environment is active by default
# List the installed conda environments
conda env list
# Manually activate a conda environment
conda activate base
# Deactivate the current conda environment
conda deactivate
# Check the Python version
python --version
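The version requirement can also be checked from code, e.g. to fail fast in a setup script. A minimal sketch (`python_ok` is a hypothetical helper name, not part of ChatGLM2-6B):

```python
import sys

# Check that the interpreter meets the 3.10 minimum mentioned above.
def python_ok(version=None, minimum=(3, 10)):
    """Return True when the given (or running) interpreter version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum

print(python_ok((3, 10, 5)))  # → True
print(python_ok((3, 8, 0)))   # → False
```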
2. Install GLM
# Download ChatGLM2-6B
git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B
pip install --upgrade pip
# Install the dependencies
pip install -r requirements.txt
# Install Gradio, FastAPI, and Uvicorn
pip install gradio fastapi uvicorn
# Run web_demo.py; the model weights are downloaded automatically on first run
python web_demo.py
python
# In the interpreter, enter the following:
>>> import torch
>>> torch.cuda.is_available()
# If it prints True, PyTorch is configured correctly
# To allow external access, change share to True in web_demo.py
share=True
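Concretely, the change amounts to replacing `share=False` with `share=True` in the Gradio launch call. A small sketch that performs the substitution (hypothetical helper; the launch line and file name here are illustrative, and the demo edits a throwaway copy rather than the real web_demo.py):

```python
from pathlib import Path

# Flip share=False to share=True so Gradio exposes a public link.
def enable_share(path: Path) -> None:
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace("share=False", "share=True"), encoding="utf-8")

# Demo on a throwaway copy; on a real setup you would edit web_demo.py itself.
copy = Path("web_demo_copy.py")
copy.write_text('demo.queue().launch(share=False, inbrowser=True)\n', encoding="utf-8")
enable_share(copy)
print(copy.read_text(encoding="utf-8").strip())  # → demo.queue().launch(share=True, inbrowser=True)
```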
# After the download completes, you may see "Torch not compiled with CUDA enabled"
# Check torch.cuda.is_available(); if it returns True, rerun the demo
# Run the following command to start the Gradio-based web demo
python web_demo.py
# Or run the following command to start the Streamlit-based web demo
streamlit run web_demo2.py
3. Problems and Solutions
3.1. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
Problem: spyder 5.4.1 requires pyqt5<5.16, which is not installed.
spyder 5.4.1 requires pyqtwebengine<5.16, which is not installed.
Solution: install the missing dependencies with pip install <package>.
3.2. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB (GPU 0; 8.00 GiB total capacity; 6.28 GiB already allocated; 183.54 MiB free; 6.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Problem: the full ChatGLM-6B model needs 13 GB of GPU memory for inference; the ChatGLM-6B-INT4 quantized model needs only 6 GB.
Solution: the GPU is out of memory, so switch to the INT4 version: in web_demo.py / web_demo2.py, replace "THUDM/chatglm2-6b" with "THUDM/chatglm-6b-int4".
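The 13 GB vs 6 GB figures follow from simple parameter arithmetic. A rough estimate (assuming roughly 6.2 billion parameters; real usage adds activations and the KV cache on top of the weights):

```python
# Rough VRAM arithmetic behind the 13 GB vs 6 GB figures
# (assumption: ~6.2 billion parameters; activations and KV cache add to this).
params = 6.2e9
fp16_gib = params * 2 / 1024**3    # fp16 stores 2 bytes per parameter
int4_gib = params * 0.5 / 1024**3  # int4 stores ~0.5 bytes per parameter
print(f"fp16 weights ~{fp16_gib:.1f} GiB, int4 weights ~{int4_gib:.1f} GiB")
```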
3.3. expected scalar type Half but found Float
model.half() enables half precision, which speeds up inference and reduces GPU memory usage.
Problem: if the model runs in half precision during training, inference must use half precision as well; otherwise the error "expected scalar type Half but found Float" is raised. When using ChatGLM-6B-INT4, half precision must be enabled.
Solution: change the model-loading code to:
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
3.4. for response, history, past_key_values in model.stream_chat(tokenizer, prompt_text, history,... ValueError: not enough values to unpack (expected 3, got 2)
Problem: the stream_chat interface yields only two values per iteration.
Solution: delete ', past_key_values' from the unpacking.
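The mismatch can be reproduced with a stand-in generator (a mock only; per the problem above, the real model.stream_chat in this version yields (response, history) pairs):

```python
# Mock stand-in for model.stream_chat, which here yields 2-tuples per step.
def stream_chat_mock(tokenizer, prompt, history):
    yield "partial", history + [(prompt, "partial")]
    yield "done", history + [(prompt, "done")]

# Broken:  for response, history, past_key_values in stream_chat_mock(...)
#          -> ValueError: not enough values to unpack (expected 3, got 2)
# Fixed: unpack exactly the two yielded values.
for response, history in stream_chat_mock(None, "hi", []):
    pass
print(response)  # → done
```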