1. Environment Setup
# Install Python, version 3.10 or later
# Here we use Anaconda, whose virtual environments ship with their own Python
wget -P /usr/local/conda https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh && cd /usr/local/conda && bash Anaconda3-2023.03-Linux-x86_64.sh
# Edit ~/.bashrc and append the following two lines at the end of the file
export PATH="$HOME/anaconda3/bin:$PATH"
source ~/anaconda3/bin/activate
# Save and close the file, then run the following in the terminal to apply the configuration
source ~/.bashrc
# Restart the terminal; a (base) prefix appears before the prompt, indicating that the Anaconda3 base environment is active by default
# List the installed conda environments
conda env list
# Manually activate a conda environment
conda activate base
# Deactivate the current conda environment
conda deactivate
# Check the Python version
python --version
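The version requirement can also be checked from code, e.g. to fail fast in a setup script. A minimal sketch (`python_ok` is a hypothetical helper name, not part of ChatGLM2-6B):

```python
import sys

# Check that the interpreter meets the 3.10 minimum mentioned above.
def python_ok(version=None, minimum=(3, 10)):
    """Return True when the given (or running) interpreter version meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= minimum

print(python_ok((3, 10, 5)))  # → True
print(python_ok((3, 8, 0)))   # → False
```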
2. Install GLM
# Download ChatGLM2-6B
git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B
pip install --upgrade pip
# Install the dependencies
pip install -r requirements.txt
# Install Gradio, FastAPI, and Uvicorn
pip install gradio fastapi uvicorn
# Run web_demo.py; the model weights are downloaded automatically on first run
python web_demo.py
python
# In the interpreter, enter the following:
>>> import torch
>>> torch.cuda.is_available()
# If it prints True, PyTorch is configured correctly
# To allow external access, change share to True in web_demo.py
share=True
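Concretely, the change amounts to replacing `share=False` with `share=True` in the Gradio launch call. A small sketch that performs the substitution (hypothetical helper; the launch line and file name here are illustrative, and the demo edits a throwaway copy rather than the real web_demo.py):

```python
from pathlib import Path

# Flip share=False to share=True so Gradio exposes a public link.
def enable_share(path: Path) -> None:
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace("share=False", "share=True"), encoding="utf-8")

# Demo on a throwaway copy; on a real setup you would edit web_demo.py itself.
copy = Path("web_demo_copy.py")
copy.write_text('demo.queue().launch(share=False, inbrowser=True)\n', encoding="utf-8")
enable_share(copy)
print(copy.read_text(encoding="utf-8").strip())  # → demo.queue().launch(share=True, inbrowser=True)
```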
# After the download completes, you may see "Torch not compiled with CUDA enabled"
# Check torch.cuda.is_available(); if it returns True, rerun the demo
# Run the following command to start the Gradio-based web demo
python web_demo.py
# Or run the following command to start the Streamlit-based web demo
streamlit run web_demo2.py
3. Problems and Solutions
3.1. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
Problem: spyder 5.4.1 requires pyqt5<5.16, which is not installed.
spyder 5.4.1 requires pyqtwebengine<5.16, which is not installed.
Solution: install the missing dependencies with pip install <package>.
3.2. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB (GPU 0; 8.00 GiB total capacity; 6.28 GiB already allocated; 183.54 MiB free; 6.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Problem: the full ChatGLM-6B model needs 13 GB of GPU memory for inference; the ChatGLM-6B-INT4 quantized model needs only 6 GB.
Solution: the GPU is out of memory, so switch to the INT4 version: in web_demo.py / web_demo2.py, replace "THUDM/chatglm2-6b" with "THUDM/chatglm-6b-int4".
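The 13 GB vs 6 GB figures follow from simple parameter arithmetic. A rough estimate (assuming roughly 6.2 billion parameters; real usage adds activations and the KV cache on top of the weights):

```python
# Rough VRAM arithmetic behind the 13 GB vs 6 GB figures
# (assumption: ~6.2 billion parameters; activations and KV cache add to this).
params = 6.2e9
fp16_gib = params * 2 / 1024**3    # fp16 stores 2 bytes per parameter
int4_gib = params * 0.5 / 1024**3  # int4 stores ~0.5 bytes per parameter
print(f"fp16 weights ~{fp16_gib:.1f} GiB, int4 weights ~{int4_gib:.1f} GiB")
```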
3.3. expected scalar type Half but found Float
model.half() enables half precision, which speeds up inference and reduces GPU memory usage.
Problem: if the model runs in half precision during training, inference must use half precision as well; otherwise the error "expected scalar type Half but found Float" is raised. When using ChatGLM-6B-INT4, half precision must be enabled.
Solution: change the model-loading code to:
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
3.4. for response, history, past_key_values in model.stream_chat(tokenizer, prompt_text, history,... ValueError: not enough values to unpack (expected 3, got 2)
Problem: the stream_chat interface yields only two values per iteration.
Solution: delete ', past_key_values' from the unpacking.
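The mismatch can be reproduced with a stand-in generator (a mock only; per the problem above, the real model.stream_chat in this version yields (response, history) pairs):

```python
# Mock stand-in for model.stream_chat, which here yields 2-tuples per step.
def stream_chat_mock(tokenizer, prompt, history):
    yield "partial", history + [(prompt, "partial")]
    yield "done", history + [(prompt, "done")]

# Broken:  for response, history, past_key_values in stream_chat_mock(...)
#          -> ValueError: not enough values to unpack (expected 3, got 2)
# Fixed: unpack exactly the two yielded values.
for response, history in stream_chat_mock(None, "hi", []):
    pass
print(response)  # → done
```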