Calling ChatGLM | LangChain in Action Course


Environment setup

  1. Install LangChain

    Install LangChain together with the common open-source LLM (large language model) integrations:

    pip install langchain[llms]
    
    pip install --upgrade langchain
    
  2. Install huggingface_hub

    pip install --upgrade --quiet huggingface_hub
    
  3. Install transformers and its related dependencies (see the command sketch below)
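
    A command along these lines should work; the exact package list here is an assumption (accelerate is needed for device_map, sentencepiece for the ChatGLM tokenizer):

    pip install transformers accelerate sentencepiece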

THUDM/chatglm3-6b

Using HuggingFacePipeline fails

The model 'ChatGLMForConditionalGeneration' is not supported for text-generation.
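
For reference, this is the kind of message transformers' pipeline() reports when a model class is not registered for the requested task. A minimal sketch of such a call (not necessarily the exact call used here):

from transformers import pipeline

# chatglm3-6b's ChatGLMForConditionalGeneration is not registered for the
# "text-generation" task, so pipeline() rejects it with the message above.
pipe = pipeline(
    "text-generation",
    model="THUDM/chatglm3-6b",
    trust_remote_code=True,
    device_map="auto",
)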

Calling via the transformers library

model_id = "THUDM/chatglm3-6b"
cache_dir = "./huggingface/hub"

# Import the required classes
from transformers import AutoTokenizer, AutoModelForCausalLM

# cache_dir downloads the model to the given path; without it, the default cache path is used
# Load the tokenizer of the pretrained model
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir, trust_remote_code=True)

# Load the pretrained model
# trust_remote_code=True is required to load this model
# device_map places the model automatically on the available hardware, e.g. a GPU
model = AutoModelForCausalLM.from_pretrained(
    model_id, cache_dir=cache_dir, device_map="cuda:0", trust_remote_code=True
)
# from transformers import AutoModel
# model = AutoModel.from_pretrained(model_id, cache_dir=cache_dir, device_map="cuda:0", trust_remote_code=True)

model = model.eval()
# Use the model's built-in chat() interface
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)

## Plain generate() approach
# # Define a prompt we want the model to write a story from
# prompt = "请给我讲个玫瑰的爱情故事?"

# # Tokenize the prompt into model inputs and move them to the GPU
# inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

# # Generate text, capped at 2000 new tokens
# outputs = model.generate(inputs["input_ids"], max_new_tokens=2000)

# # Decode the generated tokens, skipping special tokens such as [CLS], [SEP]
# response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# # Print the generated response
# print(response)
  • Troubleshooting

    Error: TypeError: _pad() got an unexpected keyword argument 'padding_side'

    Fix: chatglm3-6b does not work with newer transformers releases; downgrade transformers to 4.43.0 or below: pip install --upgrade transformers==4.43.0

THUDM/glm-4-9b-chat

Supports calling via HuggingFacePipeline with the "text-generation" task

Updated and uses transformers>=4.44.0
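
To satisfy that requirement, a command along these lines should work:

pip install --upgrade "transformers>=4.44.0"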

huggingface.co/THUDM/glm-4…

GPU memory usage is roughly 18-20 GB

import os
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline

## The glm-4-9b-chat model can be downloaded without an API key
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = "your api key"

model_id = "THUDM/glm-4-9b-chat"
cache_dir = "./huggingface/hub"
tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir, trust_remote_code=True)

## pipeline() does not accept cache_dir, so the model is downloaded to the default cache path;
## to change where it goes, you have to change the default cache location itself
pipe = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    trust_remote_code=True,  # required for this model
    device_map="auto",  # "auto" uses the accelerate library
    truncation=True,  # needed alongside max_length
    max_length=1000,  # cap the total length (prompt + generated text)
)
llm = HuggingFacePipeline(
    pipeline=pipe,
    model_kwargs={"temperature": 0}
)


# Create a simple question-answering prompt template

template = """Question: {question}

Answer: Let's think step by step."""

# Create the prompt
prompt = PromptTemplate(template=template, input_variables=["question"])

# Build the chain (LCEL style); the legacy LLMChain equivalent is shown commented out
# llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain = prompt | llm
# Prepare the question
question = "Who won the FIFA World Cup in the year 1994? "


# Invoke the model and print the result
print(llm_chain.invoke({"question": question}))

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 10/10 [01:26<00:00,  8.63s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.60s/it]
Question: Who won the FIFA World Cup in the year 1994? 

Answer: Let's think step by step.The FIFA World Cup is a soccer tournament held every four years. The year 1994 was indeed one of 
those years when the World Cup was held. The winner of the 1994 FIFA World Cup was Brazil. So, the answer is Brazil. The answer is Brazil.

Some other invocation approaches that were tried:

  1. Calling via HuggingFaceEndpoint: Bad request: Task not found for this model
  2. Calling via the transformers library: see the THUDM/chatglm3-6b example above
  3. HuggingFacePipeline.from_model_id still needs a way to pass trust_remote_code=True (see the sketch below)
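
For item 3, a minimal sketch of where trust_remote_code would go; model_kwargs/pipeline_kwargs are documented parameters of from_model_id, but whether trust_remote_code is forwarded to every load path depends on the langchain_huggingface version:

from langchain_huggingface import HuggingFacePipeline

# Sketch: pass trust_remote_code via model_kwargs so the model's custom code is allowed;
# pipeline_kwargs mirror the arguments used in the manual pipeline() call above.
llm = HuggingFacePipeline.from_model_id(
    model_id="THUDM/glm-4-9b-chat",
    task="text-generation",
    model_kwargs={"trust_remote_code": True},
    pipeline_kwargs={"max_length": 1000, "truncation": True},
)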