LangChain AI 练中学 (learn-by-practice) pitfall notes (04-10) | 豆包MarsCode AI刷题

This continues the previous post, Langchain AI 练中学踩坑记录(00-03) | 豆包MarsCode AI刷题, on 掘金 (Juejin).

04/03 error

Running the file produces the following error:

Traceback (most recent call last):
  File "/cloudide/workspace/LangChain-shizhanke/04_提示模板上/03_FewShotPrompt.py", line 110, in <module>
    example_selector = SemanticSimilarityExampleSelector.from_examples(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

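The error message itself lists the possible fixes:

1. Downgrade the protobuf package.

The error indicates that the installed protobuf package is too new; version 3.20.x or lower is required.

Solution

Run pip install protobuf==3.20.1 in the terminal. If the download is too slow, you can point pip at a mirror index. I found one that works by searching: pip install protobuf==3.20.1 -i https://pypi.tuna.tsinghua.edu.cn/simple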
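To confirm whether the downgrade is needed (or has taken effect), the installed version can be compared against the 3.20.x ceiling named in the error. This is a minimal sketch; the helper names is_compatible and installed_protobuf_version are my own and not part of the course code.

```python
from importlib.metadata import PackageNotFoundError, version


def is_compatible(version_string):
    """Return True when the version is 3.20.x or lower."""
    parts = version_string.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) <= (3, 20)


def installed_protobuf_version():
    """Return the installed protobuf version string, or None if absent."""
    try:
        return version("protobuf")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_protobuf_version()
    if found is None:
        print("protobuf is not installed")
    elif is_compatible(found):
        print(f"protobuf {found} is within the 3.20.x limit")
    else:
        print(f"protobuf {found} is too new; consider protobuf==3.20.1")
```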

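2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

Note: the error message warns that this switches to pure-Python parsing and may be slower. (PS: I did not notice any difference in speed, so I am not sure how large the slowdown is.)

Solution

AI 练中学 runs on Linux, so use export instead of set. You can add export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python to .cloudiderc and then run source ~/.cloudiderc in the terminal. Alternatively, run export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python directly in the terminal; in that case the setting only applies to the current terminal session.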
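If you would rather not touch the shell configuration, the same variable can probably be set from inside the script. This is a sketch under one assumption: the assignment has to run before anything imports protobuf, because the library picks its implementation at import time.

```python
import os

# Must run before any import that pulls in google.protobuf;
# setting it afterwards has no effect on the already-loaded library.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# Imports that depend on protobuf (directly or indirectly) go below this line.
```

Placed at the very top of 03_FewShotPrompt.py, this only affects that one script, which makes it easy to undo.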

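06/01-04 errors

All of the code in this lesson requires a HuggingFace API token. For HuggingFace itself, see the comment at the top of the code:

Due to network restrictions, Huggingface.co cannot be accessed normally. Consider downloading the model from a domestic mirror site and then loading it locally.

Note: when applying for the meta-llama/Llama-2-7b license, do not select China as the region.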
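The mirror suggestion in that comment could look roughly like the sketch below. Everything specific here is my assumption, not something the course code prescribes: the mirror URL, the helper name download_model, and the use of the huggingface_hub package with its HF_ENDPOINT environment variable.

```python
import os

# Assumption: a community mirror endpoint; replace it with a mirror you trust.
MIRROR_ENDPOINT = "https://hf-mirror.com"

# huggingface_hub reads HF_ENDPOINT when it is first imported, so set it first.
os.environ["HF_ENDPOINT"] = MIRROR_ENDPOINT


def download_model(repo_id: str, local_dir: str) -> str:
    """Download all files of a model repo into local_dir and return the path."""
    # Imported here so that the endpoint above is already in place.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

A call such as download_model("meta-llama/Llama-2-7b", "./models/Llama-2-7b") would then fetch the files once, after which the model can be loaded from the local directory. Gated repositories like this one still need an approved license and a valid token.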

10/05 error

Running the file produces the following error:

Traceback (most recent call last):
  File "/cloudide/workspace/LangChain-shizhanke/10_记忆/05_ConversationSummaryBufferMemory.py", line 26, in <module>
    result = conversation("我姐姐明天要过生日，我需要一束生日花束。")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......
  File "/home/cloudide/.local/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 904, in get_num_tokens_from_messages
    raise NotImplementedError(
NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model cl100k_base. See https://platform.openai.com/docs/guides/text-generation/managing-tokens for information on how messages are converted to tokens.

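Cause

The error message states the cause: get_num_tokens_from_messages() is not implemented for the model cl100k_base.

Troubleshooting

Searching and reading the documentation shows that cl100k_base is not a model at all but a token encoding used by OpenAI models. Whether or not we configure our own 豆包 (Doubao) API, the model we are using is not an OpenAI model.

Jump to the location reported in the traceback, File "/home/cloudide/.local/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", and find the following code in get_num_tokens_from_messages():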

model, encoding = self._get_encoding_model()
if model.startswith("gpt-3.5-turbo-0301"):
......
else:
    raise NotImplementedError(
        f"get_num_tokens_from_messages() is not presently implemented "
        f"for model {model}. See "
        "https://platform.openai.com/docs/guides/text-generation/managing-tokens"  # noqa: E501
        " for information on how messages are converted to tokens."
    )

This code branches on the name of the model in use. Next, jump to _get_encoding_model(), the function that supplies that name:

if self.tiktoken_model_name is not None:
......
except KeyError:
    model = "cl100k_base"
    encoding = tiktoken.get_encoding(model)
return model, encoding

This code replaces any model name it does not recognize with cl100k_base, which get_num_tokens_from_messages() then rejects.
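Putting the two excerpts together, the failure path can be reproduced with a small stand-in. This is a simplified model of the logic, not the library source: the lookup table, the function names and the example model name are hypothetical, and the elided branches are not filled in.

```python
# Hypothetical stand-in for tiktoken's table of known model names.
KNOWN_MODEL_ENCODINGS = {"gpt-3.5-turbo": "cl100k_base"}


def get_encoding_model(model_name):
    """Mimic the fallback: unknown names are replaced by 'cl100k_base'."""
    try:
        encoding = KNOWN_MODEL_ENCODINGS[model_name]
    except KeyError:
        model_name = "cl100k_base"
        encoding = "cl100k_base"
    return model_name, encoding


def check_model(model_name):
    """Mimic the name check that precedes the actual token counting."""
    model, _ = get_encoding_model(model_name)
    if model.startswith("gpt-3.5-turbo"):
        return model
    raise NotImplementedError(
        f"get_num_tokens_from_messages() is not presently implemented "
        f"for model {model}."
    )


if __name__ == "__main__":
    try:
        check_model("some-non-openai-model")  # hypothetical model name
    except NotImplementedError as err:
        print(err)
```

The model name that reaches the check is no longer the one we configured, which is why the error mentions cl100k_base rather than our own model.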

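Solution

In the end I went with a quick-and-dirty fix: override the value of model with a name that does not trigger the error. Here is the modified code in 10/05: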

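import os
from typing import Tuple

import tiktoken
from langchain_openai import ChatOpenAI


class ChatOpenAIIn05(ChatOpenAI):
    # Report a model name that get_num_tokens_from_messages() accepts,
    # instead of the "cl100k_base" fallback used for unrecognized models.
    def _get_encoding_model(self) -> Tuple[str, tiktoken.Encoding]:
        model = "gpt-3.5-turbo"
        return model, tiktoken.encoding_for_model(model)


# Original code:
# llm = ChatOpenAI(
#     temperature=0.5,
#     model=os.environ.get("LLM_MODELEND"),
# )

llm = ChatOpenAIIn05(
    temperature=0.5,
    model=os.environ.get("LLM_MODELEND"),
)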
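A possibly less invasive variant avoids the subclass. The excerpt of _get_encoding_model() above checks self.tiktoken_model_name first, so passing that field to the constructor should have the same effect. This is a sketch, assuming the installed langchain_openai version accepts tiktoken_model_name as a constructor argument; the helper names llm_kwargs and build_llm are my own.

```python
import os


def llm_kwargs(model_name):
    """Constructor arguments: real model for requests, OpenAI name for token counting."""
    return {
        "temperature": 0.5,
        "model": model_name,
        # Only used to pick a tokenizer; requests still go to `model`.
        "tiktoken_model_name": "gpt-3.5-turbo",
    }


def build_llm():
    """Create the chat model; imported lazily so this file loads without langchain."""
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(**llm_kwargs(os.environ.get("LLM_MODELEND")))
```

As with the subclass, tokens are counted with the gpt-3.5-turbo encoding rather than the tokenizer of the model actually in use, so the counts should be treated as approximate.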
