Function Call & ReACT，Xinference 兼容OpenAI API，支持LLM原生function call

Function Call 和 ReACT

Function Call 和 ReACT可以参考以下这篇 Function Call & ReACT，Agent应用落地的加速器

ReACT

在LangChain中create_structured_chat_agent使用的是ReACT方法，即Thought，Action，Observation三个步骤。不需要模型专门用Function Call微调过。使用prompt来使模型生成tool所需要的参数，返回的json参数格式可以自己定义，然后在Action中解析模型生成的tool中的参数，传入tool中执行，拿到最后的Observation结果。

计算器

写了个简单的计算器功能，其中定义的行动格式为

{{

  "action": $TOOL_NAME,

  "action_input": $INPUT

}}

prompt模板来自smith.langchain.com/hub/hwchase…

天气查询

参考：github.com/EvilPsyCHo/…

ReACT在定义上更加灵活一点，而且有思考过程，拆解复杂任务，可解释性比较高。

Function Call

在LangChain中create_tool_calling_agent使用的是Function Call方法。在模型调用传参的时候会额外传入tools这个参数 platform.openai.com/docs/api-re…

模型部署不支持function call

找了一圈发现FastChat，vLLM都没有把这些参数加进来，所以暂时还不支持function call功能 qwen1.5想要支持模型的function call，得去用他们自己得Qwen-Agent

FastChat，vLLM，SGLang提供的OpenAI API都不支持function call。

但是chatglm3和qwen是明确支持function call功能的，找了一堆支持OpenAI API的部署方法，发现Xinference功能很强，还集成了vLLM

Xinference部署模型

Xinference明确支持函数调用功能

命令行模式

Xinference安装

pip install "xinference[all]" 安装过程中可能会遇到以下错误

ImportError: libGL.so.1: cannot open shared object file: No such file or directory

pip uninstall opencv-python -y
pip install opencv-python-headless -i https://pypi.tuna.tsinghua.edu.cn/simple

本地部署方式启动 Xinference

xinference-local --host 0.0.0.0 --port 9000
这个host 和 port就是openai api地址

自定义模型

inference.readthedocs.io/zh-cn/lates… 以下保存为qwen1_5.json

{
    "version": 1,
    "context_length": 4096,
    "model_name": "Qwen1.5-32B-Chat",
    "model_lang": [
      "en",
      "zh"
    ],
    "model_ability": [
      "chat",
      "tools"
    ],
    "model_family": "qwen1.5-chat",
    "model_specs": [
      {
        "model_format": "pytorch",
        "model_size_in_billions": 32,
        "quantizations": [
          "none"
        ],
        "model_id": "Qwen/Qwen1.5-32B-Chat",
        "model_uri": "/data/NLP_MODELS/LLM/Qwen1.5-32B-Chat/"
      }
    ]
  }

注册模型

xinference register --endpoint http://0.0.0.0:9000 --model-type LLM --file qwen1_5.json --persist

启动模型

xinference launch --endpoint http://0.0.0.0:9000 --model-name Qwen1.5-32B-Chat --model-format pytorch --model-engine vllm --gpu_memory_utilization 0.9 --n-gpu 4

图形界面

除了命令行模式，Xinference还有图形界面更方便部署模型直接打开http://0.0.0.0:9000

注册模型

可以看到注册完之后，在custom models中已经有了我们注册好的模型

启动模型

等待模型加载完毕，就可以使用啦

Xinference部署，使用Function Call

简单测试一下Function Call结果

import os
from langchain_openai import ChatOpenAI
from tools.Calculator import Calculator

OPENAI_API_BASE = 'http://0.0.0.0:9000/v1'
OPENAI_API_KEY = 'EMPTY'
MODEL_PATH = "Qwen1.5-32B-Chat"

llm = ChatOpenAI(model=MODEL_PATH, openai_api_base=OPENAI_API_BASE, openai_api_key=OPENAI_API_KEY)

tools = [Calculator()]

prompt = ChatPromptTemplate.from_messages(
    [
        (
        "system",
        "You are a helpful assistant. Make sure to use the tavily_search_results_json tool for information.",
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

agent = create_tool_calling_agent(llm, tools, prompt)

agent_executor = AgentExecutor(agent=agent, tools=tools, return_intermediate_steps=True, verbose=True, handle_parsing_errors=True)

ans = agent_executor.invoke({"input": "11乘11是多少"})

print(ans)

查看create_tool_calling_agent源码 llm绑定了tools，所以这一步的llm有tools这个参数

RunnableBinding(bound=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7f7c11b6c0d0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7f7c11b6d7e0>, model_name='Qwen1.5-32B-Chat', openai_api_key=SecretStr('**********'), openai_api_base='http://0.0.0.0:9000/v1', openai_proxy=''), kwargs={'tools': [{'type': 'function', 'function': {'name': 'Calculator', 'description': 'Useful for when you need to calculate math problems', 'parameters': {'properties': {'calculation': {'description': 'calculation to perform', 'type': 'string'}}, 'required': ['calculation'], 'type': 'object'}}}]})

对比一下ReAct方法

import os
from langchain_openai import ChatOpenAI
from tools.Calculator import Calculator

OPENAI_API_BASE = 'http://0.0.0.0:9000/v1'
OPENAI_API_KEY = 'EMPTY'
MODEL_PATH = "Qwen1.5-32B-Chat"

llm = ChatOpenAI(model=MODEL_PATH, openai_api_base=OPENAI_API_BASE, openai_api_key=OPENAI_API_KEY)

tools = [Calculator()]

prompt = hub.pull("hwchase17/structured-chat-agent")

agent = create_structured_chat_agent(llm, tools, prompt)

agent_executor = AgentExecutor(agent=agent, tools=tools, return_intermediate_steps=True, verbose=True, handle_parsing_errors=True)

ans = agent_executor.invoke({"input": "11乘11是多少"})

print(ans)

查看create_structured_chat_agent源码 tools没有绑定到llm中，是传入到了prompt中，作为prompt中的一部分提示模型生成tool相关的参数。

RunnableBinding(bound=ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x7fc33da20220>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x7fc33da21930>, model_name='Qwen1.5-32B-Chat', openai_api_key=SecretStr('**********'), openai_api_base='http://0.0.0.0:9000/v1', openai_proxy=''), kwargs={'stop': ['\nObservation']})