Building a Chat Service with LangChain + DeepSeek

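A Chat Service with Custom Streaming Responses Using LangServe

1. Background

In natural language processing applications, real-time streaming responses can noticeably improve the user experience. LangChain is a library for quickly building applications that interact with various language models, and LangServe provides a convenient way to add routes so that a LangChain application can be deployed as a web service. This article shows how to combine Ollama, LangChain, and LangServe to build a chat service that streams Chinese-language output, using a custom response handler function.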

2. Environment Setup

Before starting, install the required libraries with the following command:

pip install fastapi uvicorn langchain ollama langserve

Also make sure the Ollama service is running. It can be started with:

ollama serve
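
Before starting the web service, it can help to confirm that Ollama is reachable and that the model has been pulled. The following is a minimal sketch, not part of the original service. It assumes that Ollama's /api/tags endpoint returns a JSON object with a "models" list whose entries carry a "name" field. The helper names model_is_listed and check_ollama are illustrative and do not come from any library.

```python
import json
import urllib.error
import urllib.request


def model_is_listed(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags-style payload (assumed shape)."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return model in names


def check_ollama(base_url: str, model: str, timeout: float = 3.0) -> bool:
    """Ask the Ollama server whether `model` is available locally.

    Returns False if the server cannot be reached or the reply is not valid JSON.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False
    return model_is_listed(payload, model)
```

Calling check_ollama("http://localhost:11434", "deepseek-r1:1.5b") should return True once the server is up and the model is present. If it returns False while the server is running, the model can be downloaded with ollama pull deepseek-r1:1.5b.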

3. Implementation

from fastapi import FastAPI
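# Note: newer LangChain releases expose ChatOllama from langchain_community.chat_models
# (or the separate langchain_ollama package); adjust this import if it fails.
from langchain.chat_models import ChatOllama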
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableLambda
from langserve import add_routes

chat = ChatOllama(model="deepseek-r1:1.5b", base_url="http://localhost:11434", streaming=True)

app = FastAPI()


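async def chat_endpoint(input: str):
    # Wrap the user input as a single human message
    message = [
        HumanMessage(content=input)
    ]
    # Stream the model's reply; astream keeps the event loop free inside this async generator
    async for chunk in chat.astream(message):
        if chunk.content:
            # Yield each non-empty piece of the reply
            yield chunk.content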


chain = RunnableLambda(chat_endpoint)

add_routes(
    app,
    chain,
    path="/chat"
)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="localhost", port=9005)

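Code walkthrough

Model initialization: chat = ChatOllama(model="deepseek-r1:1.5b", base_url="http://localhost:11434", streaming=True) creates the chat model with ChatOllama, selects the deepseek-r1:1.5b model served by the local Ollama instance, and enables streaming output.
Streaming handler: the chat_endpoint function receives the user input and wraps it in a message list in the format the model expects. It then iterates over the model's streaming output and uses yield to return each chunk that has non-empty content.
Runnable creation: chain = RunnableLambda(chat_endpoint) wraps the chat_endpoint function in a RunnableLambda, producing a runnable object that LangServe can call.
FastAPI application: app = FastAPI() creates the FastAPI application instance on which the web service is built.
Route registration: add_routes(app, chain, path="/chat") mounts the runnable under /chat, which exposes endpoints such as /chat/invoke, /chat/stream, and /chat/playground.
Starting the service: uvicorn.run starts the FastAPI application, listening on port 9005 of localhost.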
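
The streaming pattern described above can be tried without a running model. The sketch below is a stand-in built on assumptions: fake_model_stream is a hypothetical stub that replaces the real chat model, and Chunk mimics only the content attribute of a message chunk. It shows how an async generator forwards non-empty chunks as they arrive, applying the same filter as the service's handler.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a model message chunk (only `content` is modeled)."""
    content: str


async def fake_model_stream(prompt: str) -> AsyncIterator[Chunk]:
    """Stub model: echoes the prompt in pieces, including one empty chunk."""
    for piece in ["You said: ", "", prompt]:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield Chunk(content=piece)


async def chat_endpoint(user_input: str) -> AsyncIterator[str]:
    """Forward only the chunks that actually carry text."""
    async for chunk in fake_model_stream(user_input):
        if chunk.content:
            yield chunk.content


async def collect(user_input: str) -> List[str]:
    """Gather the streamed pieces into a list for inspection."""
    return [piece async for piece in chat_endpoint(user_input)]


pieces = asyncio.run(collect("hello"))
print(pieces)  # ['You said: ', 'hello']
```

The empty chunk emitted by the stub never reaches the caller, which is the effect of the if chunk.content check in the handler.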

4. Accessing the Service

Open the following URL in a browser: http://localhost:9005/chat/playground/
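
Besides the playground, the service can be consumed from code through the /chat/stream endpoint that add_routes registers. The following is a minimal sketch using only the standard library. It assumes the endpoint accepts a JSON body of the form {"input": ...} and replies with server-sent events in which each piece of the answer arrives as an event named data with a JSON-encoded payload. The helper names iter_sse_data and stream_chat are illustrative.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield decoded payloads of `data` events from server-sent-event lines."""
    event = "message"  # default SSE event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            event = "message"  # a blank line ends the current event
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "data":
            yield json.loads(line[len("data:"):].strip())


def stream_chat(url: str, text: str) -> Iterator[str]:
    """POST `text` to a LangServe-style /stream endpoint and yield reply chunks."""
    body = json.dumps({"input": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The response object is iterable line by line (as bytes)
        yield from iter_sse_data(line.decode("utf-8") for line in resp)
```

With the service running, looping over stream_chat("http://localhost:9005/chat/stream", "Hello") and printing each piece with end="" and flush=True shows the reply as it is generated. Events other than data, such as metadata or the closing end event, are skipped by the parser.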
