LangChain4j Streaming Output

Streaming output

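Streaming output means that the AI model does not wait until the complete answer has been generated and then send it all at once. Instead, it pushes each part of the text to the user in real time, as a "data stream", while generation is still in progress.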

Each model interface has a streaming counterpart:

ChatModel -> StreamingChatModel
LanguageModel -> StreamingLanguageModel

The Streaming* interfaces have a similar API, and they all accept a StreamingChatResponseHandler object.

By implementing the StreamingChatResponseHandler interface, you can handle the following events:

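When the next partial text response is generated: onPartialResponse(String) or onPartialResponse(PartialResponse, PartialResponseContext) is invoked (you can implement either one). Depending on the LLM provider, a partial response may contain a single token or several tokens. For example, you can forward each token to the UI as soon as it is generated.

When the next partial reasoning/thinking text is generated: onPartialThinking(PartialThinking) or onPartialThinking(PartialThinking, PartialThinkingContext) is invoked (you can implement either one). Depending on the LLM provider, partial thinking text may contain a single token or several tokens.

When the next partial tool call is generated: onPartialToolCall(PartialToolCall) or onPartialToolCall(PartialToolCall, PartialToolCallContext) is invoked (you can implement either one).

When the LLM finishes streaming a single tool call: onCompleteToolCall(CompleteToolCall) is invoked.

When the LLM finishes generating: onCompleteResponse(ChatResponse) is invoked. The ChatResponse object contains the complete response (AiMessage) as well as ChatResponseMetadata.

When an error occurs: onError(Throwable error) is invoked.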
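You can try the callback order without an API key, using only the JDK. The sketch below rests on several assumptions:

- TextStreamHandler, FakeStreamingModel and RecordingHandler are hypothetical names.
- The interface mirrors only onPartialResponse, onCompleteResponse and onError.
- The complete response is a plain String rather than a ChatResponse.

The fake model takes the text it should "generate" and pushes it in fixed-size chunks, so joining the partial responses gives back the complete text. A real provider decides the chunk boundaries itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring three StreamingChatResponseHandler callbacks.
// Not the LangChain4j interface: the complete response is a plain String here.
interface TextStreamHandler {
    void onPartialResponse(String partialResponse);

    void onCompleteResponse(String completeText);

    void onError(Throwable error);
}

// Hypothetical fake model: "streams" a canned answer in fixed-size chunks.
final class FakeStreamingModel {
    private final int chunkSize;

    FakeStreamingModel(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    void chat(String cannedAnswer, TextStreamHandler handler) {
        if (cannedAnswer == null) {
            // Failures are reported through the handler instead of being thrown.
            handler.onError(new IllegalArgumentException("nothing to stream"));
            return;
        }
        for (int i = 0; i < cannedAnswer.length(); i += chunkSize) {
            int end = Math.min(cannedAnswer.length(), i + chunkSize);
            handler.onPartialResponse(cannedAnswer.substring(i, end)); // once per chunk
        }
        handler.onCompleteResponse(cannedAnswer); // exactly once, after the last chunk
    }
}

// Records every event so the callback sequence can be inspected afterwards.
final class RecordingHandler implements TextStreamHandler {
    final List<String> partials = new ArrayList<>();
    String completeText;
    Throwable error;

    @Override
    public void onPartialResponse(String partialResponse) {
        partials.add(partialResponse);
    }

    @Override
    public void onCompleteResponse(String completeText) {
        this.completeText = completeText;
    }

    @Override
    public void onError(Throwable error) {
        this.error = error;
    }
}
```

With the real StreamingChatResponseHandler, onCompleteResponse would read the complete text from chatResponse.aiMessage().text().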

Low-level API

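import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.openai.OpenAiStreamingChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LLMConfig {
    // Qwen is reached through DashScope's OpenAI-compatible endpoint,
    // so the OpenAI module's streaming client can be reused.
    @Bean
    public StreamingChatModel streamingChatModel() {
        return OpenAiStreamingChatModel.builder()
                .apiKey(System.getenv("ALI_QWEN_API_KEY"))
                .modelName("qwen-plus")
                .baseUrl("https://dashscope.aliyuncs.com/compatible-mode/v1")
                .build();
    }
}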

StreamController

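import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import jakarta.annotation.Resource; // javax.annotation.Resource on Spring Boot 2
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("stream")
@Slf4j
public class StreamController {
    @Resource
    private StreamingChatModel streamingChatModel;

    // chat(...) returns immediately and the callbacks run on another thread,
    // so this endpoint sends an empty response; the text appears only in the console.
    @GetMapping("/qwen/chat1")
    public void chat1(@RequestParam(value = "question", defaultValue = "你是谁？") String question) {
        streamingChatModel.chat(question, new StreamingChatResponseHandler() {
            @Override
            public void onPartialResponse(String partialResponse) {
                System.out.print(partialResponse); // no newline, so the fragments join up
            }

            @Override
            public void onCompleteResponse(ChatResponse chatResponse) {
                log.info("Complete response: {}", chatResponse);
            }

            @Override
            public void onError(Throwable throwable) {
                log.error("Streaming failed", throwable);
            }
        });
    }
}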
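streamingChatModel.chat(...) only starts the stream, and the callbacks run later on another thread. If a caller must wait for the end of the stream, one option is a CountDownLatch. It is released in both the completion callback and the error callback.

The JDK-only sketch below shows that pattern. AsyncFakeModel and LatchDemo are hypothetical names. The fake model stands in for StreamingChatModel by streaming the words of a canned answer on a daemon thread, and it takes plain functional callbacks rather than a StreamingChatResponseHandler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical async fake: chat(...) starts a background thread and returns at once.
final class AsyncFakeModel {
    void chat(String cannedAnswer, Consumer<String> onPartial,
              Runnable onComplete, Consumer<Throwable> onError) {
        Thread worker = new Thread(() -> {
            try {
                String[] words = cannedAnswer.split(" "); // NPE for null, reported via onError
                for (int i = 0; i < words.length; i++) {
                    onPartial.accept(i == 0 ? words[i] : " " + words[i]);
                }
                onComplete.run();
            } catch (RuntimeException e) {
                onError.accept(e);
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for a demo stream
        worker.start();
    }
}

final class LatchDemo {
    // Starts the stream, then blocks until completion, failure, or timeout.
    static String chatAndWait(AsyncFakeModel model, String cannedAnswer, long timeoutSeconds) {
        StringBuilder text = new StringBuilder();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);

        model.chat(cannedAnswer,
                text::append,                                // like onPartialResponse
                done::countDown,                             // like onCompleteResponse
                e -> { failure.set(e); done.countDown(); }); // like onError

        boolean finished;
        try {
            // Without this wait, the method would return before any chunk arrived.
            finished = done.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for the stream", e);
        }
        if (!finished) {
            throw new IllegalStateException("stream did not finish within " + timeoutSeconds + "s");
        }
        if (failure.get() != null) {
            throw new IllegalStateException("stream failed", failure.get());
        }
        // countDown() happens-before await() returns, so the builder is safe to read here.
        return text.toString();
    }
}
```

In chat1 the same wait would hold the request thread, and the HTTP client would still receive an empty body.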

High-level API

Define the AI service interface

import dev.langchain4j.service.TokenStream;

public interface Assistant {
    TokenStream chatTokenStream(String message);
}

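LLM configuration

import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.service.AiServices;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LLMConfig {
    // Add this bean to the LLMConfig class above; it reuses the streamingChatModel bean.
    @Bean
    public Assistant assistant(StreamingChatModel streamingChatModel) {
        return AiServices.create(Assistant.class, streamingChatModel);
    }
}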

StreamController

// In addition to the Spring and Lombok imports used by chat1:
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.service.TokenStream;
import java.util.concurrent.CompletableFuture;

@RestController
@RequestMapping("stream")
@Slf4j
public class StreamController {
    @Resource
    private Assistant assistant;

    @GetMapping("/qwen/chat4")
    public void chat4(@RequestParam(value = "question", defaultValue = "你是谁？") String question) {
        TokenStream tokenStream = assistant.chatTokenStream(question);
        CompletableFuture<ChatResponse> futureResponse = new CompletableFuture<>();
        tokenStream
                .onPartialResponse((String s) -> log.info(s))
                .onCompleteResponse((ChatResponse response) -> futureResponse.complete(response))
                .onError((Throwable error) -> futureResponse.completeExceptionally(error))
                .start(); // nothing is streamed until start() is called
        futureResponse.join(); // Blocks the calling (request) thread until the stream, which runs on another thread, completes
    }
}
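chat4 bridges callbacks to a blocking call in three steps:

- onCompleteResponse completes the CompletableFuture.
- onError completes it exceptionally.
- join() waits for either outcome.

The JDK-only sketch below reproduces that pattern with a hypothetical FakeTokenStream. Its fluent methods are named after TokenStream's but take plain Consumers, and it is not the LangChain4j class. FutureBridge and streamAndJoin are likewise illustrative names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical stand-in for TokenStream: callbacks are registered fluently and
// nothing is streamed until start() is called.
final class FakeTokenStream {
    private final String cannedAnswer;
    private Consumer<String> onPartial = partial -> { };
    private Consumer<String> onComplete = full -> { };
    private Consumer<Throwable> onError = error -> { };

    FakeTokenStream(String cannedAnswer) {
        this.cannedAnswer = cannedAnswer;
    }

    FakeTokenStream onPartialResponse(Consumer<String> handler) {
        this.onPartial = handler;
        return this;
    }

    FakeTokenStream onCompleteResponse(Consumer<String> handler) {
        this.onComplete = handler;
        return this;
    }

    FakeTokenStream onError(Consumer<Throwable> handler) {
        this.onError = handler;
        return this;
    }

    // Streams one character at a time on a daemon thread and returns immediately.
    void start() {
        Thread worker = new Thread(() -> {
            try {
                if (cannedAnswer.isEmpty()) {
                    throw new IllegalStateException("model returned nothing");
                }
                for (int i = 0; i < cannedAnswer.length(); i++) {
                    onPartial.accept(String.valueOf(cannedAnswer.charAt(i)));
                }
                onComplete.accept(cannedAnswer);
            } catch (RuntimeException e) {
                onError.accept(e);
            }
        });
        worker.setDaemon(true);
        worker.start();
    }
}

final class FutureBridge {
    // Same shape as chat4: register callbacks, start, then block on join().
    static String streamAndJoin(FakeTokenStream stream, Consumer<String> onPartial) {
        CompletableFuture<String> future = new CompletableFuture<>();
        stream.onPartialResponse(onPartial)
              .onCompleteResponse(future::complete)
              .onError(future::completeExceptionally)
              .start();
        return future.join(); // throws CompletionException if onError fired
    }
}
```

Because join() holds the request thread until the stream ends, chat4 logs the tokens as they arrive, but the HTTP client still only waits and then receives an empty body.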

Flux

You can also respond to the client directly with a Flux<String>. To do so, add the langchain4j-reactor dependency:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-reactor</artifactId>
    <version>1.11.0-beta19</version>
</dependency>

Low-level API

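// In addition to the imports used by chat1:
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("stream")
@Slf4j
public class StreamController {
    @Resource
    private StreamingChatModel streamingChatModel;

    @GetMapping("/qwen/chat2")
    public Flux<String> chat2(@RequestParam(value = "question", defaultValue = "你是谁？") String question) {
        // Flux.create bridges the callback API to a reactive stream;
        // the lambda runs when Spring subscribes to the returned Flux.
        return Flux.create(sink -> {
            streamingChatModel.chat(question, new StreamingChatResponseHandler() {
                @Override
                public void onPartialResponse(String partialResponse) {
                    sink.next(partialResponse); // emit each fragment as soon as it arrives
                }

                @Override
                public void onCompleteResponse(ChatResponse chatResponse) {
                    sink.complete(); // end the stream
                }

                @Override
                public void onError(Throwable throwable) {
                    sink.error(throwable); // propagate the failure to the subscriber
                }
            });
        });
    }
}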
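Flux.create maps each callback onto a sink call:

- sink.next for each partial response.
- sink.complete when the response is complete.
- sink.error on failure.

The lambda itself runs once per subscriber. The JDK has no Flux, but java.util.concurrent.Flow and SubmissionPublisher can sketch the same bridge without Reactor. ChunkHandler and CallbackBridge are hypothetical names. This is an analogy to Flux.create, not a reimplementation: there are no overflow strategies or cancellation hooks.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Consumer;

// Hypothetical callback interface standing in for StreamingChatResponseHandler.
interface ChunkHandler {
    void onPartialResponse(String chunk);

    void onCompleteResponse();

    void onError(Throwable error);
}

final class CallbackBridge {
    // Analogy to Flux.create: the producer runs once per subscriber, and its
    // callbacks are forwarded to a SubmissionPublisher acting as the sink.
    static Flow.Publisher<String> create(Consumer<ChunkHandler> producer) {
        return subscriber -> {
            SubmissionPublisher<String> sink = new SubmissionPublisher<>();
            sink.subscribe(subscriber); // register first so no chunk is dropped
            producer.accept(new ChunkHandler() {
                @Override
                public void onPartialResponse(String chunk) {
                    sink.submit(chunk);             // ~ sink.next(chunk)
                }

                @Override
                public void onCompleteResponse() {
                    sink.close();                   // ~ sink.complete()
                }

                @Override
                public void onError(Throwable error) {
                    sink.closeExceptionally(error); // ~ sink.error(error)
                }
            });
        };
    }

    // Test helper: subscribes with unbounded demand and collects every chunk.
    static CompletableFuture<List<String>> collect(Flow.Publisher<String> publisher) {
        CompletableFuture<List<String>> result = new CompletableFuture<>();
        List<String> items = new CopyOnWriteArrayList<>();
        publisher.subscribe(new Flow.Subscriber<String>() {
            @Override
            public void onSubscribe(Flow.Subscription subscription) {
                subscription.request(Long.MAX_VALUE);
            }

            @Override
            public void onNext(String item) {
                items.add(item);
            }

            @Override
            public void onError(Throwable throwable) {
                result.completeExceptionally(throwable);
            }

            @Override
            public void onComplete() {
                result.complete(items);
            }
        });
        return result;
    }
}
```

collect exists only for testing. In the chat2 controller, Spring plays the subscriber role and writes the elements to the response.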

High-level API

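Define the service interface

import reactor.core.publisher.Flux;

public interface Assistant {
    // A Flux<String> return type needs langchain4j-reactor; this method can sit
    // next to chatTokenStream in the same Assistant interface.
    Flux<String> chatFlux(String message);
}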

StreamController

@RestController
@RequestMapping("stream")
@Slf4j
public class StreamController {
    @Resource
    private Assistant assistant;

    @GetMapping("/qwen/chat3")
    public Flux<String> chat3(@RequestParam(value = "question", defaultValue = "你是谁？") String question) {
        return assistant.chatFlux(question);
    }
}