LLM API Calls in Practice

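Background

Simply chatting with a large language model (LLM) does not make full use of our skills as programmers. In this article we call LLM APIs directly, combining our programming skills with the model's intelligence so that the applications we build can draw on that intelligence.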

LLM Interfaces

When integrating with an LLM, there are three options:

API

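Calling the API is the most direct and most common approach. The provider exposes the model's capabilities over plain HTTP, and we send requests with any HTTP client. Because these are standard HTTP interfaces, there is almost no learning curve; the only things that need attention are the data format of requests and responses and the authentication method.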
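In practice, "data format and authentication" come down to two headers and a JSON body. The following is a minimal sketch, assuming the request shape used by the demos later in this article (`model`, `messages`, `stream`) and Bearer-token authentication; `buildChatRequest` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: build the fetch() options for a chat-completions call.
// Assumes the OpenAI-style body (model, messages, stream) and a Bearer API key.
function buildChatRequest(apiKey, model, messages, stream = false) {
  return {
    method: 'POST',
    headers: {
      // The payload is JSON, so declare it
      'Content-Type': 'application/json',
      // The API key is sent as a Bearer token in the Authorization header
      'Authorization': `Bearer ${apiKey}`,
    },
    // fetch() does not serialize objects itself: the body must be a string
    body: JSON.stringify({ model, messages, stream }),
  };
}

// Usage: the result is the second argument of fetch(url, init)
const init = buildChatRequest('<APIKey>', 'glm-4-flash', [
  { role: 'user', content: 'Hello' },
]);
console.log(init.headers.Authorization); // Bearer <APIKey>
console.log(init.body); // {"model":"glm-4-flash","messages":[{"role":"user","content":"Hello"}],"stream":false}
```

Because `JSON.stringify` handles escaping, quotes or line breaks in the user's text cannot corrupt the request body.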

SDK

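An SDK (Software Development Kit) provides a set of prebuilt tools and libraries that make it easier to integrate LLM features. Because SDKs are released per programming language, each vendor concentrates on the languages that fit its own technical choices, and SDKs are usually updated somewhat later than the API itself.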

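OpenAI-compatible

OpenAI is widely regarded as the leader of the current wave of LLMs, so the interface specification it first proposed has become a de facto standard. Vendors offer OpenAI-compatible interfaces in different forms: some as an HTTP API, some through an SDK.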
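With an OpenAI-compatible interface, the `/chat/completions` path and the request body stay the same; what changes is the base URL (plus the API key and model name). A minimal sketch: the two base URLs below are the ones the providers document publicly at the time of writing and should be checked against their current docs, and `chatCompletionsUrl` is a hypothetical helper.

```javascript
// Base URLs as publicly documented at the time of writing (verify before use)
const BASE_URLS = {
  openai: 'https://api.openai.com/v1',
  zhipu: 'https://open.bigmodel.cn/api/paas/v4',
};

// Hypothetical helper: join a base URL with the OpenAI-style chat path
function chatCompletionsUrl(baseUrl) {
  // Strip trailing slashes so the result never contains "//chat/completions"
  return baseUrl.replace(/\/+$/, '') + '/chat/completions';
}

console.log(chatCompletionsUrl(BASE_URLS.zhipu));
// https://open.bigmodel.cn/api/paas/v4/chat/completions
```

The Zhipu result is exactly the endpoint used by the demos in this article, which is why those demos follow the OpenAI request format.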

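Which one to choose depends on the project. Here is my take, for reference:

For a pure Python project, either the vendor's SDK or the OpenAI-compatible SDK works well, and the code is very concise.
For calls made directly from the front end, the API approach is better, because most vendors do not provide a front-end SDK. Take care to keep the API key secure.
For a Java back end, I recommend the API approach. Most calls today are streaming, and handling streams through a Java SDK gets complicated; it is simpler to use Flux to pass the streaming response straight through to the front end and let the front end handle the streaming logic.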

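API Providers

My time is limited, so what follows is only my hands-on impression, for reference.

In this space, the three providers I use most are (in no particular order):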

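Coze (扣子)

ByteDance's LLM application development platform. Its main features:

A very active agent ecosystem with a rich set of common plugins and workflows; image flows in particular greatly expand the modalities an agent can handle.
Publishing agents is easy: they can be connected to ByteDance's own products such as Doubao, Douyin and Feishu, as well as to external ones such as WeChat and WeCom.
The visual editor is very pleasant to use; in workflows and image flows especially, the mouse-wheel behavior feels intuitive, so it is quick to pick up.

The main downside is the small free quota, although more can be obtained through various promotions or purchased.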

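Zhipu Qingyan (智谱清言)

Rooted in the Tsinghua University ecosystem, and arguably a pioneer of open-source LLMs in China. Its main features:

glm-4-flash, the first free LLM in China: no purchase required and no quota limit, which makes it a great fit for personal applications.
A very rich range of models for language, images, video and code, enough to cover practically every need we face.
For integration it offers all of the options above (API, SDK, OpenAI-compatible) plus LangChain, which is very programmer-friendly.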

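InternLM (书生)

The LLM family from Shanghai AI Laboratory. Its main features:

A full-chain ecosystem, with tooling for every stage: APIs, RAG, fine-tuning, deployment and evaluation.
A complete and active community: training camps, interest groups and open-source projects.
The language models are free to use within a quota, but a personal application is unlikely to exceed it.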

Implementation

In this article we use the Zhipu API, and every example calls it directly over HTTP.


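JS Non-Streaming Call

A non-streaming call is simple. Its drawback is the long response time: nothing is returned until the model has generated the entire answer.

Notes:

fetch can consume streaming responses by itself, so you simply call the URL. With SSE through the browser's EventSource, you would have to split the work into two parts, sending the request and listening for events; EventSource also only issues GET requests and cannot set an Authorization header.
The body option passed to fetch must be a string, so serialize the payload with JSON.stringify.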

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>JS Non-Streaming Call Demo</title>
</head>
<body>
    <h1>JS Non-Streaming Call Demo</h1>
    <textarea id="inputText" placeholder="Enter your text..." rows="3"></textarea>
    <br/>
    <button id="submitBtn">Submit</button>

    <script>
        document.getElementById('submitBtn').addEventListener('click', async () => {
            const inputText = document.getElementById('inputText').value;

            try {
                const response = await fetch('https://open.bigmodel.cn/api/paas/v4/chat/completions', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        // Replace with your API key. Demo only: a key in front-end code is visible to every user.
                        'Authorization': 'Bearer <APIKey>'
                    },
                    // fetch does not serialize objects: the body must be a JSON string
                    body: JSON.stringify({
                        model: 'glm-4-flash',
                        messages: [
                            { role: 'user', content: inputText }
                        ]
                    })
                });

                if (!response.ok) {
                    throw new Error(`Request failed: HTTP ${response.status}`);
                }

                // The complete answer arrives as a single JSON object
                const data = await response.json();
                console.log(data);
            } catch (error) {
                console.error(error);
            }
        });
    </script>
</body>
</html>
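The demo logs the whole response object; the answer text sits inside it. A minimal sketch of extracting it, assuming the OpenAI-style response shape (`choices[0].message.content`) that matches the request format used above. `extractReply` and the `sample` object are illustrative only; the sample is hand-written, not a captured API response.

```javascript
// Hypothetical helper: return the assistant's text from a non-streaming response.
// Assumes the OpenAI-style shape { choices: [{ message: { role, content } }] }.
function extractReply(data) {
  const choice = data && Array.isArray(data.choices) ? data.choices[0] : undefined;
  if (!choice || !choice.message) {
    // Error payloads (bad key, quota exceeded, ...) do not have this shape
    throw new Error('Unexpected response shape: ' + JSON.stringify(data));
  }
  return choice.message.content;
}

// Hand-written sample in the assumed shape
const sample = {
  choices: [
    { index: 0, finish_reason: 'stop', message: { role: 'assistant', content: 'Hello! How can I help?' } },
  ],
};
console.log(extractReply(sample)); // Hello! How can I help?
```

In the browser demo, `console.log(extractReply(data))` would print only the answer instead of the whole object.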

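JS Streaming Call

A streaming call displays the answer piece by piece, like a typewriter. Because the server starts returning as soon as the model produces its first tokens, the user sees output almost immediately, which makes for a much better experience.

Notes:

fetch can consume streaming responses by itself, so you simply call the URL. With SSE through the browser's EventSource, you would have to split the work into two parts, sending the request and listening for events.
The body option passed to fetch must be a string, so serialize the payload with JSON.stringify.
Handling a streaming response is different: read response.body with a Reader in a loop, decoding each chunk as it arrives.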
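The Reader loop from the last note can be tried outside the browser. A minimal sketch, assuming Node 18 or later (where `ReadableStream`, `TextEncoder` and `TextDecoder` are globals); the hand-made stream stands in for `response.body` and deliberately splits a Chinese character across two chunks, which shows why `decode` is called with `{ stream: true }`. `readAllText` is a hypothetical helper name.

```javascript
// Hypothetical helper: read a byte stream to the end and return it as text
async function readAllText(stream) {
  const reader = stream.getReader();
  const decoder = new TextDecoder('utf-8');
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true buffers an incomplete multi-byte character
    // until the rest of its bytes arrive in the next chunk
    text += decoder.decode(value, { stream: true });
  }
  // Flush anything the decoder is still holding
  text += decoder.decode();
  return text;
}

// "你好" is 6 bytes in UTF-8; cut it in the middle of the first character
const bytes = new TextEncoder().encode('你好');
const fakeBody = new ReadableStream({
  start(controller) {
    controller.enqueue(bytes.slice(0, 2));
    controller.enqueue(bytes.slice(2));
    controller.close();
  },
});

readAllText(fakeBody).then((text) => console.log(text)); // 你好
```

Without `{ stream: true }`, the first chunk would be decoded on its own and the split character would turn into replacement characters.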

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>JS Streaming Call Demo</title>
</head>
<body>
    <h1>JS Streaming Call Demo</h1>
    <textarea id="inputText" placeholder="Enter your text..." rows="3"></textarea>
    <br/>
    <button id="submitBtn">Submit</button>

    <script>
        document.getElementById('submitBtn').addEventListener('click', async () => {
            const inputText = document.getElementById('inputText').value;

            try {
                const response = await fetch('https://open.bigmodel.cn/api/paas/v4/chat/completions', {
                    method: 'POST',
                    headers: {
                        'Content-Type': 'application/json',
                        'Authorization': 'Bearer <APIKey>' // Replace with your API key
                    },
                    body: JSON.stringify({
                        model: 'glm-4-flash',
                        messages: [
                            { role: 'user', content: inputText }
                        ],
                        stream: true // ask the server to send the answer incrementally
                    })
                });

                if (!response.ok) {
                    throw new Error(`Request failed: HTTP ${response.status}`);
                }

                // Read the body chunk by chunk as it arrives
                const reader = response.body.getReader();
                const decoder = new TextDecoder('utf-8');

                while (true) {
                    const { done, value } = await reader.read();
                    if (done) break;
                    // stream: true keeps multi-byte characters split across chunks intact
                    const chunk = decoder.decode(value, { stream: true });
                    console.log(chunk); // raw SSE text: "data: {...}" lines, ending with "data: [DONE]"
                }
            } catch (error) {
                console.error(error);
            }
        });
    </script>
</body>
</html>
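The demo only logs raw chunks. Each chunk is Server-Sent Events text, and a chunk boundary can fall in the middle of a line, so extracting the answer needs a small buffering parser. A minimal sketch, assuming OpenAI-style events (the new text is in `choices[0].delta.content`, and the stream ends with `data: [DONE]`); `createSseParser` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: incremental parser for OpenAI-style SSE chat streams
function createSseParser(onText, onDone) {
  let buffer = ''; // holds an incomplete line until the rest of it arrives
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece has no newline yet, keep it for later
    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, ''); // tolerate CRLF line endings
      if (!line.startsWith('data:')) continue; // skip blank separators, ": comments", other fields
      // SSE allows an optional single space after "data:"
      const payload = line.slice(5).replace(/^ /, '');
      if (payload === '[DONE]') {
        onDone();
        continue;
      }
      const delta = JSON.parse(payload).choices?.[0]?.delta;
      if (delta && delta.content) onText(delta.content);
    }
  };
}

// Usage: the second event is split across two network chunks
let answer = '';
const push = createSseParser(
  (text) => { answer += text; },
  () => console.log('done:', answer),
);
push('data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}\n\ndata: {"choi');
push('ces":[{"index":0,"delta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n');
// prints "done: Hello"
```

In the browser loop, each decoded chunk would be passed to `push`, and `onText` could append to the page instead of a string. Accepting both `data: x` and `data:x` lets the same parser read streams re-emitted by another server.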

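Spring Boot Streaming Call

To keep the API key safe, the most reliable approach is to make the call from the back end, which also makes it easy to add access control and similar handling.

In Spring Boot, WebClient and Flux are enough to build the request parameters, call the LLM API and forward the stream. The core code:

Back-end code: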

package com.example.webfluxdemo.controller;

import java.util.List;
import java.util.Map;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

@RestController
public class ChatController {

    private final WebClient webClient;

    public ChatController(WebClient.Builder webClientBuilder) {
        this.webClient = webClientBuilder.baseUrl("https://open.bigmodel.cn").build();
    }

    @PostMapping(value = "/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestParam String message) {
        // Build the body as a Map and let Jackson serialize it, so quotes or
        // line breaks in the user's message cannot break the JSON
        Map<String, Object> body = Map.of(
                "model", "glm-4-flash",
                "messages", List.of(Map.of("role", "user", "content", message)),
                "stream", true);

        return webClient.post()
                .uri("/api/paas/v4/chat/completions")
                .contentType(MediaType.APPLICATION_JSON)
                .header("Authorization", "Bearer <APIKey>") // Replace with your API key; keep it in server-side config
                .bodyValue(body)
                .retrieve()
                // Each element is the data of one upstream SSE event,
                // which Spring re-emits to the browser as an SSE event
                .bodyToFlux(String.class);
    }
}

Front-end code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Java Streaming Call Demo</title>
</head>
<body>
    <h1>Java Streaming Call Demo</h1>
    <textarea id="inputText" placeholder="Enter your text..." rows="3"></textarea>
    <br/>
    <button id="submitBtn">Submit</button>

    <script>
        document.getElementById('submitBtn').addEventListener('click', async () => {
            const inputText = document.getElementById('inputText').value;

            try {
                // No API key here: the back end adds it. Encode the text so that
                // characters such as &, # or spaces survive the query string.
                // If this page is not served by the Spring Boot app itself, the back end
                // must allow its origin via CORS (for example with @CrossOrigin).
                const response = await fetch('http://localhost:8080/chat?message=' + encodeURIComponent(inputText), {
                    method: 'POST'
                });

                if (!response.ok) {
                    throw new Error(`Request failed: HTTP ${response.status}`);
                }

                const reader = response.body.getReader();
                const decoder = new TextDecoder('utf-8');

                while (true) {
                    const { done, value } = await reader.read();
                    if (done) break;
                    const chunk = decoder.decode(value, { stream: true });
                    console.log(chunk); // SSE text re-emitted by the back end
                }
            } catch (error) {
                console.error(error);
            }
        });
    </script>
</body>
</html>

I hope this article helps you better understand and use LLM APIs.

If you have any questions, let me know in the comments.