The rise of Chinese large models has given developers more choices. Mainstream models such as DeepSeek and Qwen (Tongyi Qianwen) all expose the standard Chat Completions interface, which means that once you know a single calling convention you can switch between model providers seamlessly. This article walks through a complete integration in three languages, covering streaming output, error handling, and production-grade configuration advice.
A shared interface format
The mainstream Chinese models all offer a Chat Completions-compatible REST API, with essentially the same request body structure:
{
"model": "deepseek-chat",
"messages": [
{ "role": "user", "content": "你好" }
],
"stream": false
}
To switch models you only change the base_url and the model name; authentication is likewise uniform via an Authorization: Bearer <api_key> header (a raw-HTTP sketch follows the table below).
| Provider | base_url | Example model names |
|---|---|---|
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner |
| Qwen (Tongyi Qianwen) | https://dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max, qwen-plus |
| Zhipu GLM | https://open.bigmodel.cn/api/paas/v4 | glm-4-flash, glm-4 |
| MiniMax | https://api.minimax.chat/v1 | abab6.5s-chat |
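To make the shared format concrete, here is a minimal sketch of the raw REST call using Python's requests package (an assumption of this example; the SDKs used below wrap exactly this request). It assumes a DEEPSEEK_API_KEY environment variable:
import os
import requests

# The same request shape works for every provider in the table; only the URL and model change.
resp = requests.post(
    "https://api.deepseek.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    json={
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])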
Python
Install dependencies
pip install openai python-dotenv
The openai library is essentially a client that wraps HTTP requests; since most Chinese model providers are compatible with this request format, you can reuse it directly.
Environment variables
Create a .env file in the project root:
DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
QWEN_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
Basic chat
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(
api_key=os.getenv("DEEPSEEK_API_KEY"),
base_url="https://api.deepseek.com/v1",
)
def chat(message: str) -> str:
response = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": message}],
max_tokens=1024,
)
return response.choices[0].message.content
print(chat("Explain what a vector database is in one sentence"))
Streaming output
def stream_chat(message: str):
stream = client.chat.completions.create(
model="deepseek-chat",
messages=[{"role": "user", "content": message}],
stream=True,
)
for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="", flush=True)
    print()  # newline
stream_chat("写一首关于深度学习的五言绝句")
Error handling and retries (exponential backoff)
In production, 429 Too Many Requests (rate limiting) and 5xx server errors should be treated differently:
- 429: the server is throttling you; back off and wait before retrying, never retry immediately
- 500/502/503: the server is failing; brief retries make sense, but beyond a threshold you should fail fast to avoid piling up requests
import time
import random
import logging
from openai import OpenAI, RateLimitError, APIStatusError, APIConnectionError
logger = logging.getLogger(__name__)
def chat_with_retry(
client: OpenAI,
messages: list,
model: str = "deepseek-chat",
max_retries: int = 4,
base_delay: float = 1.0,
) -> str:
"""
带指数退避的请求函数。
429 限流:退避时间较长(抖动 + 指数增长)。
5xx 服务端错误:退避时间较短,快速重试。
"""
for attempt in range(max_retries + 1):
try:
response = client.chat.completions.create(
model=model,
messages=messages,
timeout=30,
)
return response.choices[0].message.content
except RateLimitError as e:
if attempt == max_retries:
raise
            # back off longer on rate limits; random jitter avoids a thundering herd
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            logger.warning(f"Rate limited (429), retrying in {delay:.1f}s (attempt {attempt + 1})")
time.sleep(delay)
except APIStatusError as e:
if e.status_code < 500 or attempt == max_retries:
                # 4xx client errors (other than 429) are not retried; neither are exhausted retries
raise
delay = base_delay * (2 ** attempt)
logger.warning(f"服务端错误 ({e.status_code}),{delay:.1f}s 后重试")
time.sleep(delay)
except APIConnectionError as e:
if attempt == max_retries:
raise
delay = base_delay * (2 ** attempt)
logger.warning(f"连接错误,{delay:.1f}s 后重试")
time.sleep(delay)
Switching to Qwen
Only two lines change:
client = OpenAI(
api_key=os.getenv("QWEN_API_KEY"),
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
response = client.chat.completions.create(
model="qwen-max",
messages=[{"role": "user", "content": "你好"}],
)
Node.js
Install dependencies
npm install openai dotenv
Environment variables
# .env
DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
Basic chat (TypeScript)
import OpenAI from "openai";
import * as dotenv from "dotenv";
dotenv.config();
const client = new OpenAI({
apiKey: process.env.DEEPSEEK_API_KEY!,
baseURL: "https://api.deepseek.com/v1",
});
async function chat(message: string): Promise<string> {
const response = await client.chat.completions.create({
model: "deepseek-chat",
messages: [{ role: "user", content: message }],
max_tokens: 1024,
});
return response.choices[0].message.content ?? "";
}
(async () => {
  const result = await chat("Explain the core idea behind the Transformer architecture");
console.log(result);
})();
Streaming output
async function streamChat(message: string): Promise<void> {
const stream = await client.chat.completions.create({
model: "deepseek-chat",
messages: [{ role: "user", content: message }],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
}
  console.log(); // newline
}
Error handling
import OpenAI, { APIError } from "openai";
async function chatWithRetry(
message: string,
maxRetries = 4,
baseDelayMs = 1000
): Promise<string> {
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
      const response = await client.chat.completions.create(
        { model: "deepseek-chat", messages: [{ role: "user", content: message }] },
        { timeout: 30000 } // per-request timeout goes in the request options argument, not the body params
      );
return response.choices[0].message.content ?? "";
} catch (err) {
if (!(err instanceof APIError)) throw err;
const isLastAttempt = attempt === maxRetries;
if (err.status === 429) {
if (isLastAttempt) throw err;
const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000;
        console.warn(`Rate limited (429), retrying in ${(delay / 1000).toFixed(1)}s`);
await sleep(delay);
} else if (err.status >= 500) {
if (isLastAttempt) throw err;
const delay = baseDelayMs * Math.pow(2, attempt);
        console.warn(`Server error (${err.status}), retrying in ${delay / 1000}s`);
await sleep(delay);
} else {
        throw err; // 4xx client error: do not retry
}
}
}
throw new Error("不可达");
}
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Timeout configuration
const client = new OpenAI({
apiKey: process.env.DEEPSEEK_API_KEY!,
baseURL: "https://api.deepseek.com/v1",
  timeout: 60 * 1000, // 60s global timeout (non-streaming)
  maxRetries: 0, // disable the SDK's built-in retries; handle retries at the application layer
});
Go
Install dependencies
go get github.com/sashabaranov/go-openai
Basic chat
package main
import (
"context"
"fmt"
"os"
openai "github.com/sashabaranov/go-openai"
)
func main() {
cfg := openai.DefaultConfig(os.Getenv("DEEPSEEK_API_KEY"))
cfg.BaseURL = "https://api.deepseek.com/v1"
client := openai.NewClientWithConfig(cfg)
resp, err := client.CreateChatCompletion(
context.Background(),
openai.ChatCompletionRequest{
Model: "deepseek-chat",
Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Write a concurrency-safe counter in Go"},
},
},
)
if err != nil {
		fmt.Fprintf(os.Stderr, "API request failed: %v\n", err)
os.Exit(1)
}
fmt.Println(resp.Choices[0].Message.Content)
}
Streaming output
// Note: this function also needs "errors" and "io" added to the import block above.
func streamChat(client *openai.Client, message string) error {
stream, err := client.CreateChatCompletionStream(
context.Background(),
openai.ChatCompletionRequest{
Model: "deepseek-chat",
Messages: []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleUser, Content: message},
},
Stream: true,
},
)
if err != nil {
return fmt.Errorf("创建流失败: %w", err)
}
defer stream.Close()
for {
resp, err := stream.Recv()
if errors.Is(err, io.EOF) {
fmt.Println()
return nil
}
if err != nil {
return fmt.Errorf("读取流错误: %w", err)
}
if len(resp.Choices) > 0 {
fmt.Print(resp.Choices[0].Delta.Content)
}
}
}
Error handling and retries
package main
import (
"context"
"errors"
"math"
"math/rand"
"net/http"
"time"
openai "github.com/sashabaranov/go-openai"
)
func chatWithRetry(
ctx context.Context,
client *openai.Client,
messages []openai.ChatCompletionMessage,
maxRetries int,
) (string, error) {
baseDelay := time.Second
for attempt := 0; attempt <= maxRetries; attempt++ {
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "deepseek-chat",
Messages: messages,
})
if err == nil {
return resp.Choices[0].Message.Content, nil
}
var apiErr *openai.APIError
if !errors.As(err, &apiErr) {
return "", err // 非 API 错误,直接返回
}
isLast := attempt == maxRetries
switch {
case apiErr.HTTPStatusCode == http.StatusTooManyRequests:
if isLast {
return "", err
}
			// rate limited: exponential backoff plus random jitter
jitter := time.Duration(rand.Int63n(int64(time.Second)))
delay := time.Duration(math.Pow(2, float64(attempt)))*baseDelay + jitter
time.Sleep(delay)
case apiErr.HTTPStatusCode >= 500:
if isLast {
return "", err
}
delay := time.Duration(math.Pow(2, float64(attempt))) * baseDelay
time.Sleep(delay)
default:
return "", err // 4xx 不重试
}
}
return "", errors.New("重试次数耗尽")
}
Calling net/http directly (no third-party dependencies)
Sometimes a project does not want to pull in an extra dependency; calling the API with the standard library is also quite clean:
// Requires "bytes", "encoding/json", "fmt", "net/http" and "time" in the import block.
func chatHTTP(apiKey, message string) (string, error) {
	body := map[string]any{
		"model": "deepseek-chat",
		"messages": []map[string]string{
			{"role": "user", "content": message},
		},
	}
	data, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest("POST",
		"https://api.deepseek.com/v1/chat/completions",
		bytes.NewReader(data),
	)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Check the HTTP status before decoding; error bodies have a different shape.
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var result struct {
		Choices []struct {
			Message struct{ Content string } `json:"message"`
		} `json:"choices"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}
	if len(result.Choices) == 0 {
		return "", fmt.Errorf("empty choices in response")
	}
	return result.Choices[0].Message.Content, nil
}
Side-by-side comparison: integration cost across the three languages
| Dimension | Python | Node.js | Go |
|---|---|---|---|
| Ramp-up speed | Fastest | Fast | Moderate |
| Streaming handling | Concise | Concise | Slightly verbose |
| Deployment footprint | Large (runtime included) | Medium | Small (single binary) |
| Concurrency | Weak (GIL) | Moderate (event loop) | Strong (goroutines) |
| Best fit | Scripts / AI projects | Web services | High-concurrency gateways |
Best practices summary
- Environment variable management: never hard-code API keys. Use .env plus .gitignore locally, and a secret manager in CI.
- Timeout configuration: 30-60s is a reasonable total timeout for non-streaming requests; streaming responses arrive in chunks, so set a connection timeout (around 10s) rather than a total timeout (a sketch follows this list).
- Retry strategy:
  - 429 → longer backoff (4s, 8s, 16s) with random jitter
  - 5xx → shorter backoff (1s, 2s, 4s); fail fast after 3 attempts
  - 4xx (other than 429) → do not retry; surface the error to the business layer
- Streaming output: the user experience is much better, so prefer it, but handle mid-stream interruptions (tell the user when the stream breaks).
- Multi-model switching: extract base_url and model into configuration so business code can switch providers without changes (a sketch also follows this list). Taking this idea one step further leads to a unified gateway design; TheRouter, which I built, uses exactly this architecture: one entry point and one API key give access to DeepSeek, Qwen, Claude, and the other mainstream models, while application code stays completely unaware of the underlying routing.
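Following up on the timeout item above, here is a minimal sketch of fine-grained timeouts for streaming calls with the Python SDK. It assumes the openai and httpx packages and the DEEPSEEK_API_KEY variable used earlier; the specific second values are illustrative, not official recommendations.
import os
import httpx
from openai import OpenAI

# Fail fast on connection problems, but allow long reads between streamed chunks.
streaming_client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/v1",
    timeout=httpx.Timeout(connect=10.0, read=300.0, write=10.0, pool=10.0),
)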
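And here is a minimal sketch of the configuration-driven switching described in the last item. The PROVIDERS dictionary and make_client helper are illustrative names invented for this example, not part of any SDK:
import os
from openai import OpenAI

# Hypothetical provider registry: add or change entries without touching business code.
PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com/v1",
        "model": "deepseek-chat",
        "key_env": "DEEPSEEK_API_KEY",
    },
    "qwen": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "model": "qwen-max",
        "key_env": "QWEN_API_KEY",
    },
}

def make_client(name: str) -> tuple[OpenAI, str]:
    cfg = PROVIDERS[name]
    return OpenAI(api_key=os.getenv(cfg["key_env"]), base_url=cfg["base_url"]), cfg["model"]

client, model = make_client("qwen")
reply = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello"}],
)
print(reply.choices[0].message.content)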
The complete code in all three languages covers the core needs of a production environment. Python suits rapid prototyping and AI projects, Node.js suits web service integration, and Go suits building a high-concurrency API gateway layer. Pick the option that matches your stack.
Author: developer of TheRouter, focused on AI model routing gateways. Project homepage: therouter.ai