Best Practices for Calling LLM APIs from Python, Node.js, and Go

The rise of Chinese LLMs has given developers more options. Mainstream models such as DeepSeek and Tongyi Qianwen (Qwen) all expose the standard Chat Completions interface format, which means that once you know one calling convention you can switch between model services seamlessly. This article walks through a complete integration in three languages, covering streaming output, error handling, and production-grade configuration advice.


A common interface format

Mainstream Chinese models currently all support a Chat Completions-compatible REST API, and the request body structure is essentially identical:

{
  "model": "deepseek-chat",
  "messages": [
    { "role": "user", "content": "你好" }
  ],
  "stream": false
}

Switching between models only requires changing the base_url and the model name; authentication is likewise uniform, via an Authorization: Bearer <api_key> header.

  • DeepSeek: base_url https://api.deepseek.com/v1, example models deepseek-chat / deepseek-reasoner
  • Qwen (Tongyi Qianwen): base_url https://dashscope.aliyuncs.com/compatible-mode/v1, example models qwen-max / qwen-plus
  • Zhipu GLM: base_url https://open.bigmodel.cn/api/paas/v4, example models glm-4-flash / glm-4
  • MiniMax: base_url https://api.minimax.chat/v1, example model abab6.5s-chat
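
To make that uniformity concrete, here is a minimal sketch of the same request at the raw HTTP level, using only the Python standard library. It assumes DEEPSEEK_API_KEY is set in the environment (the .env workflow below covers that) and is purely illustrative; switching providers means changing only the URL and the model string.

import json
import os
import urllib.request

# Any provider from the list above works the same way: POST to <base_url>/chat/completions
# with an Authorization: Bearer header. Assumes DEEPSEEK_API_KEY is exported.
payload = json.dumps({
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode("utf-8")

req = urllib.request.Request(
    "https://api.deepseek.com/v1/chat/completions",
    data=payload,
    headers={
        "Authorization": "Bearer " + os.environ["DEEPSEEK_API_KEY"],
        "Content-Type": "application/json",
    },
)
with urllib.request.urlopen(req, timeout=60) as resp:
    reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])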

Python

Install dependencies

pip install openai python-dotenv

The openai package is essentially a client that wraps HTTP requests; since many Chinese models are compatible with this request format, it can be reused as-is.

Environment variable configuration

Create a .env file in the project root:

DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx
QWEN_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx

Basic chat

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

client = OpenAI(
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com/v1",
)

def chat(message: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": message}],
        max_tokens=1024,
    )
    return response.choices[0].message.content

print(chat("Explain what a vector database is in one sentence"))

Streaming output

def stream_chat(message: str):
    stream = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": message}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="", flush=True)
    print()  # newline

stream_chat("Write a short quatrain about deep learning")

Error handling and retries (exponential backoff)

In production, 429 Too Many Requests (rate limiting) and 5xx server errors need to be handled differently:

  • 429: the server is throttling you; back off and wait before retrying rather than retrying immediately
  • 500/502/503: server-side failure; brief retries are worthwhile, but past a threshold you should fail fast to avoid a backlog of requests

import time
import random
import logging
from openai import OpenAI, RateLimitError, APIStatusError, APIConnectionError

logger = logging.getLogger(__name__)

def chat_with_retry(
    client: OpenAI,
    messages: list,
    model: str = "deepseek-chat",
    max_retries: int = 4,
    base_delay: float = 1.0,
) -> str:
    """
    带指数退避的请求函数。
    429 限流:退避时间较长(抖动 + 指数增长)。
    5xx 服务端错误:退避时间较短,快速重试。
    """
    for attempt in range(max_retries + 1):
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=30,
            )
            return response.choices[0].message.content

        except RateLimitError as e:
            if attempt == max_retries:
                raise
            # Rate limited: wait longer, with random jitter to avoid a thundering herd
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            logger.warning(f"Rate limited (429), retrying in {delay:.1f}s (attempt {attempt + 1})")
            time.sleep(delay)

        except APIStatusError as e:
            if e.status_code < 500 or attempt == max_retries:
                # 4xx client errors (other than 429) are not retried; stop once retries are exhausted
                raise
            delay = base_delay * (2 ** attempt)
            logger.warning(f"服务端错误 ({e.status_code}),{delay:.1f}s 后重试")
            time.sleep(delay)

        except APIConnectionError as e:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt)
            logger.warning(f"连接错误,{delay:.1f}s 后重试")
            time.sleep(delay)

Switching to Qwen

Only two lines change:

client = OpenAI(
    api_key=os.getenv("QWEN_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
response = client.chat.completions.create(
    model="qwen-max",
    messages=[{"role": "user", "content": "你好"}],
)

Node.js

Install dependencies

npm install openai dotenv

Environment variables

# .env
DEEPSEEK_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxx

Basic chat (TypeScript)

import OpenAI from "openai";
import * as dotenv from "dotenv";

dotenv.config();

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY!,
  baseURL: "https://api.deepseek.com/v1",
});

async function chat(message: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "deepseek-chat",
    messages: [{ role: "user", content: message }],
    max_tokens: 1024,
  });
  return response.choices[0].message.content ?? "";
}

(async () => {
  const result = await chat("Explain the core idea behind the Transformer architecture");
  console.log(result);
})();

Streaming output

async function streamChat(message: string): Promise<void> {
  const stream = await client.chat.completions.create({
    model: "deepseek-chat",
    messages: [{ role: "user", content: message }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      process.stdout.write(content);
    }
  }
  console.log(); // newline
}

Error handling

import OpenAI, { APIError } from "openai";

async function chatWithRetry(
  message: string,
  maxRetries = 4,
  baseDelayMs = 1000
): Promise<string> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await client.chat.completions.create(
        {
          model: "deepseek-chat",
          messages: [{ role: "user", content: message }],
        },
        { timeout: 30_000 } // per-request timeout goes in the options argument, not the request body
      );
      return response.choices[0].message.content ?? "";

    } catch (err) {
      if (!(err instanceof APIError)) throw err;

      const isLastAttempt = attempt === maxRetries;

      if (err.status === 429) {
        if (isLastAttempt) throw err;
        const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000;
        console.warn(`Rate limited (429), retrying in ${(delay / 1000).toFixed(1)}s`);
        await sleep(delay);

      } else if (err.status >= 500) {
        if (isLastAttempt) throw err;
        const delay = baseDelayMs * Math.pow(2, attempt);
        console.warn(`Server error (${err.status}), retrying in ${delay / 1000}s`);
        await sleep(delay);

      } else {
        throw err; // 4xx client error, throw immediately
      }
    }
  }
  throw new Error("不可达");
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

Recommended timeout configuration

const client = new OpenAI({
  apiKey: process.env.DEEPSEEK_API_KEY!,
  baseURL: "https://api.deepseek.com/v1",
  timeout: 60 * 1000,     // 60s global timeout (non-streaming)
  maxRetries: 0,           // disable the SDK's built-in retries; let the business layer control retrying
});

Go

Install dependencies

go get github.com/sashabaranov/go-openai

Basic chat

package main

import (
    "context"
    "fmt"
    "os"

    openai "github.com/sashabaranov/go-openai"
)

func main() {
    cfg := openai.DefaultConfig(os.Getenv("DEEPSEEK_API_KEY"))
    cfg.BaseURL = "https://api.deepseek.com/v1"
    client := openai.NewClientWithConfig(cfg)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "deepseek-chat",
            Messages: []openai.ChatCompletionMessage{
                {Role: openai.ChatMessageRoleUser, Content: "Write a concurrency-safe counter in Go"},
            },
        },
    )
    if err != nil {
        fmt.Fprintf(os.Stderr, "API request failed: %v\n", err)
        os.Exit(1)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}

Streaming output

func streamChat(client *openai.Client, message string) error {
    stream, err := client.CreateChatCompletionStream(
        context.Background(),
        openai.ChatCompletionRequest{
            Model:  "deepseek-chat",
            Messages: []openai.ChatCompletionMessage{
                {Role: openai.ChatMessageRoleUser, Content: message},
            },
            Stream: true,
        },
    )
    if err != nil {
        return fmt.Errorf("创建流失败: %w", err)
    }
    defer stream.Close()

    for {
        resp, err := stream.Recv()
        if errors.Is(err, io.EOF) {
            fmt.Println()
            return nil
        }
        if err != nil {
            return fmt.Errorf("读取流错误: %w", err)
        }
        if len(resp.Choices) > 0 {
            fmt.Print(resp.Choices[0].Delta.Content)
        }
    }
}

Error handling and retries

package main

import (
    "context"
    "errors"
    "math"
    "math/rand"
    "net/http"
    "time"

    openai "github.com/sashabaranov/go-openai"
)

func chatWithRetry(
    ctx context.Context,
    client *openai.Client,
    messages []openai.ChatCompletionMessage,
    maxRetries int,
) (string, error) {
    baseDelay := time.Second

    for attempt := 0; attempt <= maxRetries; attempt++ {
        resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
            Model:    "deepseek-chat",
            Messages: messages,
        })
        if err == nil {
            return resp.Choices[0].Message.Content, nil
        }

        var apiErr *openai.APIError
        if !errors.As(err, &apiErr) {
            return "", err // 非 API 错误,直接返回
        }

        isLast := attempt == maxRetries

        switch {
        case apiErr.HTTPStatusCode == http.StatusTooManyRequests:
            if isLast {
                return "", err
            }
            // Rate limited: exponential backoff + random jitter
            jitter := time.Duration(rand.Int63n(int64(time.Second)))
            delay := time.Duration(math.Pow(2, float64(attempt)))*baseDelay + jitter
            time.Sleep(delay)

        case apiErr.HTTPStatusCode >= 500:
            if isLast {
                return "", err
            }
            delay := time.Duration(math.Pow(2, float64(attempt))) * baseDelay
            time.Sleep(delay)

        default:
            return "", err // 4xx 不重试
        }
    }
    return "", errors.New("重试次数耗尽")
}

Calling directly with net/http (no third-party dependencies)

Sometimes a project would rather not pull in an extra dependency; issuing the HTTP request with the standard library alone is also perfectly clean:

// chatHTTP needs only the standard library: bytes, encoding/json, fmt, io, net/http, time.
func chatHTTP(apiKey, message string) (string, error) {
    body := map[string]any{
        "model": "deepseek-chat",
        "messages": []map[string]string{
            {"role": "user", "content": message},
        },
    }
    data, err := json.Marshal(body)
    if err != nil {
        return "", err
    }

    req, err := http.NewRequest("POST",
        "https://api.deepseek.com/v1/chat/completions",
        bytes.NewReader(data),
    )
    if err != nil {
        return "", err
    }
    req.Header.Set("Authorization", "Bearer "+apiKey)
    req.Header.Set("Content-Type", "application/json")

    client := &http.Client{Timeout: 60 * time.Second}
    resp, err := client.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    // Surface non-2xx responses instead of trying to parse them as a completion.
    if resp.StatusCode != http.StatusOK {
        msg, _ := io.ReadAll(resp.Body)
        return "", fmt.Errorf("unexpected status %d: %s", resp.StatusCode, msg)
    }

    var result struct {
        Choices []struct {
            Message struct {
                Content string `json:"content"`
            } `json:"message"`
        } `json:"choices"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
        return "", err
    }
    if len(result.Choices) == 0 {
        return "", fmt.Errorf("response contained no choices")
    }
    return result.Choices[0].Message.Content, nil
}

Side-by-side comparison: integration cost in each language

  • Ramp-up speed: Python fastest, Node.js medium, Go slow
  • Streaming code: Python concise, Node.js concise, Go slightly verbose
  • Deployment footprint: Python large (needs a runtime), Node.js medium, Go small (single binary)
  • Concurrency: Python weak (GIL), Node.js medium (event loop), Go strong (goroutines)
  • Best fit: Python for scripts and AI projects, Node.js for web services, Go for high-concurrency gateways

Best practices summary

  1. Environment variable management: never hard-code an API key; use .env plus .gitignore locally, and a secret manager in CI.

  2. Timeouts: 30-60s is reasonable for non-streaming requests; for streaming requests, since the data arrives in chunks, set a connection timeout (around 10s) rather than a total timeout.

  3. Retry strategy

    • 429 → longer backoff (4s, 8s, 16s) with random jitter
    • 5xx → shorter backoff (1s, 2s, 4s), fail fast after about 3 attempts
    • 4xx (other than 429) → no retry; surface the error to the business layer

  4. Streaming output: a far better user experience, so prefer it, but handle stream interruptions explicitly (tell the user when the stream breaks); a sketch of this follows the list.

  5. Multi-provider switching: extract base_url and model into configuration so business code can switch providers without noticing (a configuration sketch also follows the list). Pushed one step further, this becomes a unified gateway design. TheRouter, built by the author, takes exactly this approach: one entry point and one API key give access to DeepSeek, Qwen, Claude, and other mainstream models, and the application layer never needs to know about the underlying routing.
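
To make item 4 concrete, here is a minimal sketch that guards the Python streaming loop from earlier. It assumes the client object defined in the Python section; exactly which exception surfaces when a connection drops mid-stream depends on the SDK version, so the except clause is kept deliberately broad.

from openai import APIError

def stream_chat_safe(message: str) -> str:
    """Stream a reply, but surface a broken stream to the user instead of failing silently."""
    collected = []
    try:
        stream = client.chat.completions.create(
            model="deepseek-chat",
            messages=[{"role": "user", "content": message}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta
            if delta.content:
                collected.append(delta.content)
                print(delta.content, end="", flush=True)
    except (APIError, OSError):
        # The stream broke mid-response: tell the user and return whatever arrived.
        print("\n[Connection interrupted - the answer above may be incomplete]")
    print()
    return "".join(collected)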
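
And for item 5, a sketch of pulling base_url, model, and timeout settings into one place. The PROVIDERS dict and make_client helper are illustrative names, not part of any SDK; httpx ships as a dependency of the openai package, so importing it adds nothing new, and the timeout values echo item 2.

import os
import httpx
from openai import OpenAI

# Illustrative provider registry, values taken from the table at the top of the article.
PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com/v1",
        "model": "deepseek-chat",
        "api_key_env": "DEEPSEEK_API_KEY",
    },
    "qwen": {
        "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "model": "qwen-max",
        "api_key_env": "QWEN_API_KEY",
    },
}

def make_client(name: str) -> tuple[OpenAI, str]:
    cfg = PROVIDERS[name]
    client = OpenAI(
        api_key=os.getenv(cfg["api_key_env"]),
        base_url=cfg["base_url"],
        # Streaming-friendly timeouts: fail fast on connect, allow a long read.
        timeout=httpx.Timeout(connect=10.0, read=300.0, write=10.0, pool=10.0),
    )
    return client, cfg["model"]

# Business code only ever sees a client and a model name.
client, model = make_client(os.getenv("LLM_PROVIDER", "deepseek"))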


The complete code in all three languages covers the core needs of a production environment: Python suits rapid prototyping and AI projects, Node.js suits web-service integration, and Go suits building a high-concurrency API gateway layer. Pick whichever matches your stack.


Author: developer of TheRouter, focused on AI model routing gateways. Project home page: therouter.ai