Chapter 2: Hands-On LLM API Integration
2.1 Complete Guide to the OpenAI API
Getting an API Key
- Visit platform.openai.com
- Register or log in
- Open the API Keys page
- Click "Create new secret key"
Basic Usage
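Once created, the key should be loaded from an environment variable rather than hard-coded. A minimal sketch (the `OPENAI_API_KEY` name is what the official SDK reads by default; `load_api_key` is a hypothetical helper name):

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment, failing fast if it is missing."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return key
```

Failing fast here gives a clear error message instead of a confusing 401 later.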
```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
```
TypeScript version
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
});

console.log(response.choices[0].message.content);
```
Model Selection Guide
| Model | Best for | Price (input/output) |
|---|---|---|
| gpt-4o | Complex reasoning, agents | $2.50 / $10.00 per 1M tokens |
| gpt-4o-mini | Fast responses, simple tasks | $0.15 / $0.60 per 1M tokens |
| o1 | Deep reasoning | $15.00 / $60.00 per 1M tokens |
| o1-mini | Math and code reasoning | $1.50 / $6.00 per 1M tokens |
2.2 Integrating Chinese LLM Providers
Zhipu GLM
```python
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="glm-4",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
```
OpenAI-compatible mode:
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://open.bigmodel.cn/api/paas/v4/"
)

response = client.chat.completions.create(
    model="glm-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
Tongyi Qianwen (Qwen)
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-dashscope-api-key",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
Kimi (Moonshot)
```python
from openai import OpenAI

client = OpenAI(
    api_key="your-moonshot-api-key",
    base_url="https://api.moonshot.cn/v1"
)

response = client.chat.completions.create(
    model="moonshot-v1-8k",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
Model Comparison
| Model | Context | Price | Highlights |
|---|---|---|---|
| GLM-4 | 128K | ¥0.1 / 1K tokens | Among the strongest domestic models |
| Qwen-Plus | 32K | ¥0.004 / 1K tokens | Excellent cost-performance |
| Kimi | 200K | ¥0.012 / 1K tokens | Very long context |
2.3 Handling Streaming Output
Why stream?
- User experience: tokens appear in real time instead of after a long wait
- Time savings: output can be processed while it is still being generated
- Token savings: generation can be terminated early once you have what you need
Python streaming
```python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
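The "terminate early" point can be made concrete: stop consuming the stream once you have enough text. The accumulation logic is shown as a plain helper over text fragments (`take_until_limit` is a hypothetical name); with the OpenAI stream you would feed it `(c.choices[0].delta.content or "" for c in stream)` and then close the stream so the server stops generating.

```python
from typing import Iterable

def take_until_limit(pieces: Iterable[str], max_chars: int) -> str:
    """Accumulate streamed text fragments, stopping once max_chars is reached."""
    collected = ""
    for piece in pieces:
        collected += piece
        if len(collected) >= max_chars:
            break  # stop iterating; closing the stream early ends generation
    return collected
```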
TypeScript streaming
```typescript
const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Write a poem" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
Streaming to the frontend (Next.js API Route)
```typescript
import OpenAI from "openai";
import { OpenAIStream, StreamingTextResponse } from "ai";

const client = new OpenAI();

export async function POST(req: Request) {
  const { messages } = await req.json();
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });
  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
```
Note: `OpenAIStream` and `StreamingTextResponse` come from older versions of the Vercel AI SDK (the `ai` package, v3); v4 and later replace them with `streamText` and related helpers.
2.4 Error Handling and Retry Strategies
Common error types
| Status code | Meaning | How to handle |
|---|---|---|
| 400 | Malformed request | Check your parameters |
| 401 | Authentication failed | Check your API key |
| 429 | Rate limited | Retry with exponential backoff |
| 500 | Server error | Retry |
| 503 | Service unavailable | Wait, then retry |
Retry strategy implementation
```python
import time

from openai import OpenAI, RateLimitError, APIError

def call_with_retry(client, messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
        except RateLimitError:
            wait_time = 2 ** attempt  # exponential backoff: 1s, 2s, 4s...
            print(f"Rate limited, waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(1)
    raise Exception("Max retries exceeded")
```
TypeScript version
```typescript
async function callWithRetry(
  client: OpenAI,
  messages: any[],
  maxRetries = 3
): Promise<any> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model: "gpt-4o",
        messages,
      });
    } catch (error: any) {
      if (error.status === 429) {
        const waitTime = Math.pow(2, attempt) * 1000;
        console.log(`Rate limited, waiting ${waitTime}ms...`);
        await new Promise((r) => setTimeout(r, waitTime));
      } else if (attempt === maxRetries - 1) {
        throw error;
      }
    }
  }
  // Reached when every attempt was rate limited
  throw new Error("Max retries exceeded");
}
```
2.5 Cost Optimization Tips
1. Pick the right model for the task
Simple tasks → gpt-4o-mini / qwen-turbo
Complex reasoning → gpt-4o / glm-4
Very long documents → kimi / claude
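The routing above can be captured in a small dispatch table. A sketch under assumptions: the task categories and the `pick_model` helper are illustrative, and `moonshot-v1-128k` is assumed as Kimi's long-context variant.

```python
# Map task types to the model tiers suggested above (illustrative routing table)
MODEL_BY_TASK = {
    "simple": "gpt-4o-mini",
    "complex_reasoning": "gpt-4o",
    "long_context": "moonshot-v1-128k",  # assumed long-context Moonshot variant
}

def pick_model(task_type: str) -> str:
    """Return a suitable model for the task, defaulting to the cheap tier."""
    return MODEL_BY_TASK.get(task_type, "gpt-4o-mini")
```

Defaulting to the cheapest tier keeps unknown task types from accidentally running on the most expensive model.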
2. Cache repeated requests
```python
from functools import lru_cache

@lru_cache(maxsize=100)
def cached_chat(prompt: str) -> str:
    # Identical prompts are served from the cache instead of a new API call
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
3. Count tokens before you send
```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))
```
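Token counts become cost estimates once combined with per-token prices. A minimal sketch using the USD prices from the table in 2.1 (prices change over time, so treat the figures as illustrative; `estimate_cost` is a hypothetical helper):

```python
# USD per 1M tokens (input, output), from the pricing table earlier in the chapter
PRICES = {
    "gpt-4o": (2.5, 10.0),
    "gpt-4o-mini": (0.15, 0.6),
}

def estimate_cost(input_tokens: int, output_tokens: int, model: str = "gpt-4o") -> float:
    """Estimate request cost in USD from input and output token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```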
4. Batch requests concurrently
```python
import asyncio

from openai import AsyncOpenAI

# asyncio.gather requires the async client; the sync OpenAI client would block
client = AsyncOpenAI(api_key="your-api-key")

batch_messages = [
    [{"role": "user", "content": msg}]
    for msg in messages
]

async def process_batch(client, batch):
    tasks = [
        client.chat.completions.create(model="gpt-4o", messages=msg)
        for msg in batch
    ]
    return await asyncio.gather(*tasks)
```
2.6 Hands-On: Wrapping an LLM Client
```python
from typing import Optional

from openai import OpenAI

class LLMClient:
    """A unified wrapper around OpenAI-compatible LLM APIs."""

    def __init__(
        self,
        api_key: str,
        base_url: Optional[str] = None,
        model: str = "gpt-4o"
    ):
        self.client = OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def chat(
        self,
        message: str,
        system_prompt: Optional[str] = None,
        stream: bool = False
    ) -> str:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": message})
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            stream=stream
        )
        if stream:
            return self._handle_stream(response)
        return response.choices[0].message.content

    def _handle_stream(self, stream):
        # Print tokens as they arrive while also accumulating the full reply
        full_content = ""
        for chunk in stream:
            if chunk.choices[0].delta.content:
                content = chunk.choices[0].delta.content
                print(content, end="", flush=True)
                full_content += content
        return full_content
```
Usage, here pointed at Zhipu's OpenAI-compatible endpoint:
```python
client = LLMClient(
    api_key="your-key",
    base_url="https://open.bigmodel.cn/api/paas/v4/",
    model="glm-4"
)
response = client.chat("Hello!", system_prompt="You are a friendly assistant")
```
2.7 Summary
- The OpenAI API is the de facto industry standard, and most Chinese providers expose compatible endpoints
- Streaming output significantly improves user experience
- A retry mechanism is essential for coping with transient API failures
- Cost optimization: pick the right model + cache repeated requests + count tokens
Coming up next
Chapter 3 covers prompt engineering and system prompts, including:
- Prompt design principles
- System prompt best practices
- Few-shot and Chain-of-Thought prompting
- Prompt templates for agents