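The previous article analyzed OpenMAIC's overall architecture. This one explains how it produces course content: how the Agent that generates the content is designed and what it does. Readers learning Agent development may want to pay particular attention to it.
This document walks through the three-stage pipeline OpenMAIC uses to generate a course: each stage's inputs and outputs, how its prompts are built, and the concrete parameters of its LLM calls.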
Overview
User input
│
▼
┌────────────────────────────────────────────────────────────┐
│ Stage 1: Requirements → Scene Outlines                     │
│ API: POST /api/generate/scene-outlines-stream              │
│ Traits: SSE streaming, incremental JSON parsing            │
└────────────────────────────────────────────────────────────┘
│
▼ (one call per outline)
┌────────────────────────────────────────────────────────────┐
│ Stage 2: Outline → Scene Content                           │
│ API: POST /api/generate/scene-content                      │
│ Traits: dispatched to a generator by outline type          │
└────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────┐
│ Stage 3: Content → Teaching Actions                        │
│ API: POST /api/generate/scene-actions                      │
│ Traits: generates the narration sequence with context      │
└────────────────────────────────────────────────────────────┘
│
▼
Complete classroom (Stage)
Stage 1: Requirements → Scene Outlines
1.1 Input
// Request body sent by the frontend
interface OutlineRequest {
  requirements: {
    requirement: string;   // the user's requirement in natural language
    language: 'zh-CN' | 'en-US';
    userNickname?: string; // student's nickname
    userBio?: string;      // student's background
    webSearch?: boolean;   // whether to run a web search
  };
  pdfText?: string;        // text extracted from the PDF (already truncated)
  pdfImages?: Array<{      // images extracted from the PDF
    id: string;
    description: string;
    width?: number;
    height?: number;
  }>;
  imageMapping?: Record<string, string>; // image ID → base64 data URL
  researchContext?: string; // web search results
  agents?: Array<{          // classroom Agents (role: teacher or student)
    id: string;
    name: string;
    persona: string;
    role: 'teacher' | 'student';
  }>;
}
Example request:
const request = {
  requirements: {
    requirement: 'Create a 20-minute middle-school biology lesson on photosynthesis',
    language: 'en-US',
    webSearch: true,
  },
  pdfText: 'Photosynthesis is the process by which green plants use chlorophyll...',
  pdfImages: [
    { id: 'img_1', description: 'Photosynthesis diagram', width: 800, height: 600 },
  ],
  imageMapping: {
    'img_1': 'data:image/png;base64,iVBORw0KGgo...',
  },
  researchContext: 'Recent research from 2024 shows...',
  agents: [
    { id: 'teacher_1', name: 'Teacher Li', persona: 'A friendly middle-school biology teacher', role: 'teacher' },
  ],
};
1.2 Prompt Construction
Template files:
lib/generation/prompts/templates/requirements-to-outlines/system.md
lib/generation/prompts/templates/requirements-to-outlines/user.md
Variable interpolation:
// lib/generation/prompts/loader.ts
const prompts = buildPrompt(PROMPT_IDS.REQUIREMENTS_TO_OUTLINES, {
  requirement: requirements.requirement,
  language: requirements.language,
  pdfContent: pdfText?.substring(0, MAX_PDF_CONTENT_CHARS) || 'None',
  availableImages: formatAvailableImages(pdfImages, imageMapping),
  researchContext: researchContext || 'None',
  mediaGenerationPolicy: buildMediaPolicy(headers),
  teacherContext: formatTeacherPersonaForPrompt(agents),
});
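The loader itself is not shown. As a minimal sketch, assuming the template files mark variables as {{name}} (OpenMAIC's actual placeholder syntax may differ), interpolation comes down to one regex replace. interpolate is a hypothetical name, not the real buildPrompt.

```typescript
// Hypothetical sketch: fill {{name}} placeholders in a prompt template.
// Assumes the {{...}} syntax; OpenMAIC's real template format may differ.
type PromptVars = Record<string, string | number | undefined>;

function interpolate(template: string, vars: PromptVars): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name: string) => {
    const value = vars[name];
    // Leave unknown placeholders untouched so missing variables are easy to spot.
    return value === undefined ? match : String(value);
  });
}

const userTemplate =
  '## User Requirements\n{{requirement}}\n\n## Course Language\n**Required language**: {{language}}';

const filled = interpolate(userTemplate, {
  requirement: 'Create a 20-minute middle-school biology lesson on photosynthesis',
  language: 'en-US',
});
console.log(filled);
```

Keeping an unfilled placeholder in place (instead of blanking it) makes a missing variable visible in the logged prompt.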
The prompt finally sent to the LLM:
# System Prompt (simplified)
You are a professional course content designer, skilled at transforming user requirements into structured scene outlines.
## Scene Types
- slide: Static PPT pages supporting text, images, charts, formulas
- quiz: Supports single-choice, multiple-choice, and short-answer questions
- interactive: Self-contained interactive HTML page
- pbl: Project-based learning module
## Output Format
You must output a JSON array where each element is a scene outline object:
```json
[
  {
    "id": "scene_1",
    "type": "slide",
    "title": "Scene Title",
    "description": "Teaching purpose",
    "keyPoints": ["Point 1", "Point 2"],
    "order": 1,
    "suggestedImageIds": ["img_1"],
    "mediaGenerations": [...]
  }
]
```
Must output valid JSON array format. Do not ask questions or request more information.
# User Prompt (variables filled in)
Please generate scene outlines based on the following course requirements.
## User Requirements
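Create a 20-minute middle-school biology lesson on photosynthesis
## Course Language
**Required language**: en-US
## Reference Materials
### PDF Content Summary
Photosynthesis is the process by which green plants use chlorophyll and light energy to convert carbon dioxide and water into...
### Available Images
[Image img_1] Photosynthesis diagram (size: 800×600, aspect ratio: 1.33)
### Web Search Results
Recent research from 2024 shows that photosynthetic efficiency can be...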
## Output Requirements
Please output JSON array directly without additional explanatory text.
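The Available Images line has a fixed shape. A hypothetical re-implementation of formatAvailableImages that produces it might look like this. The filtering rule (only images that have an entry in imageMapping) and the 'None' fallback are assumptions.

```typescript
// Hypothetical re-implementation of formatAvailableImages; the real helper's
// filtering rules and wording may differ.
interface PdfImage {
  id: string;
  description: string;
  width?: number;
  height?: number;
}

function formatAvailableImages(
  images: PdfImage[] | undefined,
  imageMapping: Record<string, string> | undefined,
): string {
  const lines = (images ?? [])
    // Assumption: only images that actually have a data URL are offered to the model.
    .filter((img) => imageMapping?.[img.id] !== undefined)
    .map((img) => {
      const { width, height } = img;
      // Dimensions are optional, so omit the size suffix instead of printing NaN.
      const size =
        width !== undefined && height !== undefined && height > 0
          ? ` (size: ${width}×${height}, aspect ratio: ${(width / height).toFixed(2)})`
          : '';
      return `[Image ${img.id}] ${img.description}${size}`;
    });
  return lines.length > 0 ? lines.join('\n') : 'None';
}
```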
1.3 LLM Call Parameters
// app/api/generate/scene-outlines-stream/route.ts
// Use vision mode only if the model supports it and there are images to send
const hasVision = !!modelInfo?.capabilities?.vision;
const visionImages = hasVision && imageMapping && pdfImages?.length
  ? pdfImages
      .slice(0, MAX_VISION_IMAGES)
      .map(img => ({ id: img.id, src: imageMapping[img.id] }))
      .filter(img => !!img.src) // skip images that have no data URL
  : undefined;
// Streaming call
const streamParams = visionImages?.length
  ? {
      // Vision mode: images are passed to the model directly
      model: languageModel,
      system: prompts.system,
      messages: [{
        role: 'user',
        content: [
          { type: 'text', text: prompts.user },
          ...visionImages.map(img => ({
            type: 'image',
            image: img.src,
          })),
        ],
      }],
      maxOutputTokens: modelInfo?.outputWindow, // e.g. 16384
    }
  : {
      // Text-only mode
      model: languageModel,
      system: prompts.system,
      prompt: prompts.user,
      maxOutputTokens: modelInfo?.outputWindow,
    };
const result = streamLLM(streamParams, 'scene-outlines-stream');
The messages structure in vision mode:
messages: [{
  role: 'user',
  content: [
    { type: 'text', text: '...' },                         // user prompt
    { type: 'image', image: 'data:image/png;base64,...' }, // image 1
    { type: 'image', image: 'data:image/png;base64,...' }, // image 2
  ],
}]
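Building that message can be isolated in a small helper. This sketch assumes a cap of 4 images (the real value of MAX_VISION_IMAGES is not shown). It also drops IDs that have no data URL, so the model never receives an empty image part. buildVisionMessages is a hypothetical name.

```typescript
// Hypothetical helper: build the multimodal user message used in vision mode.
// MAX_VISION_IMAGES is an assumed value; the source does not show it.
const MAX_VISION_IMAGES = 4;

type UserContentPart =
  | { type: 'text'; text: string }
  | { type: 'image'; image: string };

interface UserMessage {
  role: 'user';
  content: UserContentPart[];
}

function buildVisionMessages(
  userPrompt: string,
  imageIds: string[],
  imageMapping: Record<string, string>,
): UserMessage[] {
  const images = imageIds
    .map((id) => imageMapping[id])
    // Skip IDs without a data URL instead of sending undefined to the model.
    .filter((src): src is string => typeof src === 'string' && src.length > 0)
    .slice(0, MAX_VISION_IMAGES)
    .map((src): UserContentPart => ({ type: 'image', image: src }));
  // Text part first, then the images, matching the structure shown above.
  return [{ role: 'user', content: [{ type: 'text', text: userPrompt }, ...images] }];
}
```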
1.4 Response Handling (SSE Streaming)
// Incrementally parse the JSON array.
// Assumed context: inside a ReadableStream start(controller) callback.
const encoder = new TextEncoder();
let fullText = '';
const parsedOutlines: SceneOutline[] = [];
for await (const chunk of result.textStream) {
  fullText += chunk;
  // Try to extract newly completed JSON objects from the accumulated text
  const newOutlines = extractNewOutlines(fullText, parsedOutlines.length);
  for (const outline of newOutlines) {
    const enriched = {
      ...outline,
      id: outline.id || nanoid(),
      order: parsedOutlines.length + 1,
    };
    parsedOutlines.push(enriched);
    // Send an SSE event
    const event = JSON.stringify({
      type: 'outline',
      data: enriched,
      index: parsedOutlines.length - 1,
    });
    controller.enqueue(encoder.encode(`data: ${event}\n\n`));
  }
}
// Send the completion event
controller.enqueue(encoder.encode(`data: ${JSON.stringify({
  type: 'done',
  outlines: parsedOutlines,
})}\n\n`));
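extractNewOutlines is not shown. Here is a minimal sketch, assuming the model emits a single top-level JSON array. It scans the accumulated text while tracking string and escape state plus bracket depth, and parses each top-level object once its closing brace arrives. Only successfully parsed objects are counted, so the skip count stays aligned with parsedOutlines.length and objects already sent are not sent twice.

```typescript
// Hypothetical sketch of extractNewOutlines: returns complete top-level
// objects from a (possibly truncated) JSON array, skipping the first
// `alreadyParsed` objects that were emitted for earlier chunks.
function extractNewOutlines<T = unknown>(text: string, alreadyParsed: number): T[] {
  const start = text.indexOf('[');
  if (start === -1) return []; // the array has not started yet

  const objects: T[] = [];
  let depth = 0;        // brace/bracket depth inside the outer array
  let inString = false; // currently inside a JSON string literal
  let escaped = false;  // previous char was a backslash inside a string
  let objStart = -1;    // index where the current top-level object began
  let parsed = 0;       // successfully parsed top-level objects so far

  for (let i = start + 1; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '{' || ch === '[') {
      if (depth === 0 && ch === '{') objStart = i;
      depth++;
    } else if (ch === '}' || ch === ']') {
      if (depth === 0) break; // closing bracket of the outer array
      depth--;
      if (depth === 0 && ch === '}' && objStart !== -1) {
        try {
          const obj = JSON.parse(text.slice(objStart, i + 1)) as T;
          parsed++;
          if (parsed > alreadyParsed) objects.push(obj);
        } catch {
          // Malformed object: skip it rather than abort the whole stream.
        }
        objStart = -1;
      }
    }
  }
  return objects;
}
```

The sketch re-scans the whole buffer on every chunk, which is simple but quadratic over a long stream. A production version might remember the scan position between calls or use a dedicated partial-JSON parser.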
1.5 Example Output
[
  {
    "id": "scene_1",
    "type": "slide",
    "title": "Photosynthesis Overview",
    "description": "Introduce the definition, location, and basic process of photosynthesis",
    "keyPoints": [
      "Definition of photosynthesis",
      "The chloroplast is where photosynthesis takes place",
      "Overall equation of photosynthesis"
    ],
    "teachingObjective": "Understand the basic concept of photosynthesis",
    "estimatedDuration": 180,
    "order": 1,
    "suggestedImageIds": ["img_1"],
    "mediaGenerations": [
      {
        "type": "image",
        "prompt": "A diagram showing chloroplast structure with thylakoid membranes",
        "elementId": "gen_img_1",
        "aspectRatio": "16:9"
      }
    ]
  },
  {
    "id": "scene_2",
    "type": "quiz",
    "title": "Knowledge Check",
    "description": "Check students' grasp of the basics of photosynthesis",
    "keyPoints": ["Where photosynthesis takes place", "Reactants and products"],
    "order": 2,
    "quizConfig": {
      "questionCount": 3,
      "difficulty": "medium",
      "questionTypes": ["single", "multiple"]
    }
  }
]
Stage 2: Outline → Scene Content
2.1 Input
// Request body sent by the frontend
interface SceneContentRequest {
  outline: SceneOutline;       // the single scene outline to generate
  allOutlines: SceneOutline[]; // all outlines (for context)
  pdfImages?: PdfImage[];      // image list
  imageMapping?: ImageMapping; // image URL mapping
  stageInfo: {
    name: string;
    description?: string;
    language?: string;
    style?: string;
  };
  stageId: string;
  agents?: AgentInfo[];
}
Example request:
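const request = {
  outline: {
    id: 'scene_1',
    type: 'slide',
    title: 'Photosynthesis Overview',
    description: 'Introduce the definition, location, and basic process of photosynthesis',
    keyPoints: [
      'Definition of photosynthesis',
      'The chloroplast is where photosynthesis takes place',
      'Overall equation of photosynthesis',
    ],
    order: 1,
    suggestedImageIds: ['img_1'],
  },
  allOutlines: [...], // all scene outlines
  pdfImages: [{ id: 'img_1', description: 'Photosynthesis diagram', ... }],
  imageMapping: { 'img_1': 'data:image/png;base64,...' },
  stageInfo: { name: 'Photosynthesis Course', language: 'en-US' },
  stageId: 'stage_abc123',
};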
2.2 Type Dispatch
Each outline is routed to a generator according to outline.type:
// lib/generation/scene-generator.ts
export async function generateSceneContent(
  outline: SceneOutline,
  aiCall: AICallFn,
  assignedImages?: PdfImage[],
  imageMapping?: ImageMapping,
  languageModel?: LanguageModel,
  hasVision?: boolean,
  generatedMediaMapping?: ImageMapping,
  agents?: AgentInfo[],
): Promise<GenerationResult | null> {
  switch (outline.type) {
    case 'slide':
      return generateSlideContent(outline, aiCall, assignedImages, ...);
    case 'quiz':
      return generateQuizContent(outline, aiCall, ...);
    case 'interactive':
      return generateInteractiveContent(outline, aiCall, ...);
    case 'pbl':
      return generatePBLContent(outline, languageModel, ...);
    default:
      return null;
  }
}
2.3 Slide Content Generation
Prompt construction:
// lib/generation/scene-generator.ts
async function generateSlideContent(
  outline: SceneOutline,
  aiCall: AICallFn,
  assignedImages?: PdfImage[],
  imageMapping?: ImageMapping,
  visionImages?: Array<{ id: string; src: string }>,
  generatedMediaMapping?: ImageMapping,
  agents?: AgentInfo[],
): Promise<GeneratedSlideContent | null> {
  // Canvas size (matches the 16:9 viewport)
  const canvasWidth = 1000;
  const canvasHeight = 562.5;
  // Format the images assigned to this scene (width/height are optional)
  const assignedImagesText = assignedImages?.map(img => {
    const size = img.width && img.height
      ? ` (size: ${img.width}×${img.height}, aspect ratio: ${(img.width / img.height).toFixed(2)})`
      : '';
    return `[Image ${img.id}] ${img.description}${size}`;
  }).join('\n') || 'No images available';
  const prompts = buildPrompt(PROMPT_IDS.SLIDE_CONTENT, {
    title: outline.title,
    description: outline.description,
    keyPoints: (outline.keyPoints || []).map((p, i) => `${i + 1}. ${p}`).join('\n'),
    elements: '(generated automatically from the key points)',
    assignedImages: assignedImagesText,
    canvas_width: canvasWidth,
    canvas_height: canvasHeight,
    teacherContext: formatTeacherPersonaForPrompt(agents),
  });
  // Call the LLM
  const response = await aiCall(prompts.system, prompts.user, visionImages);
  // Parse the JSON
  const generatedData = parseJsonResponse<GeneratedSlideData>(response);
  // ... post-processing
}
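parseJsonResponse is not shown either. Models sometimes wrap JSON in a Markdown code fence despite instructions, so a tolerant parser helps. In this hypothetical sketch the parser strips an optional fence, tries a direct parse, and then falls back to the span between the first opening bracket and the last closing one. Returning null lets the caller treat the scene as failed and retry.

```typescript
// Hypothetical sketch of a tolerant parseJsonResponse; OpenMAIC's real helper may differ.
const FENCE = '`'.repeat(3); // built at runtime to keep this example fence-safe

function parseJsonResponse<T>(response: string): T | null {
  let text = response.trim();

  // 1. Strip a Markdown code fence if the model added one despite instructions.
  if (text.startsWith(FENCE)) {
    text = text.slice(FENCE.length).replace(/^json\s*/i, '');
    const end = text.lastIndexOf(FENCE);
    if (end !== -1) text = text.slice(0, end);
    text = text.trim();
  }

  // 2. Fast path: the whole response is valid JSON.
  try {
    return JSON.parse(text) as T;
  } catch {
    // fall through to the fallback below
  }

  // 3. Fallback: take the span from the first opening bracket to the last closing one.
  const first = text.search(/[{\[]/);
  const last = Math.max(text.lastIndexOf('}'), text.lastIndexOf(']'));
  if (first === -1 || last <= first) return null;
  try {
    return JSON.parse(text.slice(first, last + 1)) as T;
  } catch {
    return null;
  }
}
```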
The prompt finally sent to the LLM:
# System Prompt (simplified; the full version is about 900 lines)
You are an educational content designer. Generate well-structured slide components with precise layouts.
## Canvas Specifications
- Dimensions: 1000 × 562.5
- Margins: Top/Bottom ≥ 50, Left/Right ≥ 50
## Element Types
### TextElement
```json
{
  "id": "text_001",
  "type": "text",
  "left": 60,
  "top": 80,
  "width": 880,
  "height": 76,
  "content": "<p style=\"font-size: 24px;\">Title text</p>",
  "defaultFontName": "",
  "defaultColor": "#333333"
}
```
### ImageElement
{
  "id": "image_001",
  "type": "image",
  "left": 100,
  "top": 150,
  "width": 400,
  "height": 300,
  "src": "img_1",
  "fixedRatio": true
}
## Text Height Lookup Table
| Font Size | 1 line | 2 lines | 3 lines |
|---|---|---|---|
| 14px | 43 | 64 | 85 |
| 16px | 46 | 70 | 94 |
| ... | ... | ... | ... |
## Output Format
Output valid JSON only. No explanations, no code blocks, no additional text.
# User Prompt (variables filled in)
## Scene Information
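- **Title**: Photosynthesis Overview
- **Description**: Introduce the definition, location, and basic process of photosynthesis
- **Key Points**:
1. Definition of photosynthesis
2. The chloroplast is where photosynthesis takes place
3. Overall equation of photosynthesis
## Available Resources
- **Available Images**: [Image img_1] Photosynthesis diagram (size: 800×600, aspect ratio: 1.33)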
- **Canvas Size**: 1000 × 562.5 px
## Output Requirements
Based on the scene information above, generate a complete Canvas/PPT component.
**Language Requirement**: All generated text content must be in the same language as the title.
**Must Follow**:
1. Output pure JSON directly
2. Do not wrap with ```json code blocks
3. Use the provided image_id (e.g., `img_1`) for the `src` field
4. All TextElement `height` values must be selected from the lookup table
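Forcing heights to come from a lookup table makes the layout checkable after generation. A hypothetical post-check could snap each TextElement height to the nearest allowed value. Only the 14px and 16px rows appear in the prompt excerpt, so other font sizes are passed through unchanged rather than guessed.

```typescript
// Hypothetical post-check for TextElement heights. Only the two rows shown in
// the prompt excerpt are filled in; the remaining rows are elided there.
const TEXT_HEIGHT_TABLE: Record<number, number[]> = {
  14: [43, 64, 85], // heights (px) for 1, 2 and 3 lines
  16: [46, 70, 94],
};

// Snap a model-provided height to the nearest allowed value for its font size.
// Unknown font sizes are returned unchanged rather than guessed.
function snapTextHeight(fontSize: number, height: number): number {
  const row = TEXT_HEIGHT_TABLE[fontSize];
  if (!row) return height;
  return row.reduce((best, candidate) =>
    Math.abs(candidate - height) < Math.abs(best - height) ? candidate : best,
  );
}

// Recover the line count a height stands for, e.g. to check that the text fits.
function linesForHeight(fontSize: number, height: number): number | undefined {
  const row = TEXT_HEIGHT_TABLE[fontSize];
  const index = row?.indexOf(height) ?? -1;
  return index === -1 ? undefined : index + 1;
}
```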
2.4 LLM Call Parameters
// Non-streaming call: content generation waits for the complete response
const result = await callLLM(
  {
    model: languageModel,
    system: prompts.system,
    prompt: prompts.user,
    maxOutputTokens: modelInfo?.outputWindow, // e.g. 16384
  },
  'scene-content',
  {
    retries: 1, // retry once on failure
    validate: (text) => text.trim().length > 0,
  }
);
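The retries and validate options point to a wrapper like the one below. The sketch makes two assumptions: retries counts extra attempts after the first, and a validation failure is treated the same as a thrown error. withRetry is a hypothetical name, not OpenMAIC's callLLM.

```typescript
// Hypothetical sketch of the retry/validate behaviour implied by callLLM's options.
interface RetryOptions {
  retries?: number;                     // extra attempts after the first one
  validate?: (text: string) => boolean; // reject empty or unusable output
}

async function withRetry(
  call: () => Promise<string>,
  { retries = 0, validate = () => true }: RetryOptions = {},
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const text = await call();
      if (validate(text)) return text;
      lastError = new Error(`validation failed on attempt ${attempt + 1}`);
    } catch (err) {
      lastError = err; // network/provider error: try again if attempts remain
    }
  }
  throw lastError;
}
```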
2.5 Example Output
{
  "background": {
    "type": "solid",
    "color": "#ffffff"
  },
  "elements": [
    {
      "id": "title_001",
      "type": "text",
      "left": 60,
      "top": 50,
      "width": 880,
      "height": 76,
      "content": "<p style=\"font-size: 32px;\"><strong>Photosynthesis Overview</strong></p>",
      "defaultFontName": "",
      "defaultColor": "#333333"
    },
    {
      "id": "text_002",
      "type": "text",
      "left": 60,
      "top": 150,
      "width": 500,
      "height": 130,
      "content": "<p style=\"font-size: 18px;\">• Definition of photosynthesis</p><p style=\"font-size: 18px;\">• The chloroplast is where photosynthesis takes place</p><p style=\"font-size: 18px;\">• Overall equation of photosynthesis</p>",
      "defaultFontName": "",
      "defaultColor": "#333333"
    },
    {
      "id": "image_001",
      "type": "image",
      "left": 600,
      "top": 120,
      "width": 350,
      "height": 263,
      "src": "img_1",
      "fixedRatio": true
    }
  ],
  "remark": "Introduces the basic concept of photosynthesis"
}
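The canvas rules from the system prompt (1000 × 562.5, margins of at least 50) can be checked mechanically. Here is a hypothetical sketch. The example above passes: the image's right edge sits exactly at 600 + 350 = 950, on the 50px margin.

```typescript
// Hypothetical layout check against the canvas rules stated in the system prompt.
interface Box {
  id: string;
  left: number;
  top: number;
  width: number;
  height: number;
}

const CANVAS = { width: 1000, height: 562.5, margin: 50 };

function findLayoutViolations(elements: Box[]): string[] {
  const problems: string[] = [];
  const maxRight = CANVAS.width - CANVAS.margin;   // 950
  const maxBottom = CANVAS.height - CANVAS.margin; // 512.5
  for (const el of elements) {
    const right = el.left + el.width;
    const bottom = el.top + el.height;
    if (el.left < CANVAS.margin) problems.push(`${el.id}: left ${el.left} < ${CANVAS.margin}`);
    if (el.top < CANVAS.margin) problems.push(`${el.id}: top ${el.top} < ${CANVAS.margin}`);
    if (right > maxRight) problems.push(`${el.id}: right edge ${right} > ${maxRight}`);
    if (bottom > maxBottom) problems.push(`${el.id}: bottom edge ${bottom} > ${maxBottom}`);
  }
  return problems;
}
```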
2.6 Quiz Content Generation (Brief)
Quiz generation follows the same pattern but uses its own template:
const prompts = buildPrompt(PROMPT_IDS.QUIZ_CONTENT, {
  title: outline.title,
  description: outline.description,
  keyPoints: keyPointsText,
  questionCount: outline.quizConfig?.questionCount || 3,
  difficulty: outline.quizConfig?.difficulty || 'medium',
  questionTypes: outline.quizConfig?.questionTypes || ['single'],
});
Example output:
{
  "questions": [
    {
      "id": "q1",
      "type": "single",
      "question": "In which plant organelle does photosynthesis take place?",
      "options": ["Mitochondrion", "Chloroplast", "Ribosome", "Golgi apparatus"],
      "correctAnswer": 1,
      "explanation": "The chloroplast is where photosynthesis takes place..."
    }
  ]
}
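In the example, correctAnswer is a zero-based index into options (1 selects the chloroplast). Grading and a basic sanity check can be sketched under that convention. The source does not show the answer format for multiple-choice questions, so the array of indices used below is an assumption.

```typescript
// Hypothetical helpers for the quiz output format above.
// Assumption: correctAnswer is a zero-based option index for single-choice
// questions and (not shown in the source) an array of indices for multiple-choice.
interface QuizQuestion {
  id: string;
  type: 'single' | 'multiple';
  question: string;
  options: string[];
  correctAnswer: number | number[];
  explanation?: string;
}

function answerIndices(q: QuizQuestion): number[] {
  return Array.isArray(q.correctAnswer) ? q.correctAnswer : [q.correctAnswer];
}

// Reject questions whose answer points outside the option list.
function isWellFormed(q: QuizQuestion): boolean {
  const answers = answerIndices(q);
  if (q.type === 'single' && answers.length !== 1) return false;
  return answers.length > 0 && answers.every((i) => Number.isInteger(i) && i >= 0 && i < q.options.length);
}

// Exact-match grading: the selected set must equal the answer set.
function isCorrect(q: QuizQuestion, selected: number[]): boolean {
  const expected = answerIndices(q).slice().sort((a, b) => a - b);
  const got = selected
    .filter((v, i, arr) => arr.indexOf(v) === i) // ignore duplicate selections
    .sort((a, b) => a - b);
  return expected.length === got.length && expected.every((v, i) => v === got[i]);
}
```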
Stage 3: Content → Teaching Actions
3.1 Input
// Request body sent by the frontend
interface SceneActionsRequest {
  outline: SceneOutline;
  allOutlines: SceneOutline[];
  content: GeneratedSlideContent | GeneratedQuizContent | ...;
  stageId: string;
  agents?: AgentInfo[];
  previousSpeeches?: string[]; // narration from earlier scenes (for continuity)
  userProfile?: string;        // student information
}
3.2 Prompt Construction
// lib/generation/scene-generator.ts
async function generateSlideActions(
  outline: SceneOutline,
  content: GeneratedSlideContent,
  aiCall: AICallFn,
  ctx?: SceneGenerationContext,
  agents?: AgentInfo[],
  userProfile?: string,
): Promise<Action[]> {
  // List the elements so the model can reference them by ID
  const elementsText = content.elements.map(el =>
    `- [${el.id}] ${el.type}${el.type === 'text' ? `: "${el.content?.substring(0, 50)}..."` : ''}`
  ).join('\n');
  // Build the course-position context
  const courseContext = ctx
    ? `Page ${ctx.pageIndex} of ${ctx.totalPages}`
    : '';
  const prompts = buildPrompt(PROMPT_IDS.SLIDE_ACTIONS, {
    title: outline.title,
    keyPoints: (outline.keyPoints || []).map((p, i) => `${i + 1}. ${p}`).join('\n'),
    description: outline.description,
    elements: elementsText,
    courseContext: courseContext,
    agents: formatAgentsForPrompt(agents),
    userProfile: userProfile || '',
  });
  const response = await aiCall(prompts.system, prompts.user);
  return parseActionsFromStructuredOutput(response, outline.type);
}
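generateSlideActions only needs the page position, but the opening rule in the system prompt (greet on the first page, close on the last) depends on it. The hypothetical sketch below derives that context from allOutlines. The real SceneGenerationContext fields are not shown; a 1-based pageIndex is assumed because the first slide is rendered as page 1 in the user prompt.

```typescript
// Hypothetical sketch: derive page context from all outlines and pick the
// opening/transition instruction described in the system prompt.
interface OutlineLike {
  id: string;
  title: string;
  order: number;
}

interface PageContext {
  pageIndex: number; // 1-based position in the course
  totalPages: number;
  previousTitle?: string;
}

function buildPageContext(current: OutlineLike, all: OutlineLike[]): PageContext {
  const sorted = all.slice().sort((a, b) => a.order - b.order);
  const idx = sorted.findIndex((o) => o.id === current.id);
  if (idx === -1) throw new Error(`outline ${current.id} not found in allOutlines`);
  return {
    pageIndex: idx + 1,
    totalPages: sorted.length,
    previousTitle: idx > 0 ? sorted[idx - 1].title : undefined,
  };
}

function openingInstruction(ctx: PageContext): string {
  if (ctx.pageIndex === 1) return 'First page: open with a greeting.';
  if (ctx.pageIndex === ctx.totalPages) return 'Last page: summarize and close the lesson.';
  return `Middle page: continue naturally from "${ctx.previousTitle ?? 'the previous page'}", no greeting.`;
}
```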
The prompt finally sent to the LLM:
# System Prompt (simplified)
You are a professional instructional designer responsible for generating teaching action sequences.
## Action Types
### spotlight (Focus Element)
```json
{ "type": "action", "name": "spotlight", "params": { "elementId": "text_abc123" } }
```
### text (Speech)
{ "type": "text", "content": "First, let's look at the key concept..." }
### discussion (Interactive Discussion)
{
  "type": "action",
  "name": "discussion",
  "params": {
    "topic": "Discussion topic",
    "prompt": "Guiding prompt"
  }
}
## Design Requirements
- Speech Content: Generate natural teaching speech
- Same-session continuity: All pages belong to the same class session
- Opening/Transition:
  - First page: Open with greeting
  - Middle pages: Continue naturally, no greeting
  - Last page: Summarize and close
## Output Format
Output a JSON array directly. No explanation, no code fences.
# User Prompt (variables filled in)
Elements:
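- [title_001] text: "Photosynthesis Overview"
- [text_002] text: "• Definition of photosynthesis..."
- [image_001] image
Title: Photosynthesis Overview
Key Points:
1. Definition of photosynthesis
2. The chloroplast is where photosynthesis takes place
3. Overall equation of photosynthesis
Description: Introduce the definition, location, and basic process of photosynthesis
Page 1 of 5
**Language Requirement**: Generated speech content must be in English.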
Output as a JSON array directly (5-10 segments):
3.3 LLM Call Parameters
const result = await callLLM(
  {
    model: languageModel,
    system: prompts.system,
    prompt: prompts.user,
    maxOutputTokens: 4096, // an action sequence needs relatively few tokens
  },
  'scene-actions'
);
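3.4 Example Output
[
  {
    "type": "text",
    "content": "Hello, everyone! Today we're going to learn about photosynthesis, one of the most important physiological processes in green plants."
  },
  {
    "type": "action",
    "name": "spotlight",
    "params": { "elementId": "title_001" }
  },
  {
    "type": "text",
    "content": "Put simply, photosynthesis is the process by which green plants use light energy to turn carbon dioxide and water into organic matter."
  },
  {
    "type": "action",
    "name": "spotlight",
    "params": { "elementId": "text_002" }
  },
  {
    "type": "text",
    "content": "Let's look at three key points about photosynthesis: its definition, where it takes place, and its equation."
  },
  {
    "type": "action",
    "name": "spotlight",
    "params": { "elementId": "image_001" }
  },
  {
    "type": "text",
    "content": "This diagram shows the basic process of photosynthesis. You can see the key role the chloroplast plays in it."
  }
]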
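The sequence mixes two item shapes: text items carry speech, and action items carry a name and params. A hypothetical parser in the spirit of parseActionsFromStructuredOutput could keep only well-formed items and known action names (spotlight and discussion, taken from the system prompt excerpt). The real function's rules are not shown.

```typescript
// Hypothetical parser for the action-sequence format above.
type SpeechItem = { type: 'text'; content: string };
type ActionItem = { type: 'action'; name: string; params: Record<string, unknown> };
type SequenceItem = SpeechItem | ActionItem;

const KNOWN_ACTIONS = new Set(['spotlight', 'discussion']); // from the system prompt excerpt

function parseActionSequence(raw: string): SequenceItem[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];

  const items: SequenceItem[] = [];
  for (const entry of data) {
    if (typeof entry !== 'object' || entry === null) continue;
    const e = entry as Record<string, unknown>;
    if (e.type === 'text' && typeof e.content === 'string' && e.content.trim() !== '') {
      items.push({ type: 'text', content: e.content });
    } else if (e.type === 'action' && typeof e.name === 'string' && KNOWN_ACTIONS.has(e.name)) {
      const params = typeof e.params === 'object' && e.params !== null
        ? (e.params as Record<string, unknown>)
        : {};
      items.push({ type: 'action', name: e.name, params });
    }
    // Anything else (unknown action names, empty speech, malformed items) is dropped.
  }
  return items;
}
```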
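3.5 Post-processing
The parsed actions are then post-processed:
import { nanoid } from 'nanoid';

function processActions(
  actions: Action[],
  elements: PPTElement[],
  agents?: AgentInfo[],
): Action[] {
  return actions.map((action, index) => ({
    ...action,
    id: `action_${nanoid(8)}`,
    timing: index * 3000, // default: one action every 3 seconds
    // Spotlight items look like { type: 'action', name: 'spotlight' } (see 3.4),
    // so both fields must be checked; then validate the referenced elementId
    ...(action.type === 'action' && action.name === 'spotlight' && {
      params: {
        ...action.params,
        elementId: elements.some(el => el.id === action.params?.elementId)
          ? action.params.elementId
          : elements[0]?.id, // fall back to the first element
      },
    }),
  }));
}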
End-to-End Example
User input
Create a 20-minute middle-school biology lesson on photosynthesis
Stage 1 output
[
  { "id": "scene_1", "type": "slide", "title": "Photosynthesis Overview", ... },
  { "id": "scene_2", "type": "slide", "title": "The Light Reactions", ... },
  { "id": "scene_3", "type": "interactive", "title": "Photosynthesis Simulation", ... },
  { "id": "scene_4", "type": "quiz", "title": "Knowledge Check", ... },
  { "id": "scene_5", "type": "slide", "title": "Summary and Extension", ... }
]
Stage 2 output (scene_1)
{
  "elements": [
    { "id": "title_001", "type": "text", "content": "<p>Photosynthesis Overview</p>", ... },
    { "id": "text_002", "type": "text", ... },
    { "id": "image_001", "type": "image", "src": "img_1", ... }
  ],
  "background": { "type": "solid", "color": "#ffffff" }
}
Stage 3 output (scene_1)
[
  { "type": "text", "content": "Hello, everyone! Today we're going to learn about photosynthesis..." },
  { "type": "action", "name": "spotlight", "params": { "elementId": "title_001" } },
  { "type": "text", "content": "Photosynthesis is the most important physiological process in green plants..." },
  ...
]
Final Assembly
// lib/generation/generation-pipeline.ts
function buildCompleteScene(
  outline: SceneOutline,
  content: GeneratedContent,
  actions: Action[],
  stageId: string
): Scene {
  return {
    id: nanoid(),
    stageId,
    title: outline.title,
    type: outline.type,
    order: outline.order,
    // Fill in the content according to the scene type
    ...(outline.type === 'slide' && { elements: content.elements }),
    ...(outline.type === 'quiz' && { questions: content.questions }),
    ...(outline.type === 'interactive' && { html: content.html }),
    // Action sequence
    actions,
    // Metadata
    duration: outline.estimatedDuration,
  };
}
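Key Design Decisions
Why three stages?
- Lower complexity per LLM call: each stage focuses on a single task
- Parallel generation: the Stage 2 calls for different scenes can run in parallel
- Better error recovery: a failed stage can be retried on its own
- Context passing: each stage's output serves as context for the stages after it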
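The parallelism and error-recovery points can be made concrete. In this hypothetical orchestration sketch, Stage 1 runs once, then each outline goes through Stage 2 and Stage 3 independently. Promise.allSettled ensures that one failed scene does not discard the others. The stage functions are injected so the example runs without a model; none of these names come from OpenMAIC.

```typescript
// Hypothetical orchestration sketch; the stage functions are injected so the
// example runs without a model. Not OpenMAIC's actual pipeline code.
interface Outline { id: string; title: string; order: number }
interface SceneResult { outlineId: string; content: string; actions: string[] }

interface Stages {
  outlines: (requirement: string) => Promise<Outline[]>;
  content: (outline: Outline, all: Outline[]) => Promise<string>;
  actions: (outline: Outline, content: string) => Promise<string[]>;
}

async function generateCourse(requirement: string, stages: Stages) {
  const outlines = await stages.outlines(requirement); // Stage 1 (streamed in the real API)

  // Stage 2 then Stage 3 for each outline; different scenes run in parallel.
  const settled = await Promise.allSettled(
    outlines.map(async (outline): Promise<SceneResult> => {
      const content = await stages.content(outline, outlines);
      const actions = await stages.actions(outline, content); // needs Stage 2's element IDs
      return { outlineId: outline.id, content, actions };
    }),
  );

  const scenes: SceneResult[] = [];
  const failed: string[] = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') scenes.push(result.value);
    else failed.push(outlines[i].id); // candidates for a targeted retry
  });
  return { scenes, failed };
}
```

Within a scene, Stage 3 stays sequential after Stage 2 because it needs that scene's element IDs. Different scenes share no such dependency, so they can run concurrently.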
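Why does Stage 1 stream?
- User experience: users see the first outline immediately instead of waiting for all of them
- Error isolation: even if later outlines fail to generate, the earlier ones remain usable
- Progress feedback: the frontend can display live progress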
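On the client, live progress depends on splitting the response body into SSE frames. Network chunks can end mid-frame, so the parser has to buffer. Here is a minimal sketch that assumes the outline and done event shapes from the server snippet in 1.4 and one data: line per frame. It is not OpenMAIC's frontend code.

```typescript
// Hypothetical client-side sketch: turn raw SSE text chunks into parsed events.
// Event shapes follow the server snippet in section 1.4.
type OutlineEvent =
  | { type: 'outline'; data: { id: string; title: string }; index: number }
  | { type: 'done'; outlines: unknown[] };

function createSseParser(onEvent: (event: OutlineEvent) => void) {
  let buffer = '';
  return (chunk: string): void => {
    buffer += chunk;
    // A complete SSE frame ends with a blank line ("\n\n").
    let sep = buffer.indexOf('\n\n');
    while (sep !== -1) {
      const frame = buffer.slice(0, sep);
      buffer = buffer.slice(sep + 2);
      for (const line of frame.split('\n')) {
        if (line.startsWith('data: ')) {
          onEvent(JSON.parse(line.slice('data: '.length)) as OutlineEvent);
        }
      }
      sep = buffer.indexOf('\n\n');
    }
  };
}
```

In a browser, the parser would be fed from response.body.getReader(). Decode each chunk with a TextDecoder using { stream: true } so multi-byte characters split across chunks decode correctly.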
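Why aren't Stages 2 and 3 streamed?
- A single scene's content is small, so streaming isn't needed
- The output must be parsed in full: Stage 3 references element IDs from Stage 2's finished content
- Post-processing needs the complete data, e.g. LaTeX rendering and image URL substitution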
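Appendix: Request Header Conventions
All generation APIs receive their configuration through request headers:
x-provider-id: openai               # LLM provider
x-model-id: gpt-4o                  # model ID
x-image-generation-enabled: true    # whether image generation is allowed
x-video-generation-enabled: false   # whether video generation is allowed
Server-side parsing: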
// lib/server/resolve-model.ts
export function resolveModelFromHeaders(req: NextRequest) {
  const providerId = req.headers.get('x-provider-id') as ProviderId | null;
  const modelId = req.headers.get('x-model-id');
  // headers.get() returns null for a missing header; fail early instead of
  // passing null to createModel
  if (!providerId || !modelId) {
    throw new Error('Missing x-provider-id or x-model-id header');
  }
  const model = PROVIDERS[providerId]?.models.find(m => m.id === modelId);
  const languageModel = createModel(providerId, modelId);
  return {
    model: languageModel,
    modelInfo: model,
    modelString: `${providerId}/${modelId}`,
  };
}
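Section 1.2 passes the headers to buildMediaPolicy(headers). The hypothetical stand-in below covers that step. Only the header names come from the list above; treating anything other than the literal string "true" as disabled, and the prompt wording, are assumptions. HeaderReader is satisfied by the standard Fetch Headers object, including NextRequest's headers.

```typescript
// Hypothetical stand-in for buildMediaPolicy. The header names come from the list
// above; the parsing rule and the prompt wording are assumptions.
interface HeaderReader {
  get(name: string): string | null; // satisfied by the standard Fetch Headers object
}

interface MediaPolicy {
  imageGeneration: boolean;
  videoGeneration: boolean;
}

function readFlag(headers: HeaderReader, name: string): boolean {
  // Only an explicit "true" enables a feature; missing or malformed values mean off.
  return headers.get(name)?.trim().toLowerCase() === 'true';
}

function parseMediaPolicy(headers: HeaderReader): MediaPolicy {
  return {
    imageGeneration: readFlag(headers, 'x-image-generation-enabled'),
    videoGeneration: readFlag(headers, 'x-video-generation-enabled'),
  };
}

function describeMediaPolicy(policy: MediaPolicy): string {
  const allowed: string[] = [];
  if (policy.imageGeneration) allowed.push('image');
  if (policy.videoGeneration) allowed.push('video');
  return allowed.length > 0
    ? `You may request generated media of these types in mediaGenerations: ${allowed.join(', ')}.`
    : 'Do not include mediaGenerations; media generation is disabled.';
}
```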