The Three-Stage Pipeline Architecture of OpenMAIC Course Generation

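The previous article analyzed OpenMAIC's overall architecture. This one explains how it produces course content: how the content-generating Agent is designed and what it does. Readers who are learning Agent development may find it especially useful.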

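This document walks through the three-stage pipeline architecture of OpenMAIC course generation, covering each stage's inputs and outputs, how its prompts are built, and the specific parameters of each LLM call.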

Overview

User input
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 1: Requirements → Scene Outlines                     │
│  API: POST /api/generate/scene-outlines-stream              │
│  Notes: SSE streaming, incremental JSON parsing             │
└─────────────────────────────────────────────────────────────┘
    │
    ▼ (one independent call per outline)
┌─────────────────────────────────────────────────────────────┐
│  Stage 2: Outline → Scene Content                           │
│  API: POST /api/generate/scene-content                      │
│  Notes: dispatches to a generator based on type             │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│  Stage 3: Content → Teaching Actions                        │
│  API: POST /api/generate/scene-actions                      │
│  Notes: builds the narration sequence from context          │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
Complete classroom (Stage)

Stage 1: Requirements → Scene Outlines

1.1 Input Data

// Request body sent by the frontend
interface OutlineRequest {
  requirements: {
    requirement: string;        // the user's natural-language request
    language: 'zh-CN' | 'en-US';
    userNickname?: string;      // student nickname
    userBio?: string;           // student background
    webSearch?: boolean;        // whether web search is enabled
  };
  pdfText?: string;             // text extracted from the PDF (already truncated)
  pdfImages?: Array<{           // images extracted from the PDF
    id: string;
    description: string;
    width?: number;
    height?: number;
  }>;
  imageMapping?: Record<string, string>;  // image ID → base64 URL
  researchContext?: string;     // web search results
  agents?: Array<{              // Agent info (role is teacher or student)
    id: string;
    name: string;
    persona: string;
    role: 'teacher' | 'student';
  }>;
}

Example request:

const request = {
  requirements: {
    requirement: '帮我创建一个关于光合作用的初中生物课程，时长20分钟',
    language: 'zh-CN',
    webSearch: true,
  },
  pdfText: '光合作用是指绿色植物通过叶绿素...',
  pdfImages: [
    { id: 'img_1', description: '光合作用示意图', width: 800, height: 600 },
  ],
  imageMapping: {
    'img_1': 'data:image/png;base64,iVBORw0KGgo...',
  },
  researchContext: '2024年最新研究表明...',
  agents: [
    { id: 'teacher_1', name: '李老师', persona: '亲切的初中生物老师', role: 'teacher' },
  ],
};

1.2 Prompt Construction

Template files:

lib/generation/prompts/templates/requirements-to-outlines/system.md
lib/generation/prompts/templates/requirements-to-outlines/user.md

Variable interpolation:

// lib/generation/prompts/loader.ts
const prompts = buildPrompt(PROMPT_IDS.REQUIREMENTS_TO_OUTLINES, {
  requirement: requirements.requirement,
  language: requirements.language,
  pdfContent: pdfText?.substring(0, MAX_PDF_CONTENT_CHARS) || '无',
  availableImages: formatAvailableImages(pdfImages, imageMapping),
  researchContext: researchContext || '无',
  mediaGenerationPolicy: buildMediaPolicy(headers),
  teacherContext: formatTeacherPersonaForPrompt(agents),
});
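
The internals of buildPrompt are not shown in this article. As a minimal sketch, assuming the templates mark variables with a `{{name}}` placeholder (an assumption; the actual syntax in system.md and user.md may differ), interpolation could look like the following. `interpolate` and `PromptVars` are hypothetical names, not the project's API.

```typescript
// Hypothetical helper: fills {{name}} placeholders in a template string.
// The placeholder syntax is an assumption made for illustration only.
type PromptVars = Record<string, string | number | undefined>;

function interpolate(template: string, vars: PromptVars): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) => {
    const value = vars[key];
    // Leave unknown placeholders untouched so missing variables are easy to spot.
    return value === undefined ? match : String(value);
  });
}

const userTemplate = [
  '## User Requirements',
  '{{requirement}}',
  '',
  '## Course Language',
  '**Required language**: {{language}}',
].join('\n');

const filled = interpolate(userTemplate, {
  requirement: '帮我创建一个关于光合作用的初中生物课程，时长20分钟',
  language: 'zh-CN',
});
console.log(filled);
```

Keeping unknown placeholders visible, rather than replacing them with an empty string, makes a forgotten variable show up in the rendered prompt instead of silently disappearing.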

The final prompt sent to the LLM:

# System Prompt (simplified)

You are a professional course content designer, skilled at transforming user requirements into structured scene outlines.

## Scene Types
- slide: Static PPT pages supporting text, images, charts, formulas
- quiz: Supports single-choice, multiple-choice, and short-answer questions
- interactive: Self-contained interactive HTML page
- pbl: Project-based learning module

## Output Format
You must output a JSON array where each element is a scene outline object:

```json
[
  {
    "id": "scene_1",
    "type": "slide",
    "title": "Scene Title",
    "description": "Teaching purpose",
    "keyPoints": ["Point 1", "Point 2"],
    "order": 1,
    "suggestedImageIds": ["img_1"],
    "mediaGenerations": [...]
  }
]
```

Must output valid JSON array format. Do not ask questions or request more information.

# User Prompt (variables filled in)

Please generate scene outlines based on the following course requirements.

## User Requirements
帮我创建一个关于光合作用的初中生物课程，时长20分钟

## Course Language
**Required language**: zh-CN

## Reference Materials

### PDF Content Summary
光合作用是指绿色植物通过叶绿素，利用光能，把二氧化碳和水转化成...

### Available Images
[Image img_1] 光合作用示意图 (尺寸: 800×600, 宽高比: 1.33)

### Web Search Results
2024年最新研究表明，光合作用效率可以通过...

## Output Requirements
Please output JSON array directly without additional explanatory text.

1.3 LLM Call Parameters

// app/api/generate/scene-outlines-stream/route.ts

// Decide whether to use Vision mode
const hasVision = !!modelInfo?.capabilities?.vision;
const visionImages = hasVision && imageMapping
  ? pdfImages.slice(0, MAX_VISION_IMAGES).map(img => ({
      id: img.id,
      src: imageMapping[img.id],
    }))
  : undefined;

// Streaming call
const streamParams = visionImages?.length
  ? {
      // Vision mode: pass the images directly to the model
      model: languageModel,
      system: prompts.system,
      messages: [{
        role: 'user',
        content: [
          { type: 'text', text: prompts.user },
          ...visionImages.map(img => ({
            type: 'image',
            image: img.src,
          })),
        ],
      }],
      maxOutputTokens: modelInfo?.outputWindow,  // e.g. 16384
    }
  : {
      // Text-only mode
      model: languageModel,
      system: prompts.system,
      prompt: prompts.user,
      maxOutputTokens: modelInfo?.outputWindow,
    };

const result = streamLLM(streamParams, 'scene-outlines-stream');

Structure of messages in Vision mode:

messages: [{
  role: 'user',
  content: [
    { type: 'text', text: '...' },  // user prompt
    { type: 'image', image: 'data:image/png;base64,...' },  // image 1
    { type: 'image', image: 'data:image/png;base64,...' },  // image 2
  ],
}]

1.4 Response Handling (SSE Streaming)

// Incrementally parse the JSON array
let fullText = '';
let parsedOutlines: SceneOutline[] = [];

for await (const chunk of result.textStream) {
  fullText += chunk;

  // Try to extract complete JSON objects from the accumulated text
  const newOutlines = extractNewOutlines(fullText, parsedOutlines.length);

  for (const outline of newOutlines) {
    const enriched = {
      ...outline,
      id: outline.id || nanoid(),
      order: parsedOutlines.length + 1,
    };
    parsedOutlines.push(enriched);

    // Emit an SSE event
    const event = JSON.stringify({
      type: 'outline',
      data: enriched,
      index: parsedOutlines.length - 1,
    });
    controller.enqueue(encoder.encode(`data: ${event}\n\n`));
  }
}

// Emit the completion event
controller.enqueue(encoder.encode(`data: ${JSON.stringify({
  type: 'done',
  outlines: parsedOutlines,
})}\n\n`));
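
`extractNewOutlines` is referenced above but its body is not shown. One way to do incremental extraction, offered as a sketch rather than the project's actual code, is to scan the accumulated text while tracking brace depth and string state, and to parse each top-level object as soon as its closing brace arrives. `extractCompleteObjects` is a hypothetical name.

```typescript
// Hypothetical sketch: returns the top-level objects of a (possibly incomplete)
// JSON array that are already complete in `text`, skipping the first
// `alreadyParsed` of them. JSON.parse throws if a balanced span is malformed.
function extractCompleteObjects(text: string, alreadyParsed: number): unknown[] {
  const results: unknown[] = [];
  let depth = 0;         // nesting depth of {...}
  let inString = false;  // true while scanning inside a JSON string
  let escaped = false;   // true right after a backslash inside a string
  let start = -1;        // index where the current top-level object began
  let seen = 0;          // complete top-level objects encountered so far

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === '}' && depth > 0) {
      depth--;
      if (depth === 0) {
        seen++;
        if (seen > alreadyParsed) {
          results.push(JSON.parse(text.slice(start, i + 1)));
        }
      }
    }
  }
  return results;
}

// Simulate a stream that arrives in three chunks, split mid-object.
const chunks = ['[{"id":"scene_1","title":"A {b', 'race}"},{"id":"sc', 'ene_2","title":"B"}]'];
let buffer = '';
const emitted: unknown[] = [];
for (const chunk of chunks) {
  buffer += chunk;
  emitted.push(...extractCompleteObjects(buffer, emitted.length));
}
console.log(emitted);
```

This version rescans the whole buffer on every chunk, which is simple but quadratic in the worst case; remembering the scan position between calls would avoid that at the cost of more state.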

1.5 Example Output

[
  {
    "id": "scene_1",
    "type": "slide",
    "title": "光合作用概述",
    "description": "介绍光合作用的定义、场所和基本过程",
    "keyPoints": [
      "光合作用的定义",
      "叶绿体是光合作用的场所",
      "光合作用的总反应式"
    ],
    "teachingObjective": "理解光合作用的基本概念",
    "estimatedDuration": 180,
    "order": 1,
    "suggestedImageIds": ["img_1"],
    "mediaGenerations": [
      {
        "type": "image",
        "prompt": "A diagram showing chloroplast structure with thylakoid membranes",
        "elementId": "gen_img_1",
        "aspectRatio": "16:9"
      }
    ]
  },
  {
    "id": "scene_2",
    "type": "quiz",
    "title": "知识检测",
    "description": "检测学生对光合作用基础知识的掌握",
    "keyPoints": ["光合作用场所", "反应物和产物"],
    "order": 2,
    "quizConfig": {
      "questionCount": 3,
      "difficulty": "medium",
      "questionTypes": ["single", "multiple"]
    }
  }
]

Stage 2: Outline → Scene Content

2.1 Input Data

// Request body sent by the frontend
interface SceneContentRequest {
  outline: SceneOutline;        // a single scene outline
  allOutlines: SceneOutline[];  // all outlines (for context)
  pdfImages?: PdfImage[];       // image list
  imageMapping?: ImageMapping;  // image ID → URL mapping
  stageInfo: {
    name: string;
    description?: string;
    language?: string;
    style?: string;
  };
  stageId: string;
  agents?: AgentInfo[];
}

Example request:

const request = {
  outline: {
    id: 'scene_1',
    type: 'slide',
    title: '光合作用概述',
    description: '介绍光合作用的定义、场所和基本过程',
    keyPoints: ['光合作用的定义', '叶绿体是光合作用的场所'],
    order: 1,
    suggestedImageIds: ['img_1'],
  },
  allOutlines: [...],  // all scene outlines
  pdfImages: [{ id: 'img_1', description: '光合作用示意图', ... }],
  imageMapping: { 'img_1': 'data:image/png;base64,...' },
  stageInfo: { name: '光合作用课程', language: 'zh-CN' },
  stageId: 'stage_abc123',
};

2.2 Type Dispatch

Each outline is dispatched to a different generator based on outline.type:

// lib/generation/scene-generator.ts

export async function generateSceneContent(
  outline: SceneOutline,
  aiCall: AICallFn,
  assignedImages?: PdfImage[],
  imageMapping?: ImageMapping,
  languageModel?: LanguageModel,
  hasVision?: boolean,
  generatedMediaMapping?: ImageMapping,
  agents?: AgentInfo[],
): Promise<GenerationResult | null> {

  switch (outline.type) {
    case 'slide':
      return generateSlideContent(outline, aiCall, assignedImages, ...);
    case 'quiz':
      return generateQuizContent(outline, aiCall, ...);
    case 'interactive':
      return generateInteractiveContent(outline, aiCall, ...);
    case 'pbl':
      return generatePBLContent(outline, languageModel, ...);
    default:
      return null;
  }
}

2.3 Slide Content Generation

Prompt construction:

// lib/generation/scene-generator.ts

async function generateSlideContent(
  outline: SceneOutline,
  aiCall: AICallFn,
  assignedImages?: PdfImage[],
  imageMapping?: ImageMapping,
  visionImages?: Array<{ id: string; src: string }>,
  generatedMediaMapping?: ImageMapping,
  agents?: AgentInfo[],
): Promise<GeneratedSlideContent | null> {

  // Canvas size (matches the viewport)
  const canvasWidth = 1000;
  const canvasHeight = 562.5;

  // Format the assigned images
  const assignedImagesText = assignedImages?.map(img =>
    `[Image ${img.id}] ${img.description} (尺寸: ${img.width}×${img.height}, 宽高比: ${(img.width / img.height).toFixed(2)})`
  ).join('\n') || '无可用图片';

  const prompts = buildPrompt(PROMPT_IDS.SLIDE_CONTENT, {
    title: outline.title,
    description: outline.description,
    keyPoints: (outline.keyPoints || []).map((p, i) => `${i + 1}. ${p}`).join('\n'),
    elements: '（根据要点自动生成）',
    assignedImages: assignedImagesText,
    canvas_width: canvasWidth,
    canvas_height: canvasHeight,
    teacherContext: formatTeacherPersonaForPrompt(agents),
  });

  // Call the LLM
  const response = await aiCall(prompts.system, prompts.user, visionImages);

  // Parse the JSON
  const generatedData = parseJsonResponse<GeneratedSlideData>(response);
  // ... post-processing
}
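
Because `width` and `height` are optional on PDF images (see OutlineRequest in section 1.1), the ratio expression above evaluates to NaN when either one is missing. A defensive variant might look like the sketch below. `describeImage`, `describeImages`, and `PdfImageLike` are hypothetical names, and the output format simply mirrors the template string shown above.

```typescript
interface PdfImageLike {
  id: string;
  description: string;
  width?: number;
  height?: number;
}

// Hypothetical helper: formats one image line, omitting the size details
// when the dimensions are missing or not positive.
function describeImage(img: PdfImageLike): string {
  const base = `[Image ${img.id}] ${img.description}`;
  const { width, height } = img;
  if (!width || !height || width <= 0 || height <= 0) return base;
  const ratio = (width / height).toFixed(2);
  return `${base} (尺寸: ${width}×${height}, 宽高比: ${ratio})`;
}

// Falls back to the same "no images available" text used in the source snippet.
function describeImages(images: PdfImageLike[] | undefined): string {
  return images && images.length > 0
    ? images.map(describeImage).join('\n')
    : '无可用图片';
}

console.log(describeImages([
  { id: 'img_1', description: '光合作用示意图', width: 800, height: 600 },
  { id: 'img_2', description: '叶绿体结构' },
]));
```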

The final prompt sent to the LLM:

# System Prompt (simplified; the actual prompt is about 900 lines)

You are an educational content designer. Generate well-structured slide components with precise layouts.

## Canvas Specifications
- Dimensions: 1000 × 562.5
- Margins: Top/Bottom ≥ 50, Left/Right ≥ 50

## Element Types

### TextElement
```json
{
  "id": "text_001",
  "type": "text",
  "left": 60,
  "top": 80,
  "width": 880,
  "height": 76,
  "content": "<p style=\"font-size: 24px;\">Title text</p>",
  "defaultFontName": "",
  "defaultColor": "#333333"
}
```

### ImageElement
```json
{
  "id": "image_001",
  "type": "image",
  "left": 100,
  "top": 150,
  "width": 400,
  "height": 300,
  "src": "img_1",
  "fixedRatio": true
}
```

## Text Height Lookup Table

| Font Size | 1 line | 2 lines | 3 lines |
|-----------|--------|---------|---------|
| 14px      | 43     | 64      | 85      |
| 16px      | 46     | 70      | 94      |
| ...       | ...    | ...     | ...     |

## Output Format

Output valid JSON only. No explanations, no code blocks, no additional text.

# User Prompt (variables filled in)

## Scene Information
- **Title**: 光合作用概述
- **Description**: 介绍光合作用的定义、场所和基本过程
- **Key Points**:
  1. 光合作用的定义
  2. 叶绿体是光合作用的场所
  3. 光合作用的总反应式

## Available Resources
- **Available Images**: [Image img_1] 光合作用示意图 (尺寸: 800×600, 宽高比: 1.33)
- **Canvas Size**: 1000 × 562.5 px

## Output Requirements
Based on the scene information above, generate a complete Canvas/PPT component.

**Language Requirement**: All generated text content must be in the same language as the title.

**Must Follow**:
1. Output pure JSON directly
2. Do not wrap with ```json code blocks
3. Use the provided image_id (e.g., `img_1`) for the `src` field
4. All TextElement `height` values must be selected from the lookup table
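
The prompt asks for raw JSON without code fences, but models do not always comply. The body of `parseJsonResponse` is not shown in this article; the sketch below is one plausible approach (unwrap an optional fence, then fall back to the outermost brace or bracket span), with `parseJsonLoose` as a hypothetical name.

```typescript
// Hypothetical sketch of a tolerant JSON parser for LLM output.
function parseJsonLoose<T>(raw: string): T | null {
  let text = raw.trim();

  // 1. Unwrap a Markdown code fence if the model added one anyway.
  const fenced = text.match(/^```[a-zA-Z]*\s*\n([\s\S]*?)\n?```\s*$/);
  if (fenced) text = fenced[1].trim();

  // 2. Try the text as-is.
  try {
    return JSON.parse(text) as T;
  } catch {
    // Not clean JSON: fall through to the span search below.
  }

  // 3. Fall back to the outermost {...} or [...] span, ignoring stray prose.
  const first = text.search(/[{[]/);
  const last = Math.max(text.lastIndexOf('}'), text.lastIndexOf(']'));
  if (first === -1 || last <= first) return null;
  try {
    return JSON.parse(text.slice(first, last + 1)) as T;
  } catch {
    return null;
  }
}

const wrapped = '```json\n{"elements": [], "remark": "ok"}\n```';
console.log(parseJsonLoose<{ remark: string }>(wrapped));
console.log(parseJsonLoose<number[]>('Here is the result: [1, 2, 3]. Done.'));
console.log(parseJsonLoose('no json here'));
```

Returning `null` instead of throwing lets the caller treat an unparseable response like any other failed generation and decide whether to retry.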

2.4 LLM Call Parameters

// Non-streaming call (content generation waits for the complete response)
const result = await callLLM(
  {
    model: languageModel,
    system: prompts.system,
    prompt: prompts.user,
    maxOutputTokens: modelInfo?.outputWindow,  // e.g. 16384
  },
  'scene-content',
  {
    retries: 1,  // retry once on failure
    validate: (text) => text.trim().length > 0,
  }
);
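
The `retries` and `validate` options suggest a retry loop around the model call. How `callLLM` implements this is not shown here. A minimal sketch of the pattern, assuming one initial attempt plus `retries` additional attempts and treating a failed validation like a thrown error, could be written as follows. `callWithRetry` and `RetryOptions` are hypothetical names.

```typescript
interface RetryOptions {
  retries: number;                      // extra attempts after the first one
  validate: (text: string) => boolean;  // rejects unusable responses
}

// Hypothetical helper: `generate` stands in for the actual model call.
async function callWithRetry(
  generate: () => Promise<string>,
  { retries, validate }: RetryOptions,
): Promise<string> {
  let lastError: unknown = new Error('no attempts made');
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const text = await generate();
      if (validate(text)) return text;
      lastError = new Error(`validation failed on attempt ${attempt + 1}`);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Usage with a fake model that returns whitespace first, then real text.
const fakeResponses = ['   ', '{"elements": []}'];
let calls = 0;
const demo = callWithRetry(async () => fakeResponses[calls++], {
  retries: 1,
  validate: (text) => text.trim().length > 0,
});
demo.then((text) => console.log(`calls=${calls}`, text));
```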

2.5 Example Output

{
  "background": {
    "type": "solid",
    "color": "#ffffff"
  },
  "elements": [
    {
      "id": "title_001",
      "type": "text",
      "left": 60,
      "top": 50,
      "width": 880,
      "height": 76,
      "content": "<p style=\"font-size: 32px;\"><strong>光合作用概述</strong></p>",
      "defaultFontName": "",
      "defaultColor": "#333333"
    },
    {
      "id": "text_002",
      "type": "text",
      "left": 60,
      "top": 150,
      "width": 500,
      "height": 130,
      "content": "<p style=\"font-size: 18px;\">• 光合作用的定义</p><p style=\"font-size: 18px;\">• 叶绿体是光合作用的场所</p><p style=\"font-size: 18px;\">• 光合作用的总反应式</p>",
      "defaultFontName": "",
      "defaultColor": "#333333"
    },
    {
      "id": "image_001",
      "type": "image",
      "left": 600,
      "top": 120,
      "width": 350,
      "height": 263,
      "src": "img_1",
      "fixedRatio": true
    }
  ],
  "remark": "介绍光合作用的基本概念"
}
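
The canvas rules from the system prompt (1000 × 562.5, margins of at least 50 on every side) are easy to check mechanically once the JSON is parsed. The following sketch is a hypothetical validator, not part of the post-processing described in this article; it reports every element that extends into the margin.

```typescript
interface Box {
  id: string;
  left: number;
  top: number;
  width: number;
  height: number;
}

// Values taken from the Canvas Specifications section of the system prompt.
const CANVAS = { width: 1000, height: 562.5, margin: 50 };

// Returns the ids of elements that cross into the margin area.
function findOutOfBounds(elements: Box[]): string[] {
  return elements
    .filter((el) =>
      el.left < CANVAS.margin ||
      el.top < CANVAS.margin ||
      el.left + el.width > CANVAS.width - CANVAS.margin ||
      el.top + el.height > CANVAS.height - CANVAS.margin)
    .map((el) => el.id);
}

// Geometry copied from the example output above.
const exampleBoxes: Box[] = [
  { id: 'title_001', left: 60, top: 50, width: 880, height: 76 },
  { id: 'text_002', left: 60, top: 150, width: 500, height: 130 },
  { id: 'image_001', left: 600, top: 120, width: 350, height: 263 },
];
console.log(findOutOfBounds(exampleBoxes)); // prints []
```

All three example elements pass: the widest one, `image_001`, ends exactly at x = 950, which is the right margin boundary.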

2.6 Quiz Content Generation (Brief)

Quiz scenes follow similar generation logic but use a different template:

const prompts = buildPrompt(PROMPT_IDS.QUIZ_CONTENT, {
  title: outline.title,
  description: outline.description,
  keyPoints: keyPointsText,
  questionCount: outline.quizConfig?.questionCount || 3,
  difficulty: outline.quizConfig?.difficulty || 'medium',
  questionTypes: outline.quizConfig?.questionTypes || ['single'],
});

Example output:

{
  "questions": [
    {
      "id": "q1",
      "type": "single",
      "question": "光合作用发生在植物的哪个细胞器中？",
      "options": ["线粒体", "叶绿体", "核糖体", "高尔基体"],
      "correctAnswer": 1,
      "explanation": "叶绿体是光合作用的场所..."
    }
  ]
}

Stage 3: Content → Teaching Actions

3.1 Input Data

// Request body sent by the frontend
interface SceneActionsRequest {
  outline: SceneOutline;
  allOutlines: SceneOutline[];
  content: GeneratedSlideContent | GeneratedQuizContent | ...;
  stageId: string;
  agents?: AgentInfo[];
  previousSpeeches?: string[];  // narration from preceding scenes (for continuity)
  userProfile?: string;         // student information
}

3.2 Prompt Construction

// lib/generation/scene-generator.ts

async function generateSlideActions(
  outline: SceneOutline,
  content: GeneratedSlideContent,
  aiCall: AICallFn,
  ctx?: SceneGenerationContext,
  agents?: AgentInfo[],
  userProfile?: string,
): Promise<Action[]> {

  // Format the element list for the AI to choose from
  const elementsText = content.elements.map(el =>
    `- [${el.id}] ${el.type}${el.type === 'text' ? `: "${el.content?.substring(0, 50)}..."` : ''}`
  ).join('\n');

  // Build the course context
  const courseContext = ctx
    ? `当前第 ${ctx.pageIndex} 页 / 共 ${ctx.totalPages} 页`
    : '';

  const prompts = buildPrompt(PROMPT_IDS.SLIDE_ACTIONS, {
    title: outline.title,
    keyPoints: (outline.keyPoints || []).map((p, i) => `${i + 1}. ${p}`).join('\n'),
    description: outline.description,
    elements: elementsText,
    courseContext: courseContext,
    agents: formatAgentsForPrompt(agents),
    userProfile: userProfile || '',
  });

  const response = await aiCall(prompts.system, prompts.user);
  return parseActionsFromStructuredOutput(response, outline.type);
}

The final prompt sent to the LLM:

# System Prompt (simplified)

You are a professional instructional designer responsible for generating teaching action sequences.

## Action Types

### spotlight (Focus Element)
```json
{ "type": "action", "name": "spotlight", "params": { "elementId": "text_abc123" } }
```

### text (Speech)
```json
{ "type": "text", "content": "First, let's look at the key concept..." }
```

### discussion (Interactive Discussion)
```json
{
  "type": "action",
  "name": "discussion",
  "params": {
    "topic": "Discussion topic",
    "prompt": "Guiding prompt"
  }
}
```

## Design Requirements

Speech Content: Generate natural teaching speech
Same-session continuity: All pages belong to the same class session
Opening/Transition:
- First page: Open with greeting
- Middle pages: Continue naturally, no greeting
- Last page: Summarize and close

## Output Format

Output a JSON array directly. No explanation, no code fences.

# User Prompt (variables filled in)

Elements:
- [title_001] text: "光合作用概述"
- [text_002] text: "• 光合作用的定义..."
- [image_001] image

Title: 光合作用概述
Key Points:
1. 光合作用的定义
2. 叶绿体是光合作用的场所
3. 光合作用的总反应式
Description: 介绍光合作用的定义、场所和基本过程

当前第 1 页 / 共 5 页

**Language Requirement**: Generated speech content must be in Chinese.

Output as a JSON array directly (5-10 segments):

3.3 LLM Call Parameters

const result = await callLLM(
  {
    model: languageModel,
    system: prompts.system,
    prompt: prompts.user,
    maxOutputTokens: 4096,  // action sequences do not need many tokens
  },
  'scene-actions'
);

3.4 Example Output

[
  {
    "type": "text",
    "content": "同学们好！今天我们来学习光合作用，这是绿色植物最重要的生理过程之一。"
  },
  {
    "type": "action",
    "name": "spotlight",
    "params": { "elementId": "title_001" }
  },
  {
    "type": "text",
    "content": "光合作用，简单来说，就是绿色植物利用光能，把二氧化碳和水转化成有机物的过程。"
  },
  {
    "type": "action",
    "name": "spotlight",
    "params": { "elementId": "text_002" }
  },
  {
    "type": "text",
    "content": "让我们来看看光合作用的三个关键点：定义、场所和反应式。"
  },
  {
    "type": "action",
    "name": "spotlight",
    "params": { "elementId": "image_001" }
  },
  {
    "type": "text",
    "content": "这张图展示了光合作用的基本过程，大家可以看到叶绿体在其中发挥的关键作用。"
  }
]

3.5 Post-processing

The parsed actions are then post-processed:

function processActions(
  actions: Action[],
  elements: PPTElement[],
  agents?: AgentInfo[]
): Action[] {
  return actions.map((action, index) => ({
    ...action,
    id: `action_${nanoid(8)}`,
    timing: index * 3000,  // default: one action every 3 seconds
    // Check that elementId refers to an existing element
    ...(action.type === 'spotlight' && {
      params: {
        elementId: elements.find(el => el.id === action.params?.elementId)
          ? action.params.elementId
          : elements[0]?.id,  // fall back to the first element
      },
    }),
      },
    }),
  }));
}
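
Note that the raw LLM output in section 3.4 encodes a spotlight as `type: 'action'` with `name: 'spotlight'`, whereas the snippet above tests `action.type === 'spotlight'`, so the parser presumably normalizes the shape in between. As an illustration that works directly on the raw shape (an assumption about where validation could run, using hypothetical names), the same fallback rule can be sketched as:

```typescript
type RawAction =
  | { type: 'text'; content: string }
  | { type: 'action'; name: string; params?: { elementId?: string } };

// Hypothetical sketch: repoints spotlights that reference unknown elements to
// the first element, and drops them entirely when the slide has no elements.
function fixSpotlights(actions: RawAction[], elementIds: string[]): RawAction[] {
  const known = new Set(elementIds);
  const fixed: RawAction[] = [];
  for (const action of actions) {
    if (action.type !== 'action' || action.name !== 'spotlight') {
      fixed.push(action);
      continue;
    }
    const target = action.params?.elementId;
    if (target !== undefined && known.has(target)) {
      fixed.push(action);
    } else if (elementIds.length > 0) {
      fixed.push({ ...action, params: { elementId: elementIds[0] } });
    }
    // Otherwise there is nothing to spotlight, so the action is dropped.
  }
  return fixed;
}

const repaired = fixSpotlights(
  [
    { type: 'text', content: '同学们好！' },
    { type: 'action', name: 'spotlight', params: { elementId: 'title_001' } },
    { type: 'action', name: 'spotlight', params: { elementId: 'text_999' } },
  ],
  ['title_001', 'text_002', 'image_001'],
);
console.log(JSON.stringify(repaired[2]));
```

Unlike the snippet above, which can produce an undefined `elementId` when the element list is empty, this sketch drops the spotlight in that case; which behavior is preferable depends on how the player treats a spotlight without a target.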

End-to-End Example

User Input

帮我创建一个关于光合作用的初中生物课程，时长20分钟

Stage 1 Output

[
  { "id": "scene_1", "type": "slide", "title": "光合作用概述", ... },
  { "id": "scene_2", "type": "slide", "title": "光反应阶段", ... },
  { "id": "scene_3", "type": "interactive", "title": "光合作用模拟", ... },
  { "id": "scene_4", "type": "quiz", "title": "知识检测", ... },
  { "id": "scene_5", "type": "slide", "title": "总结与拓展", ... }
]

Stage 2 Output (scene_1)

{
  "elements": [
    { "id": "title_001", "type": "text", "content": "<p>光合作用概述</p>", ... },
    { "id": "text_002", "type": "text", ... },
    { "id": "image_001", "type": "image", "src": "img_1", ... }
  ],
  "background": { "type": "solid", "color": "#ffffff" }
}

Stage 3 Output (scene_1)

[
  { "type": "text", "content": "同学们好！今天我们来学习光合作用..." },
  { "type": "action", "name": "spotlight", "params": { "elementId": "title_001" } },
  { "type": "text", "content": "光合作用是绿色植物最重要的生理过程..." },
  ...
]

Final Assembly

// lib/generation/generation-pipeline.ts

function buildCompleteScene(
  outline: SceneOutline,
  content: GeneratedContent,
  actions: Action[],
  stageId: string
): Scene {
  return {
    id: nanoid(),
    stageId,
    title: outline.title,
    type: outline.type,
    order: outline.order,

    // Fill in the content according to the scene type
    ...(outline.type === 'slide' && { elements: content.elements }),
    ...(outline.type === 'quiz' && { questions: content.questions }),
    ...(outline.type === 'interactive' && { html: content.html }),

    // Action sequence
    actions,

    // Metadata
    duration: outline.estimatedDuration,
  };
}

Key Design Decisions

Why three stages?

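Lower complexity per LLM call: each stage focuses on one task.
Parallel generation: each scene in Stage 2 can be generated by its own concurrent call.
Better error recovery: a failed stage can be retried on its own.
Context passing: the output of an earlier stage serves as context for later stages.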
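
The parallelism and error-recovery points can be illustrated with `Promise.allSettled`, which runs one task per outline and reports each result independently. This is a sketch of the idea, not the project's orchestration code; `generateAll` and the `generate` callback are hypothetical stand-ins for the per-scene Stage 2 call.

```typescript
interface OutlineLike {
  id: string;
  title: string;
}

interface BatchResult<T> {
  succeeded: Map<string, T>;
  failedIds: string[]; // candidates for an individual retry
}

// Runs one generation task per outline concurrently and separates the
// successes from the failures instead of failing the whole batch.
async function generateAll<T>(
  outlines: OutlineLike[],
  generate: (outline: OutlineLike) => Promise<T>,
): Promise<BatchResult<T>> {
  const settled = await Promise.allSettled(outlines.map((o) => generate(o)));
  const succeeded = new Map<string, T>();
  const failedIds: string[] = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') succeeded.set(outlines[i].id, result.value);
    else failedIds.push(outlines[i].id);
  });
  return { succeeded, failedIds };
}

// Usage with a fake generator in which scene_2 fails.
const demoOutlines: OutlineLike[] = [
  { id: 'scene_1', title: '光合作用概述' },
  { id: 'scene_2', title: '光反应阶段' },
];
const batch = generateAll(demoOutlines, async (o) => {
  if (o.id === 'scene_2') throw new Error('model timeout');
  return `content for ${o.title}`;
});
batch.then((r) => console.log([...r.succeeded.keys()], r.failedIds));
```

Only the ids in `failedIds` need another Stage 2 call; the scenes that already succeeded are untouched.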

Why does Stage 1 stream?

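User experience: the user sees the first outline immediately instead of waiting for the whole list.
Error isolation: if later outlines fail to generate, the earlier ones remain usable.
Progress feedback: the frontend can show real-time progress.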
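
On the consuming side, the frontend has to split the incoming text back into events. The article does not show the client code; the sketch below is a hypothetical parser for the `data: <json>` framing (terminated by a blank line) used in section 1.4, written to tolerate events that are split across network chunks.

```typescript
type OutlineEvent =
  | { type: 'outline'; data: unknown; index: number }
  | { type: 'done'; outlines: unknown[] };

// Hypothetical sketch: buffers incoming text and emits one parsed event for
// each complete frame, where a frame ends with a blank line.
class SseFrameParser {
  private pending = '';

  push(chunk: string): OutlineEvent[] {
    this.pending += chunk;
    const events: OutlineEvent[] = [];
    let boundary = this.pending.indexOf('\n\n');
    while (boundary !== -1) {
      const frame = this.pending.slice(0, boundary);
      this.pending = this.pending.slice(boundary + 2);
      if (frame.startsWith('data: ')) {
        events.push(JSON.parse(frame.slice('data: '.length)) as OutlineEvent);
      }
      boundary = this.pending.indexOf('\n\n');
    }
    return events;
  }
}

const frameParser = new SseFrameParser();
const firstBatch = frameParser.push('data: {"type":"outline","data":{"id":"scene_1"},"in');
const secondBatch = frameParser.push('dex":0}\n\ndata: {"type":"done","outlines":[]}\n\n');
console.log(firstBatch.length, secondBatch.map((e) => e.type));
```

Splitting on a blank line is safe here because `JSON.stringify` escapes newlines inside strings, so a serialized payload never contains a raw line break.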

Why are Stages 2 and 3 not streamed?

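Small per-scene payload: a single scene's content does not need streaming.
Complete parsing is required: action generation references element IDs.
Post-processing depends on complete data: for example, LaTeX rendering and image URL replacement.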

Appendix: Request Header Conventions

All generation APIs receive their configuration through request headers:

x-provider-id: openai           # LLM provider
x-model-id: gpt-4o              # model ID
x-image-generation-enabled: true  # whether image generation is allowed
x-video-generation-enabled: false # whether video generation is allowed

Server-side resolution:

// lib/server/resolve-model.ts

export function resolveModelFromHeaders(req: NextRequest) {
  const providerId = req.headers.get('x-provider-id') as ProviderId;
  const modelId = req.headers.get('x-model-id');

  const model = PROVIDERS[providerId]?.models.find(m => m.id === modelId);
  const languageModel = createModel(providerId, modelId);

  return {
    model: languageModel,
    modelInfo: model,
    modelString: `${providerId}/${modelId}`,
  };
}
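
`headers.get` returns `null` when a header is absent, so a defensive version would validate the ids before constructing a model. The sketch below uses only the standard `Headers` class (global in Node 18+ and in browsers). `parseGenerationHeaders` and the returned shape are hypothetical, and treating only the literal string `true` as enabled is an assumption.

```typescript
interface GenerationConfig {
  providerId: string;
  modelId: string;
  imageGenerationEnabled: boolean;
  videoGenerationEnabled: boolean;
}

// Hypothetical helper: reads the x-* headers and fails fast on missing ids.
function parseGenerationHeaders(headers: Headers): GenerationConfig {
  const providerId = headers.get('x-provider-id');
  const modelId = headers.get('x-model-id');
  if (!providerId || !modelId) {
    throw new Error('x-provider-id and x-model-id headers are required');
  }
  // Assumption: a flag is on only when the header value is exactly "true".
  const flag = (header: string): boolean => headers.get(header) === 'true';
  return {
    providerId,
    modelId,
    imageGenerationEnabled: flag('x-image-generation-enabled'),
    videoGenerationEnabled: flag('x-video-generation-enabled'),
  };
}

const generationConfig = parseGenerationHeaders(new Headers({
  'x-provider-id': 'openai',
  'x-model-id': 'gpt-4o',
  'x-image-generation-enabled': 'true',
}));
console.log(generationConfig);
```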