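AI Agents Explained: A Complete Guide from LLMs to Intelligent Agents

Want to understand AI agents? This article starts from the basics and walks through the core principles, technical architecture, key components, and a comparison of mainstream frameworks. By the end you should have enough to start building an agent of your own.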

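I. What Is an AI Agent?

1.1 From LLM to Agent

A large language model (LLM) on its own is a passive tool: the user supplies a prompt and the model returns an answer. An AI agent wraps the model with the ability to reason, plan, and use tools on its own initiative, which gives it four capabilities:

Autonomous planning: breaking a complex task into executable steps
Tool use: calling external APIs, databases, file systems, and so on to do real work
Continuous iteration: adjusting the next action based on the result of the previous one
Long-term memory: keeping context coherent across multiple turns

Put simply: the LLM is the brain; the agent is the brain plus hands, feet, and senses.

1.2 The Core Definition of an Agent

AI Agent = LLM (reasoning engine) + Planning + Tools + Memory

These four elements form the basic architecture of a modern agent system.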

II. Technical Architecture of an AI Agent

2.1 Overall Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        User Interface                            │
│                  (Chat / API / Webhook)                         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Agent Controller                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐ │
│  │   Planner   │  │   Executor  │  │      Memory Manager     │ │
│  │ (think/plan)│  │ (tool calls)│  │ (short/long-term memory)│ │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    LLM (Language Model)                          │
│            (reasoning, decisions, content generation)           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Tool Ecosystem                              │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐   │
│  │Web Search│ │  API     │ │ Database │ │  File System     │   │
│  │  search  │ │  calls   │ │ queries  │ │  read / write    │   │
│  └──────────┘ └──────────┘ └──────────┘ └──────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

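2.2 Core Components in Detail

2.2.1 LLM Reasoning Engine

The LLM is the agent's "brain". It is responsible for:

Understanding user intent: parsing natural-language input
Reasoning and deciding: working out which action to take
Generating content: producing a text response or the arguments for a tool call

Commonly used models:

OpenAI: GPT-5 series (recommended via AI Gateway)
Anthropic: Claude 4 series (Sonnet 4.6 offers the best price-performance ratio)
Google: Gemini 2.0/3.0 series
China-based: Qwen (通义千问), ERNIE Bot (文心一言), DeepSeek, and others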

2.2.2 Planning Module

The planning module determines how the agent works through a complex task. The main patterns are:

1. ReAct (Reasoning + Acting)

Think → act → observe the result → think again → ...
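
A ReAct loop usually depends on the model writing each step in a fixed text layout that the controller can parse. The sketch below assumes a hypothetical "Thought / Action / Action Input / Final Answer" layout; the field names and the `parseReActStep` helper are illustrative, not part of any particular framework.

```typescript
// One parsed ReAct step: either a tool call or a final answer.
type ReActStep =
  | { kind: 'action'; thought: string; tool: string; input: string }
  | { kind: 'final'; thought: string; answer: string };

// Parses model output written in the assumed
// "Thought / Action / Action Input / Final Answer" layout.
function parseReActStep(output: string): ReActStep {
  // Returns the text after "<label>:" on the line that starts with that label.
  const field = (label: string): string | undefined => {
    const match = output.match(new RegExp(`^${label}:[ \\t]*(.*)$`, 'm'));
    return match?.[1]?.trim();
  };

  const thought = field('Thought') ?? '';
  const answer = field('Final Answer');
  if (answer !== undefined) return { kind: 'final', thought, answer };

  const tool = field('Action');
  if (tool === undefined) {
    throw new Error('Output has neither an Action nor a Final Answer');
  }
  return { kind: 'action', thought, tool, input: field('Action Input') ?? '' };
}

const step = parseReActStep(
  'Thought: I need current data\nAction: web_search\nAction Input: Beijing weather',
);
console.log(step);
```

The controller would run the named tool for an `action` step, append the observation to the prompt, and call the model again until it receives a `final` step.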

2. Chain of Thought

Question → step-by-step reasoning → final answer

3. Tree of Thoughts

             Question
           /    |     \
   Branch 1  Branch 2  Branch 3
      ↓         ↓         ↓
  Evaluate   Evaluate  Evaluate
       \        |        /
            Best path
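
The branch, evaluate, and select pattern in the diagram can be sketched as a small beam search. In a real agent both `expand` and `score` would be LLM calls; here they are plain functions over numbers so the example runs on its own. The function name, the beam width, and the toy task are all assumptions made for illustration.

```typescript
// Generic Tree-of-Thoughts style search: expand every state on the frontier,
// score the candidates, and keep only the best `beamWidth` for the next level.
function treeOfThought<T>(
  root: T,
  expand: (state: T) => T[],   // proposes next thoughts from a state
  score: (state: T) => number, // evaluates a thought (higher is better)
  depth: number,
  beamWidth: number,
): T {
  let frontier: T[] = [root];
  for (let level = 0; level < depth; level++) {
    const candidates = frontier.flatMap(expand);
    if (candidates.length === 0) break;
    frontier = candidates
      .sort((a, b) => score(b) - score(a))
      .slice(0, beamWidth);
  }
  // Return the highest-scoring state that survived.
  return frontier.reduce((best, s) => (score(s) > score(best) ? s : best));
}

// Toy task: starting from 0, add 1, 2, or 3 per step and try to land on 10.
const best = treeOfThought(
  0,
  (n) => [n + 1, n + 2, n + 3],
  (n) => -Math.abs(10 - n),
  4,
  2,
);
console.log(best);
```

Keeping more than one branch per level is what separates this from a single chain of thought: a weaker early step can still lead to the best final path.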

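2.2.3 Tools

Tools are the bridge between the agent and the outside world:

Tool type	Purpose	Examples
Search tools	Fetch up-to-date information	Web Search, Wikipedia
API tools	Call external services	REST API, GraphQL
Database tools	Data persistence	SQL queries, vector search
File tools	Read and write files	Reading documents, generating reports
Compute tools	Exact computation	Arithmetic, code execution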
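
One way to keep these tool types behind a single interface is a small registry that maps tool names to implementations. This is a minimal sketch: `AgentTool`, `ToolRegistry`, and the `calculate` tool are hypothetical names introduced here, not the API of any framework discussed in this article.

```typescript
// A tool pairs a model-facing description with an executable function.
interface AgentTool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => unknown;
}

class ToolRegistry {
  private tools = new Map<string, AgentTool>();

  register(tool: AgentTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Text that could be placed in the prompt so the model knows what is available.
  describe(): string {
    return Array.from(this.tools.values())
      .map((t) => `${t.name}: ${t.description}`)
      .join('\n');
  }

  // Dispatches a call by name; unknown names are an error, not a silent no-op.
  call(name: string, args: Record<string, unknown>): unknown {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.execute(args);
  }
}

const registry = new ToolRegistry();
registry.register({
  name: 'calculate',
  description: 'Add two numbers',
  execute: (args) => Number(args['a']) + Number(args['b']),
});
console.log(registry.describe());
console.log(registry.call('calculate', { a: 2, b: 3 }));
```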

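2.2.4 Memory

Short-term memory (working memory):

The context of the current conversation
The last N turns of dialogue history
Tool execution results

Long-term memory:

User preferences
Historical interaction patterns
Knowledge-base retrieval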
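
Short-term memory is often just a bounded window over the most recent messages. The sketch below assumes an in-process array and a hypothetical `SlidingWindowMemory` class; a production system might back the same interface with Redis or a database.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
}

// Short-term memory that keeps only the most recent `maxMessages` entries.
class SlidingWindowMemory {
  private messages: ChatMessage[] = [];
  private maxMessages: number;

  constructor(maxMessages: number) {
    if (maxMessages < 1) throw new Error('maxMessages must be at least 1');
    this.maxMessages = maxMessages;
  }

  add(message: ChatMessage): void {
    this.messages.push(message);
    // Drop the oldest entries once the window is full.
    if (this.messages.length > this.maxMessages) {
      this.messages = this.messages.slice(-this.maxMessages);
    }
  }

  // Returns a copy so callers cannot mutate the stored history.
  recent(): ChatMessage[] {
    return [...this.messages];
  }
}

const windowMemory = new SlidingWindowMemory(2);
windowMemory.add({ role: 'user', content: 'a' });
windowMemory.add({ role: 'assistant', content: 'b' });
windowMemory.add({ role: 'user', content: 'c' });
console.log(windowMemory.recent());
```

Anything that falls out of the window is gone unless it was also written to long-term memory, which is why the two layers are usually combined.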

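III. How an AI Agent Is Implemented

3.1 Single-Turn Interaction

User input → intent recognition → LLM reasoning → response generation → return result

3.2 The Multi-Turn Agent Loop (Core Flow)

This loop is the heart of an agent system: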

┌──────────────────────────────────────────────────────────────┐
│                      START                                   │
└──────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│  1. Receive User Input                                       │
│     - Raw text / API request                                 │
└──────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│  2. Retrieve Context                                         │
│     - Load conversation history from memory                  │
│     - Retrieve relevant knowledge-base entries               │
└──────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│  3. LLM Reasoning                                            │
│     - Analyze the user's intent                              │
│     - Decide whether a tool call is needed                   │
│     - Generate tool-call arguments                           │
└──────────────────────────────────────────────────────────────┘
                              │
                              ▼
              ┌───────────────┴───────────────┐
              │        Decision Point         │
              └───────────────┬───────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ┌─────────┐    ┌───────────┐   ┌──────────┐
        │Tool call│    │  Respond  │   │   End    │
        │ needed  │    │ directly  │   │ the loop │
        └────┬────┘    └─────┬─────┘   └──────────┘
             │               │
             ▼               ▼
┌────────────────────┐  ┌────────────────────────────────┐
│ 4. Execute Tool    │  │ 7. Generate Final Response     │
│ - Call external API│  │  - Combine all information     │
│ - Get the result   │  │  - Draft natural-language reply│
│ - Observe outcome  │  │  - Return it to the user       │
└────────────────────┘  └────────────────────────────────┘
             │                     ▲
             │                     │
             └──────────┬──────────┘
                        │
                        ▼
              ┌─────────────────┐
              │5. Observe Result│
              └─────────────────┘
                        │
                        ▼
              ┌─────────────────┐
              │ 6. Store Memory │
              └─────────────────┘
                        │
                        └──────────┐
                                   │
                                   ▼
                        ┌─────────────────┐
                        │ 3. LLM Reasoning│
                        │ (back to step 3)│
                        └─────────────────┘
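
The loop in the diagram can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions: the `Model` type stands in for a real LLM call, tools take and return plain strings, and `runAgent`, `Decision`, and the step limit are names and choices made here for illustration.

```typescript
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type Decision =
  | { type: 'tool'; name: string; args: string }
  | { type: 'respond'; text: string };
type Model = (history: Message[]) => Promise<Decision>; // stand-in for an LLM call
type ToolFn = (args: string) => Promise<string>;

async function runAgent(
  input: string,
  model: Model,
  tools: Record<string, ToolFn>,
  memory: Message[] = [],
  maxSteps = 5,
): Promise<string> {
  memory.push({ role: 'user', content: input }); // steps 1-2: input joins the context
  for (let step = 0; step < maxSteps; step++) {
    const decision = await model(memory); // step 3: reason and decide
    if (decision.type === 'respond') {
      // step 7: final response
      memory.push({ role: 'assistant', content: decision.text });
      return decision.text;
    }
    // steps 4-5: run the tool and turn failures into observations
    const tool = tools[decision.name];
    const observation = tool
      ? await tool(decision.args).catch((e) => `Tool error: ${String(e)}`)
      : `Unknown tool: ${decision.name}`;
    // step 6: store the call and its result, then loop back to step 3
    memory.push({ role: 'assistant', content: `call ${decision.name}(${decision.args})` });
    memory.push({ role: 'tool', content: observation });
  }
  return 'Stopped: step limit reached';
}
```

The step limit matters: without it, a model that keeps requesting tools would loop forever. Feeding tool errors back as observations, rather than throwing, gives the model a chance to recover on the next step.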

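3.3 Key Steps in Detail

Step 1: Receive user input

// Typical shape of an incoming user request
interface UserInput {
  message: string;                     // the user's message
  sessionId: string;                   // session ID
  userId?: string;                     // user ID (optional)
  metadata?: Record<string, unknown>;  // additional metadata
}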

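Step 2: Retrieve context

// Context retrieval (pseudocode): `memory` and `knowledgeBase` stand in for
// whatever stores the application uses.
async function retrieveContext(input: UserInput) {
  // 1. Short-term memory: the 10 most recent messages in this session
  const recentMessages = await memory.getRecent(input.sessionId, 10);

  // 2. Long-term memory: the 5 knowledge-base entries most relevant to the message
  const relevantKnowledge = await knowledgeBase.retrieve(
    input.message,
    { topK: 5 }
  );

  // 3. Merge into a single context object
  return {
    messages: recentMessages,
    knowledge: relevantKnowledge,
  };
}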

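Step 3: LLM reasoning and decision

This is the core of the agent. The LLM has to:

Understand the task: work out what the user wants
Decide whether a tool is needed: does this require an external capability?
Choose a tool: pick the most suitable one from the available list
Generate arguments: construct the arguments for the tool call

// Tool-call decision example (AI SDK 5 style)
import { generateText, tool } from 'ai';
import { z } from 'zod';

// Tools are passed as an object keyed by tool name, not as an array
const tools = {
  web_search: tool({
    description: 'Search the web for up-to-date information',
    inputSchema: z.object({
      query: z.string().describe('Search keywords'),
    }),
  }),
  calculate: tool({
    description: 'Evaluate a math expression',
    inputSchema: z.object({
      expression: z.string().describe('Math expression'),
    }),
  }),
};

// The LLM decides which tool, if any, to call
const result = await generateText({
  model: 'anthropic/claude-sonnet-4.6',  // AI Gateway model string
  messages: [...context.messages, { role: 'user', content: input.message }],
  tools,  // inject the tool definitions
});
// result.toolCalls lists the tool calls the model requested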

Step 4: Execute the tool

// Tool executor: dispatches a tool call to its implementation.
// `webSearch` and `evaluateMath` are assumed to be defined elsewhere.
async function executeTool(toolCall: ToolCall) {
  const { name, args } = toolCall;

  switch (name) {
    case 'web_search':
      return await webSearch(args.query);
    case 'calculate':
      return await evaluateMath(args.expression);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}

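Step 5: Observe the result

Once the tool has run, its result has to be fed back to the LLM for the next round of reasoning:

// Append the tool call and its result to the context.
// The exact message shape depends on the SDK; this shows the general idea.
const toolResult = await executeTool(toolCall);
const updatedMessages = [
  ...messages,
  { role: 'assistant', content: toolCallText },  // the assistant turn that requested the tool
  { role: 'tool', toolCallId, content: JSON.stringify(toolResult) },  // the matching result
];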

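Step 6: Store memory

// Memory storage: `shortTermMemory` and `longTermMemory` stand in for the application's stores
async function storeMemory(sessionId: string, data: MemoryData) {
  // Short-term memory: keep in Redis or in process memory
  await shortTermMemory.push(sessionId, data);

  // Long-term memory: embed and store in a vector database for later retrieval
  if (data.important) {
    await longTermMemory.embed(sessionId, data);
  }
}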

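IV. Comparing Mainstream Agent Frameworks

4.1 Vercel AI SDK (Recommended)

Features:

Modern TypeScript design
Native streaming support
Deep integration with the Vercel platform
Unified routing through AI Gateway

Best suited for: web applications, Next.js projects, and rapid prototyping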

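// Written against AI SDK 5 (`inputSchema`, `stopWhen`);
// AI SDK 4 names these `parameters` and `maxSteps`.
import { generateText, tool, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

// Define a tool
const weatherTool = tool({
  description: 'Get the weather for a given city',
  inputSchema: z.object({
    city: z.string().describe('City name'),
  }),
  execute: async ({ city }) => {
    // Placeholder endpoint: substitute a real weather API
    const response = await fetch(
      `https://api.weather.com/?city=${encodeURIComponent(city)}`
    );
    return response.json();
  },
});

// Run the agent
const result = await generateText({
  model: anthropic('claude-sonnet-4-6-20250514'), // confirm the exact ID in Anthropic's model list
  messages: [{ role: 'user', content: "What's the weather like in Beijing?" }],
  tools: { weather: weatherTool }, // an object keyed by tool name, not an array
  stopWhen: stepCountIs(5),        // allow a follow-up step so the model can answer from the tool result
});
console.log(result.text);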

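4.2 LangChain / LangGraph

Features:

Python-first ecosystem
A large set of prebuilt components
LangGraph supports stateful workflows
Active community and thorough documentation

Best suited for: Python projects, complex workflows, and cases that need heavy customization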

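from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

# Create the agent. `web_search` and `calculator` are assumed to be
# tools defined elsewhere (for example with the @tool decorator).
agent = create_react_agent(
    model=ChatOpenAI(model="gpt-5"),
    tools=[web_search, calculator],
)

# Run it
result = agent.invoke({
    "messages": [("user", "What's the weather like in Beijing?")]
})
print(result["messages"][-1].content)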

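4.3 AutoGen (Microsoft)

Features:

Multi-agent collaboration
Strong code-execution capabilities
Enterprise-grade feature support
Integration with the Microsoft ecosystem

Best suited for: enterprise applications, multi-agent collaboration, and tasks that need code execution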

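4.4 Framework Comparison

Feature	AI SDK	LangChain	AutoGen
Language	TypeScript	Python	Python
Streaming	Native	Supported	Supported
Multi-agent	Supported	Supported	Excellent
Tool ecosystem	Rich	Very rich	Rich
Learning curve	Low	Medium	Medium to high
Production readiness	Excellent	Excellent	Excellent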

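V. Tool Calling in Depth

5.1 How Tool Calling Works

Tool calling is the core technique behind agent systems. It comes down to four stages:

Tool definition: tell the LLM which tools are available and what each is for
Argument generation: the LLM produces call arguments from the user's intent
Execution and feedback: run the tool and return the result to the LLM
Result integration: the LLM folds the tool result into its final response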

5.2 Tool Definition Format

Modern agent frameworks describe tools with JSON Schema:

const tools = [
  {
    name: 'search_hotels',
    description: 'Search for hotels in a given city',
    parameters: {
      type: 'object',
      properties: {
        city: {
          type: 'string',
          description: 'City name, e.g. Beijing or Shanghai',
        },
        checkInDate: {
          type: 'string',
          description: 'Check-in date, format YYYY-MM-DD',
        },
        checkOutDate: {
          type: 'string',
          description: 'Check-out date, format YYYY-MM-DD',
        },
        guests: {
          type: 'integer',
          description: 'Number of guests',
          minimum: 1,
          maximum: 10,
        },
      },
      required: ['city', 'checkInDate', 'checkOutDate'],
    },
  },
];
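
Because the model generates the arguments, it is worth checking them against the schema before running the tool. The sketch below is a hand-rolled validator covering only the keywords used in the hotel example (`type`, `required`, `minimum`, `maximum`); it is an illustration, not a full JSON Schema implementation, and the `validateArgs` name and error wording are assumptions.

```typescript
interface PropertySchema {
  type: 'string' | 'integer';
  minimum?: number;
  maximum?: number;
}
interface ObjectSchema {
  properties: Record<string, PropertySchema>;
  required: string[];
}

// Returns a list of human-readable problems; an empty list means the args are valid.
function validateArgs(schema: ObjectSchema, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required) {
    if (args[key] === undefined) errors.push(`missing required field: ${key}`);
  }
  for (const key of Object.keys(args)) {
    const prop = schema.properties[key];
    if (!prop) continue; // JSON Schema allows extra properties by default
    const value = args[key];
    if (prop.type === 'string' && typeof value !== 'string') {
      errors.push(`${key} must be a string`);
    }
    if (prop.type === 'integer') {
      if (typeof value !== 'number' || !Number.isInteger(value)) {
        errors.push(`${key} must be an integer`);
        continue;
      }
      if (prop.minimum !== undefined && value < prop.minimum) {
        errors.push(`${key} must be >= ${prop.minimum}`);
      }
      if (prop.maximum !== undefined && value > prop.maximum) {
        errors.push(`${key} must be <= ${prop.maximum}`);
      }
    }
  }
  return errors;
}

const hotelSchema: ObjectSchema = {
  properties: {
    city: { type: 'string' },
    checkInDate: { type: 'string' },
    checkOutDate: { type: 'string' },
    guests: { type: 'integer', minimum: 1, maximum: 10 },
  },
  required: ['city', 'checkInDate', 'checkOutDate'],
};
console.log(validateArgs(hotelSchema, { city: 'Beijing', checkInDate: '2026-01-01', guests: 12 }));
```

Returning the error list to the model as the tool result usually lets it correct the arguments on the next step.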

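5.3 Tool Execution Strategies

Sequential execution: wait for each tool to finish before starting the next

// Run tool calls one at a time
for (const toolCall of toolCalls) {
  const result = await executeTool(toolCall);
  messages.push({
    role: 'tool',
    toolCallId: toolCall.toolCallId,  // ties the result to the call that produced it
    content: JSON.stringify(result),
  });
}

Parallel execution: run several independent tool calls at the same time

// Run tool calls concurrently; results come back in the same order as toolCalls
const results = await Promise.all(
  toolCalls.map((toolCall) => executeTool(toolCall))
);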
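
One caveat with `Promise.all` is that a single rejected tool call rejects the whole batch. A possible variant, sketched here with a hypothetical `executeAllIsolated` helper, converts each failure into a value so the remaining results survive.

```typescript
type ToolOutcome = { ok: true; value: string } | { ok: false; error: string };

// Runs all calls concurrently and reports each one's success or failure separately.
async function executeAllIsolated(
  calls: Array<() => Promise<string>>,
): Promise<ToolOutcome[]> {
  return Promise.all(
    calls.map((call) =>
      // Promise.resolve().then(call) also captures synchronous throws.
      Promise.resolve()
        .then(call)
        .then(
          (value): ToolOutcome => ({ ok: true, value }),
          (err): ToolOutcome => ({ ok: false, error: String(err) }),
        ),
    ),
  );
}

executeAllIsolated([
  async () => 'search result',
  async () => {
    throw new Error('service unavailable');
  },
]).then((outcomes) => console.log(outcomes));
```

The outcomes keep the order of the input calls, so each one can still be matched to the tool call that produced it.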

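Conditional execution: decide whether to run a tool based on an earlier tool's result

// Conditional chain: `searchCall` and `detailCall` are tool calls prepared earlier
const searchResult = await executeTool(searchCall);
if (searchResult.needsMoreDetails) {
  const detailResult = await executeTool(detailCall);
  // Merge searchResult and detailResult here
}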

VI. Designing the Memory System

6.1 Memory Hierarchy

┌─────────────────────────────────────────────┐
│              Long-term Memory               │
│             (vector database)               │
│ - Preferences - History - Knowledge base    │
└─────────────────────────────────────────────┘
                      ▲
                      │ retrieve
                      ▼
┌─────────────────────────────────────────────┐
│             Working Memory                  │
│             (current session)               │
│ - Dialogue - Tool results - Temp variables  │
└─────────────────────────────────────────────┘
                      ▲
                      │ update
                      ▼
┌─────────────────────────────────────────────┐
│              Short-term Memory              │
│             (Redis / in-memory)             │
│   - Last N turns - Session state            │
└─────────────────────────────────────────────┘

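6.2 Memory Retrieval Strategies

1. Nearest-neighbor retrieval (similarity search)

// Vector similarity search
const results = await vectorStore.similaritySearch(
  queryEmbedding,  // embedding of the user's question
  { topK: 5 }      // return the 5 most similar entries
);

2. Time-decayed retrieval

// More recent memories get a higher weight
const weightedScore = similarityScore * recencyWeight;

3. Importance weighting

// Boost the score of entries flagged as important
if (message.isImportant) {
  score *= 1.5;
}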
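
The three strategies can be combined into one ranking function. This sketch multiplies cosine similarity by an exponential recency weight and the 1.5 importance boost shown above. The half-life decay formula, the `MemoryRecord` shape, and the function names are assumptions made for illustration; other decay curves work just as well.

```typescript
interface MemoryRecord {
  text: string;
  embedding: number[];
  timestampMs: number;
  important: boolean;
}

// Cosine similarity of two vectors, assumed to have the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// score = similarity * 0.5^(age / halfLife) * (1.5 if important, else 1)
function rankMemories(
  query: number[],
  records: MemoryRecord[],
  nowMs: number,
  halfLifeMs: number,
  topK: number,
): MemoryRecord[] {
  return records
    .map((record) => {
      const similarity = cosineSimilarity(query, record.embedding);
      const recency = Math.pow(0.5, (nowMs - record.timestampMs) / halfLifeMs);
      const importance = record.important ? 1.5 : 1;
      return { record, score: similarity * recency * importance };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((scored) => scored.record);
}
```

With a one-hour half-life, a memory from an hour ago counts half as much as an identical one from right now, so an important but older entry (0.5 × 1.5 = 0.75) still ranks below a fresh exact match (1.0).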

VII. Hands-On: Building an AI Agent

7.1 Project Setup

# Install the Vercel AI SDK, the Anthropic provider, and zod (used for tool schemas)
npm install ai @ai-sdk/anthropic zod

7.2 Complete Code Example

// Written against AI SDK 5 (`inputSchema`, `stopWhen`, `ModelMessage`);
// AI SDK 4 names these `parameters`, `maxSteps`, and `CoreMessage`.
import { generateText, tool, stepCountIs, type ModelMessage } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

// ========== 1. Define tools ==========
const searchTool = tool({
  description: 'Search the web for up-to-date information',
  inputSchema: z.object({
    query: z.string().describe('Search keywords'),
  }),
  execute: async ({ query }) => {
    // Placeholder endpoint: substitute a real search API
    const response = await fetch(
      `https://api.search.com?q=${encodeURIComponent(query)}`
    );
    if (!response.ok) throw new Error(`Search failed: ${response.status}`);
    return response.json();
  },
});

const calculatorTool = tool({
  description: 'Evaluate an arithmetic expression',
  inputSchema: z.object({
    expression: z.string().describe('Arithmetic expression, e.g. (1 + 2) * 3'),
  }),
  execute: async ({ expression }) => {
    // Allow only digits, whitespace, and basic arithmetic characters, so no
    // identifiers or function calls can reach the evaluator. This is a minimal
    // guard; a dedicated expression parser is the safer choice in production.
    if (!/^[\d\s+\-*/().%]+$/.test(expression)) {
      return { error: 'Expression contains unsupported characters' };
    }
    try {
      const result = Function(`"use strict"; return (${expression})`)();
      return { result };
    } catch {
      return { error: 'Expression could not be evaluated' };
    }
  },
});

// ========== 2. Create the agent ==========
class SimpleAgent {
  // Confirm the exact model ID in Anthropic's model list
  private model = anthropic('claude-sonnet-4-6-20250514');
  // Tools are an object keyed by tool name, not an array
  private tools = { search: searchTool, calculator: calculatorTool };
  private memory: ModelMessage[] = [];

  // ========== 3. Handle user input ==========
  async chat(userMessage: string): Promise<string> {
    // Add the user message to memory
    this.memory.push({ role: 'user', content: userMessage });

    // generateText runs the tool loop itself: it calls the model, executes any
    // requested tools, feeds the results back, and repeats for up to 10 steps.
    const result = await generateText({
      model: this.model,
      messages: this.memory,
      tools: this.tools,
      stopWhen: stepCountIs(10),
    });

    // response.messages holds the assistant and tool messages produced by this
    // call, already in the shape the SDK expects on the next turn.
    this.memory.push(...result.response.messages);
    return result.text;
  }

  // ========== 4. Memory management ==========
  clearMemory() {
    this.memory = [];
  }

  getHistory() {
    return this.memory;
  }
}

// ========== 5. Use the agent ==========
const agent = new SimpleAgent();
const response = await agent.chat("What's the weather like in Beijing right now?");
console.log(response);

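VIII. Best Practices and Optimization

8.1 Tool Design Principles

Single responsibility: each tool does one thing
Clear descriptions: tool descriptions should be accurate and concise
Minimal parameters: expose only the parameters that are needed, and use defaults
Error handling: tools should return meaningful error messages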

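8.2 Prompt Optimization

// ❌ A poor prompt: no role, no context, no instructions
const badPrompt = "Look it up for me";

// ✅ A good prompt: states the role, the inputs to collect, and when to call tools
const goodPrompt = `You are a professional travel assistant. Help the user based on the following information:
- Destination preferences
- Travel dates
- Budget range
- Special requirements (e.g. accessibility needs, traveling with pets)

Ask for any missing information, and call tools only after you have everything you need.`;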

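8.3 Performance Optimization

Streaming: stream output to improve perceived responsiveness
Parallel tools: run independent tools concurrently
Cached memory: avoid repeating the same retrieval
Rate limiting: guard against abusive request volumes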
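
Caching can be as simple as a map with an expiry time per entry. The `TtlCache` class below is a hypothetical helper written for this article, with the clock passed in so that expiry is easy to test; for async retrievals the cached value can be the promise itself.

```typescript
// Caches computed values for `ttlMs` milliseconds, keyed by string.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value if it is still fresh; otherwise computes and stores it.
  getOrCompute(key: string, compute: () => V): V {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = compute();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Usage: repeated lookups for the same query within a minute hit the cache.
let lookups = 0;
const retrievalCache = new TtlCache<string>(60000);
const retrieve = (query: string): string =>
  retrievalCache.getOrCompute(query, () => {
    lookups++;
    return `results for ${query}`;
  });
retrieve('agent memory');
retrieve('agent memory');
console.log(lookups);
```

The time-to-live bounds how stale a cached retrieval can get, which matters more for search results than for a static knowledge base.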

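8.4 Security Considerations

Tool permissions: grant each tool the minimum permissions it needs
Input validation: validate user input strictly
Execution timeouts: set a timeout on tool execution
Audit logging: record every tool call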
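
An execution timeout can be added around any promise-returning tool with `Promise.race`. The `withTimeout` helper below is a sketch with an assumed name and error message. Note that it only stops waiting; it does not cancel the underlying work, which would need something like an `AbortSignal` passed into the tool.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label = 'tool'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a slow tool call is abandoned after 10 ms.
const slowTool = new Promise<string>((resolve) =>
  setTimeout(() => resolve('late result'), 50),
);
withTimeout(slowTool, 10, 'web_search').catch((err) => console.log(String(err)));
```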

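IX. Summary and Outlook

9.1 Key Takeaways

AI Agent = LLM + Planning + Tools + Memory
Core flow: receive input → reason and decide → execute tools → observe results → iterate
Tool calling: tools are defined with JSON Schema, and the LLM decides when to call them
Memory system: short-term memory + long-term memory + vector retrieval
Multi-turn loop: the ReAct pattern drives continuous iteration

9.2 Future Trends

Multi-agent collaboration: several specialized agents working together
Persistent state: agents that keep state across sessions
Self-learning: continuous learning and improvement from interactions
Multimodal fusion: handling text, image, audio, video, and other inputs
Edge computing: lightweight agents running in the browser

9.3 Learning Resources

This article is based on mainstream AI agent technology as of 2025-2026. Where things have changed, defer to the official documentation.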