AI Agent Study Notes: LLM Basics

Learning objectives

How an LLM works
Token
Context Window
Calling the LLM API
Temperature/Top_p
Prompt design
Function Calling
Hallucination

1. How an LLM Works

An LLM (Large Language Model) is essentially a probability model that predicts the next token from the preceding context.

The core LLM pipeline:

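Input prompt
    ↓
Tokenization
    ↓
Transformer inference
    ↓
Predict next-token probabilities
    ↓
Sampling
    ↓
Emit token
    ↓
Loop (append the token to the input and repeat until a stop token or the length limit)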
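
The loop above can be sketched with a toy model. The lookup table below stands in for the Transformer and is purely hypothetical: it looks only at the last token, whereas a real model conditions on the whole context. Tokenization is faked by splitting on spaces, and "sampling" is greedy (always take the most likely token) so the output stays deterministic.

```javascript
// Toy stand-in for the Transformer: maps the last token to
// next-token probabilities. The table is made up for illustration.
const toyModel = {
  "the": { "cat": 0.7, "dog": 0.3 },
  "cat": { "sat": 0.9, "ran": 0.1 },
  "sat": { "<eos>": 1.0 }
}

// Fake tokenizer: real tokenizers split text into subword pieces.
const tokenize = (text) => text.toLowerCase().split(" ")

// Greedy "sampling": pick the token with the highest probability.
function pickNext(probs) {
  return Object.entries(probs).sort((a, b) => b[1] - a[1])[0][0]
}

function generate(prompt, maxTokens = 10) {
  const tokens = tokenize(prompt)
  for (let i = 0; i < maxTokens; i++) {
    const probs = toyModel[tokens[tokens.length - 1]]
    if (!probs) break                 // unknown context: stop
    const next = pickNext(probs)
    if (next === "<eos>") break       // end-of-sequence token
    tokens.push(next)                 // feed the output back in
  }
  return tokens.join(" ")
}

console.log(generate("The")) // the cat sat
```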

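2. Token

A token is the smallest unit of text an LLM processes. It is neither a character nor a word; it is usually a subword piece, so one word may map to one or several tokens. Why do tokens matter?

APIs bill by token
The context window is limited in tokens, for example: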

Model	Context window (tokens)
GPT-4	8k
GPT-4o	128k
Claude 3	200k

If a request exceeds the limit, the API returns an error such as:

context length exceeded
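
Exact counts require the model's own tokenizer, but a rough estimate is often enough for a pre-flight check. The sketch below assumes the common rule of thumb of about 4 characters per token for English text (other languages and code can differ a lot); `estimateTokens` and `fitsContext` are hypothetical helpers, not part of any SDK.

```javascript
// Rough heuristic only: about 4 characters per token for English text.
// Use the model's real tokenizer when you need exact numbers.
function estimateTokens(text) {
  return Math.ceil(text.length / 4)
}

// Check whether a prompt, plus the space reserved for the reply,
// fits inside a given context window.
function fitsContext(text, contextWindow, reservedForOutput = 1000) {
  return estimateTokens(text) + reservedForOutput <= contextWindow
}

const question = "Explain JavaScript closure."
console.log(estimateTokens(question))     // 7
console.log(fitsContext(question, 8000))  // true
```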

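3. Context Window

The context window is the maximum number of tokens the model can see in a single request. If the input exceeds it, the API rejects the request, so the application has to truncate or compress the history first. Everything sent counts toward the limit, and so does the generated output. This includes: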

system prompt
user prompt
assistant history
tool output

In real-world development you will run into:

context explosion

Solutions:

Limit the history (keep only the most recent messages)
Summarize the history
RAG
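
A minimal sketch of the first strategy, limiting the history: keep the system prompt, then keep the newest messages that still fit in a token budget. The 4-characters-per-token estimate and the helper names are assumptions for illustration.

```javascript
// Rough token estimate (assumption: about 4 characters per token).
const estimateTokens = (text) => Math.ceil(text.length / 4)

// Keep the system message plus the most recent messages that fit
// within maxTokens. Older messages are dropped first.
function trimHistory(messages, maxTokens) {
  const system = messages.filter((m) => m.role === "system")
  const rest = messages.filter((m) => m.role !== "system")

  let used = system.reduce((n, m) => n + estimateTokens(m.content), 0)
  const kept = []
  // Walk backwards so the newest messages are considered first.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content)
    if (used + cost > maxTokens) break
    kept.unshift(rest[i])
    used += cost
  }
  return [...system, ...kept]
}

const history = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "a".repeat(400) },
  { role: "assistant", content: "b".repeat(400) },
  { role: "user", content: "What is a closure?" }
]
console.log(trimHistory(history, 50).map((m) => m.role)) // [ 'system', 'user' ]
```

Summarizing the dropped messages instead of discarding them follows the same shape: replace the removed prefix with one short summary message.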

4. Calling the LLM API

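import OpenAI from "openai"

// Read the API key from the environment instead of hard-coding it
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
})

// Top-level await requires an ES module (.mjs or "type": "module")
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain JavaScript closure." }
  ]
})

// The generated text is in the first choice's message
console.log(response.choices[0].message.content)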

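Message structure

The chat API treats the LLM as a conversational model: the input is an ordered list of messages, each tagged with a role.

// system sets the behavior, user asks, assistant holds the model's reply
const messages = [
  { role: "system", content: "You are a JavaScript expert." },
  { role: "user", content: "What is a closure?" },
  { role: "assistant", content: "A closure is..." }
]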
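5. Temperature

Controls the randomness of the output. The accepted range depends on the provider: 0 to 2 in the OpenAI API, 0 to 1 in the Anthropic API.
Values near 0 make the output stable and close to reproducible, which suits code generation. Values near 1 add randomness, which suits creative writing.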
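
Under the hood, temperature is usually applied by dividing the model's raw scores (logits) by it before the softmax. The sketch below uses made-up logits to show the effect: a lower temperature concentrates probability on the top token, a higher one flattens the distribution. A temperature of exactly 0 would divide by zero, so implementations typically treat it as "always pick the top token".

```javascript
// Softmax with temperature: p_i = exp(z_i / T) / sum_j exp(z_j / T)
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((z) => z / temperature)
  const max = Math.max(...scaled)                  // subtract max for numerical stability
  const exps = scaled.map((z) => Math.exp(z - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

const logits = [2.0, 1.0, 0.5]                     // hypothetical scores for 3 tokens

console.log(softmaxWithTemperature(logits, 0.2))   // sharply peaked on the first token
console.log(softmaxWithTemperature(logits, 1.0))   // the plain softmax
console.log(softmaxWithTemperature(logits, 2.0))   // flatter, more random
```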

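6. Top_p

Top_p is also called nucleus sampling. It restricts sampling to the most probable tokens whose cumulative probability reaches p. The rule: starting from the most probable token, add up the probabilities until the running total is ≥ top_p, then stop and sample only from those tokens. It is not simply "tokens whose probability is greater than top_p"; the cumulative probability is what counts.

Suppose the model predicts these probabilities for the next token: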

token	probability
Paris	0.6
NewYork	0.2
London	0.1
Berlin	0.05
Tokyo	0.05
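With top_p = 0.9, the first three tokens are kept (0.6 + 0.2 + 0.1 = 0.9), and the next token is sampled from them in proportion to their renormalized probabilities.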
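
The same rule as a minimal sketch, using the table above as input. `topPFilter` and `sample` are hypothetical helper names; real inference stacks apply this to the full vocabulary.

```javascript
// Keep the smallest set of top tokens whose cumulative probability
// reaches topP, then renormalize so the kept probabilities sum to 1.
function topPFilter(candidates, topP) {
  const sorted = [...candidates].sort((a, b) => b.probability - a.probability)
  const kept = []
  let cumulative = 0
  for (const c of sorted) {
    kept.push(c)
    cumulative += c.probability
    // Small epsilon guards against floating-point rounding in the sum.
    if (cumulative >= topP - 1e-9) break
  }
  return kept.map((c) => ({ ...c, probability: c.probability / cumulative }))
}

// Sample one token from a (renormalized) distribution.
function sample(candidates, random = Math.random) {
  let r = random()
  for (const c of candidates) {
    r -= c.probability
    if (r <= 0) return c.token
  }
  return candidates[candidates.length - 1].token
}

const candidates = [
  { token: "Paris", probability: 0.6 },
  { token: "NewYork", probability: 0.2 },
  { token: "London", probability: 0.1 },
  { token: "Berlin", probability: 0.05 },
  { token: "Tokyo", probability: 0.05 }
]

const nucleus = topPFilter(candidates, 0.9)
console.log(nucleus.map((c) => c.token)) // [ 'Paris', 'NewYork', 'London' ]
console.log(sample(nucleus))             // one of the three, chosen at random
```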

7. Prompt Design

A prompt is the instruction you give the model.

A poor prompt:

Explain closure

A good prompt:

Explain JavaScript closure.

Requirements:
1. Use simple language
2. Give code examples
3. Compare with normal functions

Common prompt patterns for agents

role prompt

You are a senior Node.js engineer.

step-by-step

Think step by step.

output format

Return JSON:
{
 "title": "",
 "summary": ""
}

Example:

const prompt = `
You are a senior JavaScript engineer.

Explain JavaScript closure.

Requirements:
- simple explanation
- provide code example
`
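
The patterns above (role, requirements, output format) can be combined in a small helper so every prompt has the same shape. `buildPrompt` and its option names are hypothetical, shown only as one way to organize the pieces.

```javascript
// Assemble a prompt from a role, a task, a list of requirements,
// and an optional JSON output format.
function buildPrompt({ role, task, requirements = [], outputFormat }) {
  const parts = [`You are ${role}.`, "", task]
  if (requirements.length > 0) {
    parts.push("", "Requirements:")
    requirements.forEach((r, i) => parts.push(`${i + 1}. ${r}`))
  }
  if (outputFormat) {
    parts.push("", "Return JSON:", JSON.stringify(outputFormat, null, 2))
  }
  return parts.join("\n")
}

const builtPrompt = buildPrompt({
  role: "a senior JavaScript engineer",
  task: "Explain JavaScript closure.",
  requirements: ["Use simple language", "Give code examples"],
  outputFormat: { title: "", summary: "" }
})
console.log(builtPrompt)
```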

8. Function Calling

Function calling is a core capability of AI agents. It lets the LLM decide whether to call a tool, and with which arguments.

// Define the tool
const tools = [
  {
    type: "function",
    function: {
      name: "getWeather",
      description: "Get weather by city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string" }
        },
        required: ["city"]
      }
    }
  }
]
// Call the model with the tool definitions attached
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "What's the weather in Beijing?" }
  ],
  tools
})

tool_call:
getWeather({ city: "Beijing" })

Then feed the result back to the model: append the tool's output to messages and call the LLM again.

User → LLM
        ↓
   Tool Call
        ↓
   Execute Tool
        ↓
   Tool Result
        ↓
       LLM
        ↓
    Final Answer

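Complete code example

// Assumes client and tools are defined as in the snippets above,
// and that getWeather(city) is your own implementation.
const messages = [
  { role: "user", content: "What's the weather like in Beijing?" }
]

const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
  tools
})

const assistantMessage = res.choices[0].message
// tool_calls is absent when the model answers directly
const toolCall = assistantMessage.tool_calls?.[0]

if (toolCall) {
  // The arguments arrive as a JSON string, not an object
  const args = JSON.parse(toolCall.function.arguments)
  const weather = await getWeather(args.city)

  // Echo the assistant's tool-call message, then attach the result
  messages.push(assistantMessage)
  messages.push({
    role: "tool",
    tool_call_id: toolCall.id, // links this result to the call
    content: JSON.stringify(weather)
  })

  const final = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages
  })
  console.log(final.choices[0].message.content)
} else {
  console.log(assistantMessage.content)
}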

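The LLM never executes functions itself; it only decides which tool to call and with what arguments. Nearly every agent framework (LangChain / OpenAI Agents / AutoGPT / Claude Tools) is essentially doing the same thing:

Running this flow automatically in a loop

while (tool_call) {
  execute the tool
  send the result back to the LLM
}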
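
A runnable sketch of that loop, with the model replaced by a scripted stub so it works offline. `fakeLLM`, `toolRegistry`, and `runAgent` are hypothetical names, the message shapes mirror the chat completions format, and the weather data is made up. The `maxSteps` cap is an added safeguard so a misbehaving model cannot loop forever.

```javascript
// Local tool implementations, looked up by name.
const toolRegistry = {
  getWeather: ({ city }) => ({ city, forecast: "sunny", tempC: 25 }) // made-up data
}

// Scripted stand-in for the LLM: requests a tool on the first turn,
// then answers once a tool result is present in the messages.
async function fakeLLM(messages) {
  const toolResult = messages.find((m) => m.role === "tool")
  if (!toolResult) {
    return {
      role: "assistant",
      content: null,
      tool_calls: [{
        id: "call_1",
        type: "function",
        function: { name: "getWeather", arguments: JSON.stringify({ city: "Beijing" }) }
      }]
    }
  }
  const data = JSON.parse(toolResult.content)
  return { role: "assistant", content: `It is ${data.forecast} in ${data.city}.` }
}

async function runAgent(userInput, llm, maxSteps = 5) {
  const messages = [{ role: "user", content: userInput }]
  for (let step = 0; step < maxSteps; step++) {
    const reply = await llm(messages)
    messages.push(reply)
    if (!reply.tool_calls || reply.tool_calls.length === 0) {
      return reply.content            // no tool requested: final answer
    }
    for (const call of reply.tool_calls) {
      const tool = toolRegistry[call.function.name]
      const args = JSON.parse(call.function.arguments)
      const result = tool ? await tool(args) : { error: "unknown tool" }
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) })
    }
  }
  throw new Error("Agent did not finish within maxSteps")
}

runAgent("What's the weather in Beijing?", fakeLLM).then(console.log)
// prints "It is sunny in Beijing."
```

Swapping `fakeLLM` for a function that calls a real chat completions endpoint and returns the assistant message leaves the loop unchanged.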

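9. Hallucination

Hallucination means the model fabricates facts. It happens because the model only generates text by probability, with no built-in check against the truth. Common cases are invented APIs, invented links, and invented data. Mitigations:

RAG
Tool calls (search, databases, APIs)
Constrain the answer (tell the model to say it doesn't know when it doesn't know)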
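
The first and third mitigations are often combined: retrieved snippets go into the prompt, and the system message tells the model to refuse when they do not contain the answer. A minimal sketch, where `buildGroundedMessages` and the sample snippet are hypothetical and the retrieval step itself is out of scope:

```javascript
// Build a messages array that grounds the model in retrieved snippets
// and instructs it to admit when the answer is not in them.
function buildGroundedMessages(question, snippets) {
  const context = snippets.map((s, i) => `[${i + 1}] ${s}`).join("\n")
  return [
    {
      role: "system",
      content:
        "Answer using only the context below. " +
        'If the context does not contain the answer, reply "I don\'t know."\n\n' +
        `Context:\n${context}`
    },
    { role: "user", content: question }
  ]
}

const groundedMessages = buildGroundedMessages(
  "What does Array.prototype.at(-1) return?",
  ["Array.prototype.at(index) accepts negative integers, which count back from the last item."]
)
console.log(groundedMessages[0].content)
```

This lowers the chance of fabricated answers but does not eliminate it, so critical outputs still deserve verification.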