AI Agents Based on Large Language Models: Part 1

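An agent is an entity that autonomously perceives its environment and takes actions to achieve its goals. An AI agent based on a large language model (LLM) uses the LLM for memory retrieval, decision-making and reasoning, and choosing the order of its actions, which raises the agent's intelligence to a new level. How exactly do LLM-driven agents do this? This series introduces the latest technical progress in AI agents.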

What Is an AI Agent?

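The word "agent" comes from the Latin "agere", meaning "to act". Today it refers to a person or thing, in any field, that can think and act independently; the term emphasizes autonomy and initiative. An intelligent agent is an agent that acts intelligently: it perceives its environment, acts autonomously to achieve its goals, and can improve its performance by learning or acquiring knowledge.

A single agent can be thought of as an expert in one particular area.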

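A simplified agent decision loop:

Agent: P (Perception) → P (Planning) → A (Action)

Perception is the agent's ability to gather information from its environment and extract relevant knowledge from it.

Planning is the decision-making process the agent goes through in pursuit of a goal.

Action is what the agent does, based on the environment and its plan.

Here, the Policy is the core decision-making component that determines which Action the agent takes. Each action in turn produces an Observation, which becomes the basis for the next round of Perception, forming an autonomous, closed-loop learning process.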
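The Perception → Planning → Action loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real agent framework: the environment (`LineWorld`), the agent class, and its hard-coded policy are all hypothetical, and there is no learning step. In an LLM-based agent, `plan` is where the model would be called.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LineWorld:
    # Hypothetical toy environment: a position on a number line and a goal position.
    position: int = 0
    goal: int = 3

    def observe(self) -> Dict[str, int]:
        return {"position": self.position, "goal": self.goal}

    def apply(self, action: str) -> None:
        self.position += {"right": 1, "left": -1}.get(action, 0)

@dataclass
class SimpleAgent:
    history: List[str] = field(default_factory=list)

    def perceive(self, observation: Dict[str, int]) -> int:
        # Perception: extract the relevant fact (signed distance to the goal).
        return observation["goal"] - observation["position"]

    def plan(self, distance: int) -> str:
        # Planning / policy: a hard-coded rule here; an LLM-based agent would call the model.
        if distance > 0:
            return "right"
        if distance < 0:
            return "left"
        return "stop"

    def run(self, env: LineWorld, max_steps: int = 10) -> List[str]:
        observation = env.observe()
        for _ in range(max_steps):
            action = self.plan(self.perceive(observation))
            self.history.append(action)
            if action == "stop":
                break
            env.apply(action)            # Action changes the environment ...
            observation = env.observe()  # ... and the new Observation feeds the next Perception.
        return self.history

print(SimpleAgent().run(LineWorld(position=0, goal=3)))  # → ['right', 'right', 'right', 'stop']
```

The loop closes because each action's effect is only seen through the next observation, never read directly from the action itself.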

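Concepts in LangChain and similar frameworks:

Models: the familiar calls to an LLM API.

Prompt Templates: templates that insert variables into the prompt so it can adapt to user input.

Chains: chained calls to the model, where the previous output becomes part of the next input.

Agent: autonomously executes chained calls and can access external tools.

Multi-Agent: multiple agents share part of their memory, divide up the work among themselves, and collaborate.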
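To make Prompt Templates and Chains concrete, here is a framework-free sketch. It deliberately does not use LangChain's own classes: `fake_llm` is a hypothetical stand-in for a model API call, and `SimplePromptTemplate` and `run_chain` are illustrative names, not a library API.

```python
from typing import Callable, List

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model API call ("Models").
    return f"[LLM output for: {prompt}]"

class SimplePromptTemplate:
    # "Prompt Template": a prompt with a named variable filled in from user input.
    def __init__(self, template: str):
        self.template = template

    def format(self, **variables: str) -> str:
        return self.template.format(**variables)

def run_chain(llm: Callable[[str], str], steps: List[SimplePromptTemplate], user_input: str) -> str:
    # "Chain": each model output is substituted into the next template.
    text = user_input
    for template in steps:
        text = llm(template.format(input=text))
    return text

steps = [
    SimplePromptTemplate("Suggest a name for a company that makes {input}."),
    SimplePromptTemplate("Write a slogan for: {input}"),
]
print(run_chain(fake_llm, steps, "colorful socks"))
# → [LLM output for: Write a slogan for: [LLM output for: Suggest a name for a company that makes colorful socks.]]
```

The nested brackets in the output show how the first call's result became part of the second call's prompt.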

The difference between an Agent and a Chain in LangChain:

The core idea of agents is to use an LLM to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
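The difference can be sketched as follows. `chain` hardcodes the order of two tool calls, while `agent` asks a model for the next action at every step. The tools, the population figure (made-up toy data), and `scripted_llm` (which fakes the model's decisions with if-statements) are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical tools; the population figure is made-up toy data.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_population": lambda city: {"Sampleville": "30000"}.get(city, "0"),
    "double": lambda number: str(int(number) * 2),
}

def chain(city: str) -> str:
    # Chain: the sequence of actions is hardcoded in code.
    population = TOOLS["lookup_population"](city)
    return TOOLS["double"](population)

def scripted_llm(history: List[str]) -> str:
    # Hypothetical stand-in for an LLM acting as the reasoning engine.
    # It replies with "tool_name: argument" or "finish: answer".
    if len(history) == 1:
        return f"lookup_population: {history[0]}"
    if len(history) == 2:
        return f"double: {history[1]}"
    return f"finish: {history[-1]}"

def agent(city: str, llm: Callable[[List[str]], str], max_steps: int = 5) -> str:
    # Agent: the model chooses which action to take next, and in which order.
    history = [city]
    for _step in range(max_steps):
        name, _, argument = llm(history).partition(": ")
        if name == "finish":
            return argument
        history.append(TOOLS[name](argument))
    raise RuntimeError("agent did not finish within max_steps")

print(chain("Sampleville"))                # → 60000
print(agent("Sampleville", scripted_llm))  # → 60000
```

Both paths reach the same answer here; what differs is who decides the order of actions: the programmer in `chain`, the model in `agent`.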

Background

An important source of information during decision-making is memory. As background, here is a brief overview of the different kinds of memory.

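Memory

Memory can be defined as the processes used to acquire, store, retain, and later retrieve information. There are several types of memory in the human brain.

Sensory Memory: the earliest stage of memory. It provides the ability to retain impressions of sensory information (visual, auditory, etc.) after the original stimulus has ended, and typically lasts only a few seconds. Subcategories include iconic memory (visual), echoic memory (auditory), and haptic memory (touch).

Short-Term Memory (STM) or Working Memory: stores the information we are currently aware of and need in order to carry out complex cognitive tasks such as learning and reasoning. Its capacity is believed to be about 7 items (Miller 1956), and it lasts 20-30 seconds.

Long-Term Memory (LTM): can store information for a long time, from a few days to decades, and its storage capacity is essentially unlimited. LTM has two subtypes:
- Explicit / declarative memory: memory of facts and events that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts).
- Implicit / procedural memory: unconscious memory of skills and routines that are performed automatically, such as riding a bike or typing on a keyboard.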

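These map roughly onto an LLM agent as follows:

Sensory memory corresponds to learning embedding representations of raw inputs (text, images, or other modalities).

Short-term memory corresponds to in-context learning (the prompt). It is short-lived and limited because it is bounded by the Transformer's context window length.

Long-term memory corresponds to an external vector store that the agent can attend to at query time, accessed through fast retrieval.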
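The mapping above can be illustrated with a toy sketch. Assumptions: word counts stand in for a learned embedding model, the context window is measured in messages rather than tokens, and a linear scan over cosine similarities stands in for a real vector database (production systems typically use approximate nearest-neighbor indexes instead). All class and function names are hypothetical.

```python
import math
from collections import Counter, deque
from typing import List, Tuple

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding of raw input ("sensory memory"): word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ShortTermMemory:
    # "Short-term memory": only the most recent messages fit into the prompt,
    # like a bounded context window (measured in messages here, not tokens).
    def __init__(self, max_messages: int):
        self.messages = deque(maxlen=max_messages)

    def add(self, message: str) -> None:
        self.messages.append(message)  # the oldest message is dropped when full

    def as_prompt(self) -> str:
        return "\n".join(self.messages)

class LongTermMemory:
    # "Long-term memory": an external store, searched by similarity at query time.
    def __init__(self):
        self.items: List[Tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

stm = ShortTermMemory(max_messages=2)
for msg in ["hi", "my name is Ada", "I like tea"]:
    stm.add(msg)
print(stm.as_prompt())  # "hi" no longer fits: prints only the last two messages

ltm = LongTermMemory()
ltm.add("the user prefers green tea")
ltm.add("the project deadline is friday")
print(ltm.retrieve("what tea does the user like?"))  # → ['the user prefers green tea']
```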

Writing Good Prompts: ReAct

Home: react-lm.github.io/

ReAct in LangChain: python.langchain.com/docs/module…

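ReAct stands for Reason and Act.

Key characteristics:

(Zero-shot) CoT only appends a static "Let's think step by step" to the prompt, whereas the ReAct prompt changes dynamically as the task progresses.

CoT needs only a single LLM call, whereas ReAct calls the LLM iteratively, multiple times.

ReAct is probably the most widely used prompt structure in agents today: few-shot examples + Thought, Action, Observation. It is also a common prompt structure for tool use, reasoning, and planning. 🔥🔥🔥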

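ReAct iterates over three elements: Thought, Action, Observation. Thought and Action are generated by the LLM, while Observation is the result returned after executing the Action.

In Step 1, the LLM first thinks (reasons) about the Question and then decides which action to take, generating Thought 1 and Action 1. Executing Action 1 yields Observation 1.

In Step 2, the LLM takes the Question, Thought 1, Action 1, and Observation 1, reasons over all of this information, and then decides which action to take, generating Thought 2 and Action 2. Executing Action 2 yields Observation 2.

In Step 3, the LLM takes the Question, Thought 1, Action 1, Observation 1, Thought 2, Action 2, and Observation 2, reasons over all of this information, and then decides which action to take, generating Thought 3 and Action 3. Executing Action 3 yields Observation 3.

This continues until an Action signals the end of the task (in the prompt below, Finish[answer]).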
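The iteration above can be sketched as a self-contained loop. This is a simplified illustration, not the ReAct authors' code: `scripted_llm` replays canned completions instead of calling a model, the `Search` tool returns made-up toy text about a fictional country, and there is no handling of malformed output. A real implementation would also pass a stop sequence such as `\nObservation {i}:` so the model cannot write its own observations.

```python
import re
from typing import Callable, Dict, Optional, Tuple

def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 7) -> Tuple[Optional[str], str]:
    """Minimal ReAct loop: each LLM call sees the whole trajectory so far."""
    prompt = f"Question: {question}\n"
    for i in range(1, max_steps + 1):
        # The LLM writes Thought i and Action i, conditioned on all earlier steps.
        completion = llm(prompt + f"Thought {i}:")
        thought, action = completion.split(f"\nAction {i}: ")
        name, arg = re.fullmatch(r"(\w+)\[(.*)\]", action.strip()).groups()
        step_str = f"Thought {i}:{thought}\nAction {i}: {action}\n"
        if name == "Finish":
            return arg, prompt + step_str
        # Executing the action yields Observation i, which is appended to the prompt.
        step_str += f"Observation {i}: {tools[name](arg)}\n"
        prompt += step_str
    return None, prompt  # no Finish within max_steps

# Hypothetical canned completions, indexed by how many observations exist so far.
SCRIPT = [
    " I need to search Examplestan to find its capital.\nAction 1: Search[Examplestan]",
    " The observation names the capital, so I can answer.\nAction 2: Finish[Sampleville]",
]

def scripted_llm(prompt: str) -> str:
    return SCRIPT[prompt.count("\nObservation ")]

toy_tools = {"Search": lambda entity: f"{entity} is a fictional country; in this toy data its capital is Sampleville."}

answer, trajectory = react_loop("What is the capital of Examplestan?", scripted_llm, toy_tools)
print(answer)      # → Sampleville
print(trajectory)  # Question, Thought 1, Action 1, Observation 1, Thought 2, Action 2
```

Each call receives a longer prompt than the last, which is exactly what makes the ReAct prompt dynamic and why it needs several LLM calls.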

The concrete code is as follows:

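# From the ReAct HotpotQA notebook (link below). `env`, `step`, `llm`, and
# `webthink_prompt` are defined earlier in that notebook and are not shown here.
def webthink(idx=None, prompt=webthink_prompt, to_print=True):
    question = env.reset(idx=idx)
    if to_print:
        print(idx, question)
    prompt += question + "\n"
    n_calls, n_badcalls = 0, 0
    for i in range(1, 8):  # at most 7 Thought/Action/Observation steps
        n_calls += 1
        # Stop before "Observation i:" so the LLM writes only Thought i and Action i.
        thought_action = llm(prompt + f"Thought {i}:", stop=[f"\nObservation {i}:"])
        try:
            thought, action = thought_action.strip().split(f"\nAction {i}: ")
        except ValueError:
            # Malformed output: keep the first line as the thought and ask for the action separately.
            print('ohh...', thought_action)
            n_badcalls += 1
            n_calls += 1
            thought = thought_action.strip().split('\n')[0]
            action = llm(prompt + f"Thought {i}: {thought}\nAction {i}:", stop=["\n"]).strip()
        # The wiki environment parses lowercase action names, e.g. "search[...]".
        obs, r, done, info = step(env, action[0].lower() + action[1:])
        obs = obs.replace('\\n', '')
        step_str = f"Thought {i}: {thought}\nAction {i}: {action}\nObservation {i}: {obs}\n"
        prompt += step_str  # append this step's Thought, Action, and Observation to the prompt
        if to_print:
            print(step_str)
        if done:
            break
    if not done:
        # Out of steps: force the episode to end with an empty answer.
        obs, r, done, info = step(env, "finish[]")
    if to_print:
        print(info, '\n')
    info.update({'n_calls': n_calls, 'n_badcalls': n_badcalls, 'traj': prompt})
    return r, info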

https://github.com/ysymyth/ReAct/blob/master/hotpotqa.ipynb

The webthink_prompt argument of the function above looks like this:

Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action can be three types: 
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.
(3) Finish[answer], which returns the answer and finishes the task.
Here are some examples.

Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions:
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]

[More Examples]...
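The three action types in this prompt can be illustrated with a toy, in-memory environment. This is a hypothetical sketch, not the ReAct repository's Wikipedia environment: `ToyWikiEnv` holds a hand-written page dictionary (the page text reuses Observations 1 and 2 from the example above), "similar entities" are found by plain substring matching, and sentences are split on periods.

```python
import re
from typing import Dict, List, Optional, Tuple

class ToyWikiEnv:
    """Hypothetical in-memory stand-in for the Search/Lookup/Finish environment."""

    def __init__(self, pages: Dict[str, str]):
        self.pages = pages                  # page title -> page text
        self.passage = ""                   # text of the last successfully searched page
        self.keyword: Optional[str] = None  # keyword of the current Lookup scan
        self.hits: List[str] = []
        self.next_hit = 0

    def search(self, entity: str) -> str:
        if entity in self.pages:
            self.passage = self.pages[entity]
            self.keyword = None                 # a new passage restarts any Lookup scan
            return self.passage.split("\n")[0]  # first paragraph
        similar = [title for title in self.pages if entity.lower() in title.lower()]
        return f"Could not find [{entity}]. Similar: {similar}"

    def lookup(self, keyword: str) -> str:
        if keyword != self.keyword:             # new keyword: collect matching sentences
            sentences = re.split(r"(?<=\.)\s+", self.passage)
            self.keyword = keyword
            self.hits = [s for s in sentences if keyword.lower() in s.lower()]
            self.next_hit = 0
        if self.next_hit >= len(self.hits):
            return "No more results."
        self.next_hit += 1                      # each call returns the next matching sentence
        return f"(Result {self.next_hit} / {len(self.hits)}) {self.hits[self.next_hit - 1]}"

    def step(self, action: str) -> Tuple[str, bool]:
        # Parse "Name[argument]" and return (observation, done).
        match = re.fullmatch(r"(Search|Lookup|Finish)\[(.*)\]", action.strip())
        if match is None:
            return f"Invalid action: {action}", False
        kind, arg = match.groups()
        if kind == "Search":
            return self.search(arg), False
        if kind == "Lookup":
            return self.lookup(arg), False
        return f"Episode finished, answer: {arg}", True

PAGES = {
    "Colorado orogeny": (
        "The Colorado orogeny was an episode of mountain building (an orogeny) "
        "in Colorado and surrounding areas.\n"
        "The eastern sector extends into the High Plains and is called the Central Plains orogeny."
    ),
}
env = ToyWikiEnv(PAGES)
print(env.step("Search[Colorado orogeny]")[0])  # first paragraph of the page
print(env.step("Lookup[eastern sector]")[0])    # (Result 1 / 1) The eastern sector extends ...
print(env.step("Search[Colorado]")[0])          # Could not find [Colorado]. Similar: ['Colorado orogeny']
print(env.step("Finish[1,800 to 7,000 ft]"))    # ('Episode finished, answer: 1,800 to 7,000 ft', True)
```

Keeping Lookup stateful mirrors the prompt's wording ("returns the next sentence containing keyword"): repeating the same Lookup walks through further matches instead of returning the first one again.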