🎯 Stop Guessing at Prompts! Here's the Systematic Guide You Need


📖 Abstract: Prompt engineering is a core skill of the AI era. This article systematically covers nine prompting techniques, from fundamentals to advanced methods, including zero-shot prompting, chain of thought, and the ReAct pattern. Whether you are a developer or an AI enthusiast, you will find here a complete roadmap for improving the quality of your AI interactions.


💎 A Complete Guide to Prompt Engineering

Nine practical techniques for getting AI to do what you ask


🌟 An Opening Story

The first time Xiao Li used GPT to write code, his prompts were a mess. The AI's answers were always wildly off topic, and he grew frustrated.

Then he learned structured prompting. For the very same questions, the AI started producing excellent answers almost instantly. The turnaround took less than a week.


📌 1. Core Principles: Make Sure the AI Understands You

Clarity comes first. Don't beat around the bush; tell the AI exactly what you want.

Conciseness matters. Use simple action verbs such as "summarize", "classify", and "extract".

An experimental mindset is essential. Good prompts are found through trial and error; if one attempt fails, revise and try again.

Key reminder: know exactly what you want before you start writing the prompt


📌 2. Basic Techniques: Usable Even If You're Starting from Zero

Zero-shot is the most direct. Just say "Please summarize this article" and the AI gets to work.

One-shot provides a template. Give one example and the AI understands the format you want.

Few-shot is the most reliable. With three to five examples, even complex tasks are manageable.

💡 Practical tip: use few-shot for classification tasks, zero-shot for simple ones


📌 3. Structured Methods: Bring Order to Your Prompts

System prompts set global rules. Tell the AI, "You are a professional data analyst."

Role-playing makes answers more professional. Have the AI act as a "senior programmer" or a "marketing expert".

Delimiters clarify structure. Use ``` or --- to separate different sections.

Structured output simplifies downstream processing. Request JSON or XML so programs can parse the result.

⚠️ Note: make the role specific; don't just say "you are an expert"


📌 4. Context Engineering: Give the AI Enough Background

Context quality matters more than model architecture. Good background beats a good model.

Dynamic context makes the AI smarter. Incorporating conversation history deepens its understanding.

External data extends the AI's abilities. Plug in real-time information to reduce hallucinations.

🔍 Key insight: context is the AI's "working memory"


📌 5. Reasoning Techniques: Make the AI Think Like a Human

Chain of thought asks for step-by-step reasoning: "Please analyze this problem step by step."

Self-consistency improves accuracy. Ask the same question several times and keep the most frequent answer.
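Self-consistency can be sketched in a few lines: sample several answers and take a majority vote. The `ask_model` callable below is a hypothetical stand-in for a single (sampled, non-zero-temperature) model query; `collections.Counter` does the vote.

```python
from collections import Counter

def self_consistent_answer(ask_model, question: str, n: int = 5) -> str:
    """Ask the same question n times and return the most frequent answer.

    `ask_model` is a hypothetical callable that sends one prompt to an LLM
    and returns its answer as a string.
    """
    answers = [ask_model(question) for _ in range(n)]
    most_common, _count = Counter(answers).most_common(1)[0]
    return most_common

# Demo with a fake model that answers inconsistently.
fake_answers = iter(["42", "41", "42", "42", "40"])
result = self_consistent_answer(lambda q: next(fake_answers), "6 * 7 = ?")
print(result)  # -> 42
```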

Step-back prompting considers principles before solving: "First describe the general approach to this kind of problem."

Tree of thoughts explores multiple possibilities. Like a chess player, it weighs different lines of play.

🎯 Use cases: chain of thought for math problems, tree of thoughts for creative tasks


📌 6. Action Techniques: Put the AI to Work

Tool calling extends the AI's capabilities. Let it use a calculator, look things up, or call APIs.

The ReAct pattern combines thinking and acting. A "think-act-observe" loop works through problems.

💻 In practice: the ReAct pattern is especially well suited to complex tasks such as data analysis
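The think-act-observe loop can be sketched as a few dozen lines of Python. Everything here is a stand-in: a real agent would call an LLM instead of the scripted replies, and real tools instead of the toy calculator.

```python
import re

def calculator(expression: str) -> str:
    # Toy tool: evaluate a simple arithmetic expression.
    # (eval is used only on our own demo input; never do this on untrusted text.)
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def react_loop(model, task: str, max_steps: int = 5) -> str:
    """Run a minimal ReAct loop until the model emits a final answer."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Think: the model replies with a thought plus either an
        # "Action: tool[input]" line or a "Final Answer: ..." line.
        reply = model(transcript)
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if match:
            tool, arg = match.group(1), match.group(2)
            observation = TOOLS[tool](arg)                 # act
            transcript += f"Observation: {observation}\n"  # observe
    return "gave up"

# Scripted model: first it decides to use the calculator, then it answers.
steps = iter([
    "Thought: I should compute this.\nAction: calculator[17 * 23]",
    "Thought: The observation gives the result.\nFinal Answer: 391",
])
print(react_loop(lambda t: next(steps), "What is 17 * 23?"))  # -> 391
```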


📌 7. Advanced Techniques: Expert-Level Tactics

Automatic prompt engineering uses AI to optimize AI. Let a large model write better prompts for you.

Retrieval-augmented generation connects a knowledge base. RAG reduces hallucinations and improves accuracy.

Persona patterns personalize responses. Adjust the complexity of answers to the user's level.

🚀 Progression advice: master the basics before trying the advanced techniques


📌 8. Task-Specific Prompting: Different Scenarios, Different Strategies

Code generation needs specifics. State the language, the framework, and the functional requirements.

Multimodal tasks need clear descriptions. For image analysis, say where to look and what to focus on.

Data analysis needs enough background. Explain the data format, the analysis goals, and the output requirements.

📊 Lesson learned: the more specific the task, the better the AI performs


📌 9. Best Practices: Avoid the Detours

Iterative testing is a must. The first attempt is rarely perfect; revise several times.

Documentation matters. Save the prompts that work and build your own library.

Validate outputs rigorously. For structured output especially, check that the format is correct.

Version control pays off. Manage prompts with versioning, just like code.

📝 Practical tip: keep a dedicated document for collecting and organizing prompts
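A prompt library can start as nothing more than a versioned dictionary of templates. The layout below is one possible sketch, not a prescribed format; the template names and contents are illustrative.

```python
from typing import Optional

# Templates keyed by (name, version), so old versions stay retrievable.
PROMPTS = {
    ("summarize", 1): "Summarize the following text:\n{text}",
    ("summarize", 2): "Summarize the following text in three bullet points:\n{text}",
    ("classify_sentiment", 1): (
        "Classify the sentiment of this review as POSITIVE, NEUTRAL, or NEGATIVE:\n{review}"
    ),
}

def get_prompt(name: str, version: Optional[int] = None, **kwargs) -> str:
    """Fetch a template (latest version by default) and fill in its fields."""
    if version is None:
        version = max(v for (n, v) in PROMPTS if n == name)
    return PROMPTS[(name, version)].format(**kwargs)

print(get_prompt("summarize", text="Prompt engineering is a craft."))
```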


🌟 Final Thoughts

Prompt engineering is not magic; it is a craft. Master these nine techniques and you can turn the AI from "dumb" into "smart".

The key thing to remember: AI is not omnipotent, but with the right prompts it is remarkably powerful.

Starting today, stop saying "AI is useless." Check your own prompt first: did you write it well?


💬 Discussion

What problems have you run into when using AI? Which prompting technique has helped you most?

Leave a comment sharing your experience and questions, and let's improve together!


🎯 Coming Up Next

In the next installment we will share ten practical prompt templates covering common scenarios such as writing, programming, and analysis.

Follow us so you don't miss it; there is more good content coming!


Appendix A: Advanced Prompting Techniques

附录A:高级提示工程技术

Introduction to Prompting

提示工程导论

Prompting, the primary interface for interacting with language models, is the process of crafting inputs to guide the model towards generating a desired output. This involves structuring requests, providing relevant context, specifying the output format, and demonstrating expected response types. Well-designed prompts can maximize the potential of language models, resulting in accurate, relevant, and creative responses. In contrast, poorly designed prompts can lead to ambiguous, irrelevant, or erroneous outputs.

提示工程是与语言模型交互的主要接口,是通过精心设计输入来引导模型生成期望输出的过程。这包括构建请求结构、提供相关上下文、指定输出格式以及展示期望的响应类型。精心设计的提示可以最大化语言模型的潜力,产生准确、相关且富有创意的响应。相反,设计不当的提示可能导致模糊、不相关或错误的输出。

The objective of prompt engineering is to consistently elicit high-quality responses from language models. This requires understanding the capabilities and limitations of the models and effectively communicating intended goals. It involves developing expertise in communicating with AI by learning how to best instruct it.

提示工程的目标是从语言模型中持续获得高质量的响应。这需要理解模型的能力和局限性,并有效传达预期目标。它涉及通过学习如何最佳地指导AI来发展与AI沟通的专业知识。

This appendix details various prompting techniques that extend beyond basic interaction methods. It explores methodologies for structuring complex requests, enhancing the model's reasoning abilities, controlling output formats, and integrating external information. These techniques are applicable to building a range of applications, from simple chatbots to complex multi-agent systems, and can improve the performance and reliability of agentic applications.

本附录详细介绍了超越基本交互方法的各种提示技术。它探讨了构建复杂请求、增强模型推理能力、控制输出格式以及集成外部信息的方法论。这些技术适用于构建从简单聊天机器人到复杂多智能体系统的各种应用,并能提高智能体应用的性能和可靠性。

Agentic patterns, the architectural structures for building intelligent systems, are detailed in the main chapters. These patterns define how agents plan, utilize tools, manage memory, and collaborate. The efficacy of these agentic systems is contingent upon their ability to interact meaningfully with language models.

智能体模式是构建智能系统的架构结构,在主要章节中有详细说明。这些模式定义了智能体如何规划、使用工具、管理记忆和协作。这些智能体系统的有效性取决于它们与语言模型进行有意义交互的能力。

Core Prompting Principles

核心提示原则

Core Principles for Effective Prompting of Language Models:

语言模型有效提示的核心原则:

Effective prompting rests on fundamental principles guiding communication with language models, applicable across various models and task complexities. Mastering these principles is essential for consistently generating useful and accurate responses.

有效的提示基于指导与语言模型沟通的基本原则,适用于各种模型和任务复杂性。掌握这些原则对于持续生成有用且准确的响应至关重要。

Clarity and Specificity: Instructions should be unambiguous and precise. Language models interpret patterns; multiple interpretations may lead to unintended responses. Define the task, desired output format, and any limitations or requirements. Avoid vague language or assumptions. Inadequate prompts yield ambiguous and inaccurate responses, hindering meaningful output.

清晰性和具体性:指令应该明确且精确。语言模型解释模式;多重解释可能导致意外的响应。定义任务、期望的输出格式以及任何限制或要求。避免模糊的语言或假设。不充分的提示会产生模糊和不准确的响应,阻碍有意义的输出。

Conciseness: While specificity is crucial, it should not compromise conciseness. Instructions should be direct. Unnecessary wording or complex sentence structures can confuse the model or obscure the primary instruction. Prompts should be simple; what is confusing to the user is likely confusing to the model. Avoid intricate language and superfluous information. Use direct phrasing and active verbs to clearly delineate the desired action. Effective verbs include: Act, Analyze, Categorize, Classify, Contrast, Compare, Create, Describe, Define, Evaluate, Extract, Find, Generate, Identify, List, Measure, Organize, Parse, Pick, Predict, Provide, Rank, Recommend, Return, Retrieve, Rewrite, Select, Show, Sort, Summarize, Translate, Write.

简洁性:虽然具体性至关重要,但不应损害简洁性。指令应该直接。不必要的措辞或复杂的句子结构可能混淆模型或掩盖主要指令。提示应该简单;让用户困惑的内容很可能也会让模型困惑。避免复杂的语言和多余的信息。使用直接的措辞和主动动词来清晰界定期望的操作。有效的动词包括:行动、分析、分类、对比、比较、创建、描述、定义、评估、提取、查找、生成、识别、列出、测量、组织、解析、选择、预测、提供、排名、推荐、返回、检索、重写、选择、显示、排序、总结、翻译、写作。

Using Verbs: Verb choice is a key prompting tool. Action verbs indicate the expected operation. Instead of "Think about summarizing this," a direct instruction like "Summarize the following text" is more effective. Precise verbs guide the model to activate relevant training data and processes for that specific task.

使用动词:动词选择是一个关键的提示工具。动作动词指示预期的操作。与其说"思考总结这个",不如说"总结以下文本"这样的直接指令更有效。精确的动词引导模型激活相关训练数据和特定任务的处理过程。

Instructions Over Constraints: Positive instructions are generally more effective than negative constraints. Specifying the desired action is preferred to outlining what not to do. While constraints have their place for safety or strict formatting, excessive reliance can cause the model to focus on avoidance rather than the objective. Frame prompts to guide the model directly. Positive instructions align with human guidance preferences and reduce confusion.

指令优于约束:积极的指令通常比消极的约束更有效。指定期望的操作比概述不做什么更可取。虽然约束在安全或严格格式化方面有其作用,但过度依赖可能导致模型专注于避免而不是目标。构建提示以直接引导模型。积极的指令符合人类指导偏好并减少混淆。

Experimentation and Iteration: Prompt engineering is an iterative process. Identifying the most effective prompt requires multiple attempts. Begin with a draft, test it, analyze the output, identify shortcomings, and refine the prompt. Model variations, configurations (like temperature or top-p), and slight phrasing changes can yield different results. Documenting attempts is vital for learning and improvement. Experimentation and iteration are necessary to achieve the desired performance.

实验和迭代:提示工程是一个迭代过程。识别最有效的提示需要多次尝试。从草稿开始,测试它,分析输出,识别缺点,然后改进提示。模型变体、配置(如温度或top-p)以及轻微的措辞变化可能产生不同的结果。记录尝试对于学习和改进至关重要。实验和迭代是实现期望性能所必需的。

These principles form the foundation of effective communication with language models. By prioritizing clarity, conciseness, action verbs, positive instructions, and iteration, a robust framework is established for applying more advanced prompting techniques.

这些原则构成了与语言模型有效沟通的基础。通过优先考虑清晰性、简洁性、动作动词、积极指令和迭代,为应用更高级的提示技术建立了一个强大的框架。

Basic Prompting Techniques

基础提示技术

Building on core principles, foundational techniques provide language models with varying levels of information or examples to direct their responses. These methods serve as an initial phase in prompt engineering and are effective for a wide spectrum of applications.

基于核心原则,基础技术为语言模型提供不同级别的信息或示例来指导其响应。这些方法作为提示工程的初始阶段,适用于广泛的应用。

Zero-Shot Prompting

零样本提示

Zero-shot prompting is the most basic form of prompting, where the language model is provided with an instruction and input data without any examples of the desired input-output pair. It relies entirely on the model's pre-training to understand the task and generate a relevant response. Essentially, a zero-shot prompt consists of a task description and initial text to begin the process.

零样本提示是最基本的提示形式,语言模型被提供指令和输入数据,没有任何期望的输入-输出对示例。它完全依赖模型的预训练来理解任务并生成相关响应。本质上,零样本提示包括任务描述和开始过程的初始文本。

  • When to use: Zero-shot prompting is often sufficient for tasks that the model has likely encountered extensively during its training, such as simple question answering, text completion, or basic summarization of straightforward text. It's the quickest approach to try first.
  • 何时使用:零样本提示通常足以处理模型在训练期间可能广泛遇到的任务,例如简单的问题回答、文本补全或直接文本的基本总结。这是首先尝试的最快方法。
  • Example:
    Translate the following English sentence to French: 'Hello, how are you?'
  • 示例: 将以下英文句子翻译成法文:'Hello, how are you?'

One-Shot Prompting

单样本提示

One-shot prompting involves providing the language model with a single example of the input and the corresponding desired output prior to presenting the actual task. This method serves as an initial demonstration to illustrate the pattern the model is expected to replicate. The purpose is to equip the model with a concrete instance that it can use as a template to effectively execute the given task.

单样本提示涉及在实际任务呈现之前为语言模型提供一个输入和相应期望输出的单一示例。这种方法作为初始演示,说明模型期望复制的模式。目的是为模型提供一个具体实例,它可以将其用作模板来有效执行给定任务。

  • When to use: One-shot prompting is useful when the desired output format or style is specific or less common. It gives the model a concrete instance to learn from. It can improve performance compared to zero-shot for tasks requiring a particular structure or tone.

  • 何时使用:单样本提示在期望的输出格式或风格特定或不常见时很有用。它为模型提供了一个可以学习的具体实例。对于需要特定结构或语气的任务,与零样本相比可以提高性能。

  • Example:
    Translate the following English sentences to Spanish:
    English: 'Thank you.'
    Spanish: 'Gracias.'

    English: 'Please.'
    Spanish:

  • 示例: 将以下英文句子翻译成西班牙文: 英文:'Thank you.' 西班牙文:'Gracias.'

    英文:'Please.' 西班牙文:

Few-Shot Prompting

少样本提示

Few-shot prompting enhances one-shot prompting by supplying several examples, typically three to five, of input-output pairs. This aims to demonstrate a clearer pattern of expected responses, improving the likelihood that the model will replicate this pattern for new inputs. This method provides multiple examples to guide the model to follow a specific output pattern.

少样本提示通过提供多个(通常三到五个)输入-输出对示例来增强单样本提示。这旨在展示更清晰的期望响应模式,提高模型为新输入复制此模式的可能性。这种方法提供多个示例来指导模型遵循特定的输出模式。

  • When to use: Few-shot prompting is particularly effective for tasks where the desired output requires adhering to a specific format, style, or exhibiting nuanced variations. It's excellent for tasks like classification, data extraction with specific schemas, or generating text in a particular style, especially when zero-shot or one-shot don't yield consistent results. Using at least three to five examples is a general rule of thumb, adjusting based on task complexity and model token limits.

  • 何时使用:少样本提示特别适用于期望输出需要遵循特定格式、风格或展示细微变化的任务。它非常适合分类、具有特定模式的数据提取或以特定风格生成文本等任务,特别是当零样本或单样本不能产生一致结果时。使用至少三到五个示例是一个通用经验法则,根据任务复杂性和模型标记限制进行调整。

  • Importance of Example Quality and Diversity: The effectiveness of few-shot prompting heavily relies on the quality and diversity of the examples provided. Examples should be accurate, representative of the task, and cover potential variations or edge cases the model might encounter. High-quality, well-written examples are crucial; even a small mistake can confuse the model and result in undesired output. Including diverse examples helps the model generalize better to unseen inputs.

  • 示例质量和多样性的重要性:少样本提示的有效性在很大程度上依赖于所提供示例的质量和多样性。示例应准确、具有任务代表性,并涵盖模型可能遇到的潜在变化或边缘情况。高质量、精心编写的示例至关重要;即使是一个小错误也可能混淆模型并导致不期望的输出。包含多样化的示例有助于模型更好地泛化到未见过的输入。

  • Mixing Up Classes in Classification Examples: When using few-shot prompting for classification tasks (where the model needs to categorize input into predefined classes), it's a best practice to mix up the order of the examples from different classes. This prevents the model from potentially overfitting to the specific sequence of examples and ensures it learns to identify the key features of each class independently, leading to more robust and generalizable performance on unseen data.

  • 在分类示例中混合类别:当使用少样本提示进行分类任务(模型需要将输入分类到预定义类别)时,最佳实践是混合来自不同类别的示例顺序。这可以防止模型可能对特定示例序列过拟合,并确保它独立学习识别每个类别的关键特征,从而在未见数据上获得更稳健和可泛化的性能。

  • Evolution to "Many-Shot" Learning: As modern LLMs like Gemini get stronger with long context modeling, they are becoming highly effective at utilizing "many-shot" learning. This means optimal performance for complex tasks can now be achieved by including a much larger number of examples—sometimes even hundreds—directly within the prompt, allowing the model to learn more intricate patterns.

  • 向"多样本"学习的演变:随着像Gemini这样的现代LLM在长上下文建模方面变得更强大,它们在利用"多样本"学习方面变得非常有效。这意味着现在可以通过在提示中直接包含更多数量的示例(有时甚至数百个)来实现复杂任务的最佳性能,使模型能够学习更复杂的模式。

  • Example:
    Classify the sentiment of the following movie reviews as POSITIVE, NEUTRAL, or NEGATIVE:

    Review: "The acting was superb and the story was engaging."
    Sentiment: POSITIVE

    Review: "It was okay, nothing special."
    Sentiment: NEUTRAL

    Review: "I found the plot confusing and the characters unlikable."
    Sentiment: NEGATIVE

    Review: "The visuals were stunning, but the dialogue was weak."
    Sentiment:

  • 示例: 将以下电影评论的情感分类为POSITIVE(积极)、NEUTRAL(中性)或NEGATIVE(消极):

    评论:"The acting was superb and the story was engaging." 情感:POSITIVE

    评论:"It was okay, nothing special." 情感:NEUTRAL

    评论:"I found the plot confusing and the characters unlikable." 情感:NEGATIVE

    评论:"The visuals were stunning, but the dialogue was weak." 情感:
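The few-shot pattern above, including the advice to mix up the order of classes, can be automated with a small prompt builder. This is a sketch; the example reviews and labels are taken from the text, and the `seed` parameter exists only to make the shuffle reproducible.

```python
import random

def build_few_shot_prompt(instruction: str, examples, new_input: str, seed=None) -> str:
    """Assemble a few-shot classification prompt, shuffling the examples
    so that classes are not grouped in a fixed order."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    lines = [instruction, ""]
    for review, label in examples:
        lines += [f'Review: "{review}"', f"Sentiment: {label}", ""]
    lines += [f'Review: "{new_input}"', "Sentiment:"]
    return "\n".join(lines)

examples = [
    ("The acting was superb and the story was engaging.", "POSITIVE"),
    ("It was okay, nothing special.", "NEUTRAL"),
    ("I found the plot confusing and the characters unlikable.", "NEGATIVE"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of the following movie reviews as POSITIVE, NEUTRAL, or NEGATIVE:",
    examples,
    "The visuals were stunning, but the dialogue was weak.",
    seed=0,
)
print(prompt)
```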

Understanding when to apply zero-shot, one-shot, and few-shot prompting techniques, and thoughtfully crafting and organizing examples, are essential for enhancing the effectiveness of agentic systems. These basic methods serve as the groundwork for various prompting strategies.

理解何时应用零样本、单样本和少样本提示技术,并精心制作和组织示例,对于增强智能体系统的有效性至关重要。这些基本方法为各种提示策略奠定了基础。

Structuring Prompts

结构化提示

Beyond the basic techniques of providing examples, the way you structure your prompt plays a critical role in guiding the language model. Structuring involves using different sections or elements within the prompt to provide distinct types of information, such as instructions, context, or examples, in a clear and organized manner. This helps the model parse the prompt correctly and understand the specific role of each piece of text.

除了提供示例的基本技术外,提示的结构方式在指导语言模型方面起着关键作用。结构化涉及在提示中使用不同的部分或元素,以清晰有序的方式提供不同类型的信息,如指令、上下文或示例。这有助于模型正确解析提示并理解每段文本的具体作用。

System Prompting

系统提示

System prompting sets the overall context and purpose for a language model, defining its intended behavior for an interaction or session. This involves providing instructions or background information that establish rules, a persona, or overall behavior. Unlike specific user queries, a system prompt provides foundational guidelines for the model's responses. It influences the model's tone, style, and general approach throughout the interaction. For example, a system prompt can instruct the model to consistently respond concisely and helpfully or ensure responses are appropriate for a general audience. System prompts are also utilized for safety and toxicity control by including guidelines such as maintaining respectful language.

系统提示为语言模型设置整体上下文和目的,定义其在交互或会话中的预期行为。这涉及提供建立规则、角色或整体行为的指令或背景信息。与特定的用户查询不同,系统提示为模型的响应提供基础指导。它影响模型在整个交互过程中的语气、风格和一般方法。例如,系统提示可以指示模型始终以简洁和有益的方式响应,或确保响应适合一般受众。系统提示还通过包含保持尊重语言等指导方针来用于安全和毒性控制。

Furthermore, to maximize their effectiveness, system prompts can undergo automatic prompt optimization through LLM-based iterative refinement. Services like the Vertex AI Prompt Optimizer facilitate this by systematically improving prompts based on user-defined metrics and target data, ensuring the highest possible performance for a given task.

此外,为了最大化其有效性,系统提示可以通过基于LLM的迭代优化进行自动提示优化。像Vertex AI Prompt Optimizer这样的服务通过基于用户定义的指标和目标数据系统地改进提示来促进这一点,确保给定任务的最高可能性能。

  • Example:
    You are a helpful and harmless AI assistant. Respond to all queries in a polite and informative manner. Do not generate content that is harmful, biased, or inappropriate.
  • 示例: 你是一个有帮助且无害的AI助手。以礼貌和信息丰富的方式回应所有查询。不要生成有害、有偏见或不适当的内容。
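In chat-style APIs, a system prompt is typically supplied as a separate message ahead of the user turns. The `role`/`content` dictionary shape below is a common convention, though exact field names and accepted roles vary by provider, and the client call is only a hypothetical illustration.

```python
# System prompt supplied as the first message of a chat-style request.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful and harmless AI assistant. Respond to all "
            "queries in a polite and informative manner. Do not generate "
            "content that is harmful, biased, or inappropriate."
        ),
    },
    {"role": "user", "content": "Explain what a system prompt does, briefly."},
]

# A hypothetical client call might look like:
# response = client.chat(model="some-model", messages=messages)
for m in messages:
    print(m["role"], "->", m["content"][:40])
```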

Role Prompting

角色提示

Role prompting assigns a specific character, persona, or identity to the language model, often in conjunction with system or contextual prompting. This involves instructing the model to adopt the knowledge, tone, and communication style associated with that role. For example, prompts such as "Act as a travel guide" or "You are an expert data analyst" guide the model to reflect the perspective and expertise of that assigned role. Defining a role provides a framework for the tone, style, and focused expertise, aiming to enhance the quality and relevance of the output. The desired style within the role can also be specified, for instance, "a humorous and inspirational style."

角色提示为语言模型分配特定的角色、人物或身份,通常与系统或上下文提示结合使用。这涉及指示模型采用与该角色相关的知识、语气和沟通风格。例如,诸如"扮演旅行指南"或"你是一名专家数据分析师"的提示指导模型反映所分配角色的视角和专业知识。定义角色为语气、风格和专注专业知识提供了一个框架,旨在提高输出的质量和相关性。角色内的期望风格也可以指定,例如"幽默和鼓舞人心的风格"。

  • Example:
    Act as a seasoned travel blogger. Write a short, engaging paragraph about the best hidden gem in Rome.
  • 示例: 扮演一位经验丰富的旅行博主。写一段简短、引人入胜的段落,介绍罗马最好的隐藏宝藏。

Using Delimiters

使用分隔符

Effective prompting involves clear distinction of instructions, context, examples, and input for language models. Delimiters, such as triple backticks (```), XML-style tags (such as <instructions> and </instructions>), or markers (---), can be utilized to visually and programmatically separate these sections. This practice, widely used in prompt engineering, minimizes misinterpretation by the model, ensuring clarity regarding the role of each part of the prompt.

有效的提示涉及清晰区分语言模型的指令、上下文、示例和输入。分隔符,如三重反引号(```)、XML标签(如<instructions>与</instructions>)或标记(---),可用于在视觉上和编程上分离这些部分。这种在提示工程中广泛使用的实践最大限度地减少了模型的误解,确保每个提示部分的作用清晰。

  • Example:
    Summarize the article delimited by --- below, focusing on the main arguments presented by the author.
    ---
    [Insert the full text of the article here]
    ---
  • 示例: 总结下面由---分隔的文章,重点关注作者提出的主要论点。
    ---
    [在此插入文章全文]
    ---
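Delimiting can also be done programmatically, so that variable or untrusted text is always fenced off from the instructions. A small sketch:

```python
def delimited_prompt(instruction: str, document: str, delimiter: str = "---") -> str:
    """Wrap variable text in delimiters so the model can tell the
    instruction apart from the content it should operate on."""
    return (
        f"{instruction}\n"
        f"The article is delimited by {delimiter} below.\n"
        f"{delimiter}\n{document}\n{delimiter}"
    )

prompt = delimited_prompt(
    "Summarize the following article, focusing on the author's main arguments.",
    "[Insert the full text of the article here]",
)
print(prompt)
```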

Contextual Engineering

上下文工程

Context engineering, unlike static system prompts, dynamically provides background information crucial for tasks and conversations. This ever-changing information helps models grasp nuances, recall past interactions, and integrate relevant details, leading to grounded responses and smoother exchanges. Examples include previous dialogue, relevant documents (as in Retrieval Augmented Generation), or specific operational parameters. For instance, when discussing a trip to Japan, one might ask for three family-friendly activities in Tokyo, leveraging the existing conversational context. In agentic systems, context engineering is fundamental to core agent behaviors like memory persistence, decision-making, and coordination across sub-tasks. Agents with dynamic contextual pipelines can sustain goals over time, adapt strategies, and collaborate seamlessly with other agents or tools—qualities essential for long-term autonomy. This methodology posits that the quality of a model's output depends more on the richness of the provided context than on the model's architecture. It signifies a significant evolution from traditional prompt engineering, which primarily focused on optimizing the phrasing of immediate user queries. Context engineering expands its scope to include multiple layers of information.

上下文工程与静态系统提示不同,它动态地提供对任务和对话至关重要的背景信息。这种不断变化的信息帮助模型掌握细微差别,回忆过去的互动,并整合相关细节,从而产生有根据的响应和更流畅的交流。示例包括先前的对话、相关文档(如检索增强生成中的文档)或特定的操作参数。例如,在讨论日本之旅时,可以利用现有的对话上下文询问东京的三个适合家庭的活动。在智能体系统中,上下文工程是核心智能体行为(如记忆持久性、决策制定和跨子任务协调)的基础。具有动态上下文管道的智能体可以随着时间的推移维持目标,调整策略,并与其他智能体或工具无缝协作——这些品质对于长期自主性至关重要。这种方法论认为,模型输出的质量更多地取决于提供的上下文的丰富程度,而不是模型的架构。它标志着从传统提示工程的重大演变,传统提示工程主要侧重于优化即时用户查询的措辞。上下文工程将其范围扩展到包括多个信息层。

These layers include:

这些层包括:

  • System prompts: Foundational instructions that define the AI's operational parameters (e.g., "You are a technical writer; your tone must be formal and precise").

  • 系统提示:定义AI操作参数的基础指令(例如,"你是一名技术作家;你的语气必须正式且精确")。

  • External data:

    • Retrieved documents: Information actively fetched from a knowledge base to inform responses (e.g., pulling technical specifications).
    • Tool outputs: Results from the AI using an external API for real-time data (e.g., querying a calendar for availability).
  • 外部数据

    • 检索文档:从知识库中主动获取以告知响应的信息(例如,提取技术规范)。
    • 工具输出:AI使用外部API获取实时数据的结果(例如,查询日历可用性)。
  • Implicit data: Critical information such as user identity, interaction history, and environmental state. Incorporating implicit context presents challenges related to privacy and ethical data management. Therefore, robust governance is essential for context engineering, especially in sectors like enterprise, healthcare, and finance.

  • 隐式数据:关键信息,如用户身份、交互历史和环境状态。纳入隐式上下文带来了与隐私和道德数据管理相关的挑战。因此,强大的治理对于上下文工程至关重要,特别是在企业、医疗保健和金融等行业。

The core principle is that even advanced models underperform with a limited or poorly constructed view of their operational environment. This process reframes the task from merely answering a question to building a comprehensive operational picture for the agent. For example, a context-engineered agent would integrate a user's calendar availability (tool output), the professional relationship with an email recipient (implicit data), and notes from previous meetings (retrieved documents) before responding to a query. This enables the model to generate highly relevant, personalized, and pragmatically useful outputs. The "engineering" aspect involves creating robust pipelines to fetch and transform this data at runtime and establishing feedback loops to continually improve context quality.

核心原则是,即使先进的模型在其操作环境的视图有限或构建不良时也会表现不佳。这个过程将任务从仅仅回答问题重新定义为为智能体构建全面的操作图。例如,一个经过上下文工程的智能体在响应查询之前会整合用户的日历可用性(工具输出)、与电子邮件收件人的专业关系(隐式数据)以及先前会议的笔记(检索文档)。这使得模型能够生成高度相关、个性化和实用性的输出。"工程"方面涉及创建强大的管道以在运行时获取和转换这些数据,并建立反馈循环以持续改进上下文质量。

To implement this, specialized tuning systems, such as Google's Vertex AI prompt optimizer, can automate the improvement process at scale. By systematically evaluating responses against sample inputs and predefined metrics, these tools can enhance model performance and adapt prompts and system instructions across different models without extensive manual rewriting. Providing an optimizer with sample prompts, system instructions, and a template allows it to programmatically refine contextual inputs, offering a structured method for implementing the necessary feedback loops for sophisticated Context Engineering.
This structured approach differentiates a rudimentary AI tool from a more sophisticated, contextually-aware system. It treats context as a primary component, emphasizing what the agent knows, when it knows it, and how it uses that information. This practice ensures the model has a well-rounded understanding of the user's intent, history, and current environment. Ultimately, Context Engineering is a crucial methodology for transforming stateless chatbots into highly capable, situationally-aware systems.

为实现这一点,专门的调优系统,如Google的Vertex AI提示优化器,可以大规模自动化改进过程。通过根据样本输入和预定义指标系统地评估响应,这些工具可以增强模型性能,并在不同模型之间调整提示和系统指令,而无需大量手动重写。为优化器提供样本提示、系统指令和模板,使其能够以编程方式优化上下文输入,为实施复杂上下文工程所需的反馈循环提供结构化方法。这种结构化方法将基本的AI工具与更复杂、上下文感知的系统区分开来。它将上下文视为主要组成部分,强调智能体知道什么、何时知道以及如何使用这些信息。这种实践确保模型对用户的意图、历史和当前环境有全面的理解。最终,上下文工程是将无状态聊天机器人转变为高度能力、情境感知系统的关键方法。

Structured Output

结构化输出

Often, the goal of prompting is not just to get a free-form text response, but to extract or generate information in a specific, machine-readable format. Requesting structured output, such as JSON, XML, CSV, or Markdown tables, is a crucial structuring technique. By explicitly asking for the output in a particular format and potentially providing a schema or example of the desired structure, you guide the model to organize its response in a way that can be easily parsed and used by other parts of your agentic system or application. Returning JSON objects for data extraction is beneficial as it forces the model to create a structure and can limit hallucinations. Experimenting with output formats is recommended, especially for non-creative tasks like extracting or categorizing data.

通常,提示的目标不仅仅是获得自由格式的文本响应,而是以特定的机器可读格式提取或生成信息。请求结构化输出,如JSON、XML、CSV或Markdown表格,是一种关键的结构化技术。通过明确要求以特定格式输出,并可能提供所需结构的模式或示例,您可以指导模型以易于解析并被智能体系统或应用程序的其他部分使用的方式组织其响应。返回JSON对象进行数据提取是有益的,因为它强制模型创建结构并可以限制幻觉。建议尝试不同的输出格式,特别是对于非创造性任务,如提取或分类数据。

  • Example:
    Extract the following information from the text below and return it as a JSON object with keys "name", "address", and "phone_number".

    Text: "Contact John Smith at 123 Main St, Anytown, CA or call (555) 123-4567."

  • 示例: 从以下文本中提取信息,并将其作为具有键"name"、"address"和"phone_number"的JSON对象返回。

    文本:"联系John Smith,地址:123 Main St, Anytown, CA,或致电(555) 123-4567。"
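Once the model returns JSON, the reply should be parsed and checked before anything downstream consumes it. A stdlib-only sketch (the reply string here is a made-up illustration of what the model might return for the prompt above):

```python
import json

REQUIRED_KEYS = {"name", "address", "phone_number"}

def parse_contact(llm_reply: str) -> dict:
    """Parse the model's JSON reply and verify the expected keys exist."""
    data = json.loads(llm_reply)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"model reply is missing keys: {sorted(missing)}")
    return data

# Hypothetical model reply to the extraction prompt above.
reply = (
    '{"name": "John Smith", "address": "123 Main St, Anytown, CA", '
    '"phone_number": "(555) 123-4567"}'
)
contact = parse_contact(reply)
print(contact["name"])  # -> John Smith
```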

Effectively utilizing system prompts, role assignments, contextual information, delimiters, and structured output significantly enhances the clarity, control, and utility of interactions with language models, providing a strong foundation for developing reliable agentic systems. Requesting structured output is crucial for creating pipelines where the language model's output serves as the input for subsequent system or processing steps.

有效利用系统提示、角色分配、上下文信息、分隔符和结构化输出显著增强了与语言模型交互的清晰度、控制力和实用性,为开发可靠的智能体系统提供了坚实基础。请求结构化输出对于创建语言模型输出作为后续系统或处理步骤输入的管道至关重要。

Leveraging Pydantic for an Object-Oriented Facade: A powerful technique for enforcing structured output and enhancing interoperability is to use the LLM's generated data to populate instances of Pydantic objects. Pydantic is a Python library for data validation and settings management using Python type annotations. By defining a Pydantic model, you create a clear and enforceable schema for your desired data structure. This approach effectively provides an object-oriented facade to the prompt's output, transforming raw text or semi-structured data into validated, type-hinted Python objects.

利用Pydantic实现面向对象的外观: 一种强制执行结构化输出和增强互操作性的强大技术是使用LLM生成的数据来填充Pydantic对象的实例。Pydantic是一个使用Python类型注解进行数据验证和设置管理的Python库。通过定义Pydantic模型,您可以为所需的数据结构创建清晰且可强制执行的模式。这种方法有效地为提示输出提供了面向对象的外观,将原始文本或半结构化数据转换为经过验证的、类型提示的Python对象。

You can directly parse a JSON string from an LLM into a Pydantic object using the model_validate_json method. This is particularly useful as it combines parsing and validation in a single step.

您可以使用model_validate_json方法直接将LLM的JSON字符串解析为Pydantic对象。这特别有用,因为它将解析和验证结合在一个步骤中。

from pydantic import BaseModel, EmailStr, Field, ValidationError  # EmailStr needs the optional 'email-validator' package
from typing import List, Optional
from datetime import date

# --- Pydantic Model Definition ---
class User(BaseModel):
    name: str = Field(..., description="The full name of the user.")
    email: EmailStr = Field(..., description="The user's email address.")
    date_of_birth: Optional[date] = Field(None, description="The user's date of birth.")
    interests: List[str] = Field(default_factory=list, description="A list of the user's interests.")

# --- Hypothetical LLM Output ---
llm_output_json = """
{
    "name": "Alice Wonderland",
    "email": "alice.w@example.com",
    "date_of_birth": "1995-07-21",
    "interests": [
        "Natural Language Processing",
        "Python Programming",
        "Gardening"
    ]
}
"""

# --- Parsing and Validation ---
try:
    # Use the model_validate_json class method to parse the JSON string.
    # This single step parses the JSON and validates the data against the User model.
    user_object = User.model_validate_json(llm_output_json)
    
    # Now you can work with a clean, type-safe Python object.
    print("Successfully created User object!")
    print(f"Name: {user_object.name}")
    print(f"Email: {user_object.email}")
    print(f"Date of Birth: {user_object.date_of_birth}")
    print(f"First Interest: {user_object.interests[0]}")
    
    # You can access the data like any other Python object attribute.
    # Pydantic has already converted the 'date_of_birth' string to a datetime.date object.
    print(f"Type of date_of_birth: {type(user_object.date_of_birth)}")
    
except ValidationError as e:
    # If the JSON is malformed or the data doesn't match the model's types,
    # Pydantic will raise a ValidationError.
    print("Failed to validate JSON from LLM.")
    print(e)

This Python code demonstrates how to use the Pydantic library to define a data model and validate JSON data. It defines a User model with fields for name, email, date of birth, and interests, including type hints and descriptions. The code then parses a hypothetical JSON output from a Large Language Model (LLM) using the model_validate_json method. This method handles both JSON parsing and data validation according to the model's structure and types. Finally, the code accesses the validated data from the resulting Python object and includes error handling for ValidationError in case the JSON is invalid.

此Python代码演示了如何使用Pydantic库定义数据模型并验证JSON数据。它定义了一个User模型,包含姓名、电子邮件、出生日期和兴趣等字段,包括类型提示和描述。然后,代码使用model_validate_json方法解析来自大型语言模型(LLM)的假设JSON输出。此方法根据模型的结构和类型处理JSON解析和数据验证。最后,代码从生成的Python对象访问验证后的数据,并在JSON无效时包含ValidationError的错误处理。

For XML data, the xmltodict library can be used to convert the XML into a dictionary, which can then be passed to a Pydantic model for parsing. By using Field aliases in your Pydantic model, you can seamlessly map the often verbose or attribute-heavy structure of XML to your object's fields.

对于XML数据,可以使用xmltodict库将XML转换为字典,然后将其传递给Pydantic模型进行解析。通过在Pydantic模型中使用Field别名,可以无缝地将通常冗长或属性密集的XML结构映射到对象的字段。
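The alias mapping can be sketched as follows. To keep the sketch self-contained, the nested dictionary that `xmltodict.parse` would produce (attributes prefixed with `@`) is hard-coded rather than generated from real XML, and the `User` fields are illustrative.

```python
from pydantic import BaseModel, Field

# What xmltodict.parse('<user full-name="Ada Lovelace"><email>...</email></user>')
# would roughly yield: attributes carry an '@' prefix.
xml_as_dict = {
    "user": {
        "@full-name": "Ada Lovelace",
        "email": "ada@example.com",
    }
}

class User(BaseModel):
    # Field aliases map XML-ish keys (attributes, hyphenated names)
    # onto ordinary Python attribute names.
    name: str = Field(..., alias="@full-name")
    email: str

user = User.model_validate(xml_as_dict["user"])
print(user.name)   # -> Ada Lovelace
print(user.email)  # -> ada@example.com
```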

This methodology is invaluable for ensuring the interoperability of LLM-based components with other parts of a larger system. When an LLM's output is encapsulated within a Pydantic object, it can be reliably passed to other functions, APIs, or data processing pipelines with the assurance that the data conforms to the expected structure and types. This practice of "parse, don't validate" at the boundaries of your system components leads to more robust and maintainable applications.

这种方法对于确保基于LLM的组件与更大系统的其他部分的互操作性非常宝贵。当LLM的输出封装在Pydantic对象中时,可以可靠地传递给其他函数、API或数据处理管道,并确保数据符合预期的结构和类型。这种在系统组件边界处"解析而非验证"的做法导致更健壮和可维护的应用程序。



Reasoning and Thought Process Techniques

Large language models excel at pattern recognition and text generation but often face challenges with tasks requiring complex, multi-step reasoning. This appendix focuses on techniques designed to enhance these reasoning capabilities by encouraging models to reveal their internal thought processes. Specifically, it addresses methods to improve logical deduction, mathematical computation, and planning.

大型语言模型擅长模式识别和文本生成,但在需要复杂多步推理的任务上常常面临挑战。本附录重点介绍旨在通过鼓励模型揭示其内部思维过程来增强这些推理能力的技术。具体来说,它涉及改进逻辑推理、数学计算和规划的方法。

Chain of Thought (CoT)

The Chain of Thought (CoT) prompting technique is a powerful method for improving the reasoning abilities of language models by explicitly prompting the model to generate intermediate reasoning steps before arriving at a final answer. Instead of just asking for the result, you instruct the model to "think step by step." This process mirrors how a human might break down a problem into smaller, more manageable parts and work through them sequentially.

思维链(CoT)提示技术是一种强大的方法,通过明确提示模型在得出最终答案之前生成中间推理步骤来提高语言模型的推理能力。不仅仅是询问结果,而是指示模型"逐步思考"。这个过程反映了人类如何将问题分解为更小、更易管理的部分并按顺序处理它们。

CoT helps the LLM generate more accurate answers, particularly for tasks that require some form of calculation or logical deduction, where models might otherwise struggle and produce incorrect results. By generating these intermediate steps, the model is more likely to stay on track and perform the necessary operations correctly.

CoT帮助LLM生成更准确的答案,特别是对于需要某种形式计算或逻辑推理的任务,在这些任务中模型可能难以处理并产生错误结果。通过生成这些中间步骤,模型更有可能保持在正确的轨道上并正确执行必要的操作。

There are two main variations of CoT: CoT有两种主要变体:

  • Zero-Shot CoT: This involves simply adding the phrase "Let's think step by step" (or similar phrasing) to your prompt without providing any examples of the reasoning process. Surprisingly, for many tasks, this simple addition can significantly improve the model's performance by triggering its ability to expose its internal reasoning trace.

    • Example (Zero-Shot CoT):
      If a train travels at 60 miles per hour and covers a distance of 240 miles, how long did the journey take? Let's think step by step.
  • 零样本CoT:这涉及简单地在提示中添加短语"让我们逐步思考"(或类似措辞),而不提供任何推理过程的示例。令人惊讶的是,对于许多任务,这种简单的添加可以通过触发模型暴露其内部推理轨迹的能力来显著提高模型的性能。

    • 示例(零样本CoT) : 如果火车以每小时60英里的速度行驶,覆盖240英里的距离,旅程需要多长时间?让我们逐步思考。
  • Few-Shot CoT: This combines CoT with few-shot prompting. You provide the model with several examples where both the input, the step-by-step reasoning process, and the final output are shown. This gives the model a clearer template for how to perform the reasoning and structure its response, often leading to even better results on more complex tasks compared to zero-shot CoT.

    • Example (Few-Shot CoT):
      Q: The sum of three consecutive integers is 36. What are the integers?
      A: Let the first integer be x. The next consecutive integer is x+1, and the third is x+2. The sum is x + (x+1) + (x+2) = 3x + 3. We know the sum is 36, so 3x + 3 = 36. Subtract 3 from both sides: 3x = 33. Divide by 3: x = 11. The integers are therefore 11, 12, and 13.

      Q: Sarah has 5 apples, and she buys 8 more. She eats 3 apples. How many apples does she have left? Let's think step by step.
      A: Let's think step by step. Sarah starts with 5 apples. She buys 8 more, so she adds 8 to her initial amount: 5 + 8 = 13 apples. Then, she eats 3 apples, so we subtract 3 from the total: 13 - 3 = 10. Sarah has 10 apples left. The answer is 10.
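Both variants can be produced by one small helper that appends the zero-shot trigger phrase after any worked examples. A sketch (the function name is ours, not a standard API):

```python
def cot_prompt(question: str, examples=()) -> str:
    """Build a zero- or few-shot chain-of-thought prompt.

    With no examples this degenerates to zero-shot CoT: just the question
    plus the "Let's think step by step" trigger.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]  # worked examples, if any
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)
```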

CoT offers several advantages. It is relatively low-effort to implement and can be highly effective with off-the-shelf LLMs without requiring fine-tuning. A significant benefit is the increased interpretability of the model's output; you can see the reasoning steps it followed, which helps in understanding why it arrived at a particular answer and in debugging if something went wrong. Additionally, CoT appears to improve the robustness of prompts across different versions of language models, meaning the performance is less likely to degrade when a model is updated. The main disadvantage is that generating the reasoning steps increases the length of the output, leading to higher token usage, which can increase costs and response time.

CoT提供了几个优势。它实施起来相对容易,并且可以在现成的LLM上非常有效,无需微调。一个显著的好处是模型输出的可解释性增加;您可以看到它遵循的推理步骤,这有助于理解为什么它得出特定答案,并在出现问题时进行调试。此外,CoT似乎提高了提示在不同版本语言模型之间的稳健性,意味着当模型更新时性能不太可能下降。主要缺点是生成推理步骤增加了输出长度,导致更高的标记使用量,这可能增加成本和响应时间。

Best practices for CoT include ensuring the final answer is presented after the reasoning steps, as the generation of the reasoning influences the subsequent token predictions for the answer. Also, for tasks with a single correct answer (like mathematical problems), setting the model's temperature to 0 (greedy decoding) is recommended when using CoT to ensure deterministic selection of the most probable next token at each step.

CoT的最佳实践包括确保最终答案在推理步骤之后呈现,因为推理的生成影响后续答案的标记预测。此外,对于具有单一正确答案的任务(如数学问题),在使用CoT时建议将模型的温度设置为0(贪婪解码),以确保在每个步骤中确定性选择最可能的下一个标记。

Self-Consistency

Building on the idea of Chain of Thought, the Self-Consistency technique aims to improve the reliability of reasoning by leveraging the probabilistic nature of language models. Instead of relying on a single greedy reasoning path (as in basic CoT), Self-Consistency generates multiple diverse reasoning paths for the same problem and then selects the most consistent answer among them.

基于思维链的思想,自一致性技术旨在通过利用语言模型的概率性质来提高推理的可靠性。与依赖单一贪婪推理路径(如基本CoT)不同,自一致性为同一问题生成多个不同的推理路径,然后选择其中最一致的答案。

Self-Consistency involves three main steps: 自一致性涉及三个主要步骤:

  1. Generating Diverse Reasoning Paths: The same prompt (often a CoT prompt) is sent to the LLM multiple times. By using a higher temperature setting, the model is encouraged to explore different reasoning approaches and generate varied step-by-step explanations.
  2. Extract the Answer: The final answer is extracted from each of the generated reasoning paths.
  3. Choose the Most Common Answer: A majority vote is performed on the extracted answers. The answer that appears most frequently across the diverse reasoning paths is selected as the final, most consistent answer.

This approach improves the accuracy and coherence of responses, particularly for tasks where multiple valid reasoning paths might exist or where the model might be prone to errors in a single attempt. The benefit is a pseudo-probability likelihood of the answer being correct, increasing overall accuracy. However, the significant cost is the need to run the model multiple times for the same query, leading to much higher computation and expense.

这种方法提高了响应的准确性和连贯性,特别是对于可能存在多个有效推理路径或模型在单次尝试中容易出错的任务。好处是答案正确的伪概率可能性,提高了整体准确性。然而,显著的成本是需要为同一查询多次运行模型,导致更高的计算和费用。

  • Example (Conceptual):

    • Prompt: "Is the statement 'All birds can fly' true or false? Explain your reasoning."
    • Model Run 1 (High Temp): Reasons about most birds flying, concludes True.
    • Model Run 2 (High Temp): Reasons about penguins and ostriches, concludes False.
    • Model Run 3 (High Temp): Reasons about birds in general, mentions exceptions briefly, concludes True.
    • Self-Consistency Result: Based on majority vote (True appears twice), the final answer is "True". (Note: A more sophisticated approach would weigh the reasoning quality).
  • 示例(概念性)

    • 提示:"所有鸟都能飞"这个陈述是真还是假?解释你的推理。
    • 模型运行1(高温) :推理大多数鸟会飞,结论为真。
    • 模型运行2(高温) :推理企鹅和鸵鸟,结论为假。
    • 模型运行3(高温) :推理鸟类的一般情况,简要提及例外,结论为真。
    • 自一致性结果:基于多数投票(真出现两次),最终答案为"真"。(注意:更复杂的方法会权衡推理质量)。
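The three steps can be sketched in a few lines; `query_llm` is a hypothetical sampling call (in practice a chat-API request with temperature above 0) that returns a (reasoning, answer) pair:

```python
from collections import Counter

def self_consistency(query_llm, prompt, n_samples=5):
    """Sample several diverse reasoning paths and majority-vote on the answer."""
    # Steps 1 and 2: generate paths and extract each path's final answer.
    answers = [query_llm(prompt)[1] for _ in range(n_samples)]
    # Step 3: pick the most common answer across all paths.
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples  # answer plus a rough agreement score
```

The agreement score is the pseudo-probability mentioned above: the fraction of sampled paths that converged on the winning answer.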

Step-Back Prompting

Step-back prompting enhances reasoning by first asking the language model to consider a general principle or concept related to the task before addressing specific details. The response to this broader question is then used as context for solving the original problem.

退一步提示通过首先要求语言模型考虑与任务相关的一般原则或概念来增强推理,然后再处理具体细节。对这个更广泛的问题响应然后用作解决原始问题的上下文。

This process allows the language model to activate relevant background knowledge and wider reasoning strategies. By focusing on underlying principles or higher-level abstractions, the model can generate more accurate and insightful answers, less influenced by superficial elements. Initially considering general factors can provide a stronger basis for generating specific creative outputs. Step-back prompting encourages critical thinking and the application of knowledge, potentially mitigating biases by emphasizing general principles.

这个过程允许语言模型激活相关的背景知识和更广泛的推理策略。通过关注基本原则或更高层次的抽象,模型可以生成更准确和富有洞察力的答案,较少受表面元素的影响。最初考虑一般因素可以为生成特定的创造性输出提供更强的基础。退一步提示鼓励批判性思维和知识应用,可能通过强调一般原则来减轻偏见。

  • Example:

    • Prompt 1 (Step-Back): "What are the key factors that make a good detective story?"
    • Model Response 1: (Lists elements like red herrings, compelling motive, flawed protagonist, logical clues, satisfying resolution).
    • Prompt 2 (Original Task + Step-Back Context): "Using the key factors of a good detective story [insert Model Response 1 here], write a short plot summary for a new mystery novel set in a small town."
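The two-stage flow can be wired up generically; `ask` stands in for whatever LLM call the agent uses:

```python
def step_back(ask, step_back_question, task):
    """Two-stage step-back prompting: elicit general principles first,
    then reuse them as context for the concrete task."""
    principles = ask(step_back_question)  # prompt 1: the broad question
    # Prompt 2: the original task, grounded in the principles just elicited.
    return ask(f"Using these principles:\n{principles}\n\nNow: {task}")
```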

Tree of Thoughts (ToT)

Tree of Thoughts (ToT) is an advanced reasoning technique that extends the Chain of Thought method. It enables a language model to explore multiple reasoning paths concurrently, instead of following a single linear progression. This technique utilizes a tree structure, where each node represents a "thought"—a coherent language sequence acting as an intermediate step. From each node, the model can branch out, exploring alternative reasoning routes.

思维树(ToT)是一种高级推理技术,扩展了思维链方法。它使语言模型能够同时探索多个推理路径,而不是遵循单一的线性进展。这种技术利用树结构,其中每个节点代表一个"思维"——一个连贯的语言序列,作为中间步骤。从每个节点,模型可以分支出去,探索替代的推理路线。

ToT is particularly suited for complex problems that require exploration, backtracking, or the evaluation of multiple possibilities before arriving at a solution. While more computationally demanding and intricate to implement than the linear Chain of Thought method, ToT can achieve superior results on tasks necessitating deliberate and exploratory problem-solving. It allows an agent to consider diverse perspectives and potentially recover from initial errors by investigating alternative branches within the "thought tree."

ToT特别适合需要探索、回溯或在得出解决方案之前评估多种可能性的复杂问题。虽然比线性思维链方法计算要求更高且实现更复杂,但ToT可以在需要深思熟虑和探索性解决问题的任务上取得优异结果。它允许智能体考虑不同的视角,并通过调查"思维树"内的替代分支可能从初始错误中恢复。

  • Example (Conceptual): For a complex creative writing task like "Develop three different possible endings for a story based on these plot points," ToT would allow the model to explore distinct narrative branches from a key turning point, rather than just generating one linear continuation.
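A toy version of the search can be written as a beam search over partial thoughts; `expand` and `score` are hypothetical hooks an agent would back with LLM calls (propose next thoughts, evaluate a path):

```python
def tree_of_thoughts(expand, score, root, beam=2, depth=2):
    """Explore a tree of 'thoughts' level by level, keeping only the best
    `beam` candidates at each level, and return the top-scoring leaf."""
    frontier = [root]
    for _ in range(depth):
        # Branch: each thought in the frontier proposes several continuations.
        children = [child for thought in frontier for child in expand(thought)]
        # Prune: keep only the most promising branches (the beam).
        frontier = sorted(children, key=score, reverse=True)[:beam]
    return max(frontier, key=score)
```

Real ToT implementations vary the search strategy (BFS, DFS with backtracking) and use the model itself as the scorer, but the branch-and-prune structure is the core idea.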

These reasoning and thought process techniques are crucial for building agents capable of handling tasks that go beyond simple information retrieval or text generation. By prompting models to expose their reasoning, consider multiple perspectives, or step back to general principles, we can significantly enhance their ability to perform complex cognitive tasks within agentic systems.

这些推理和思维过程技术对于构建能够处理超越简单信息检索或文本生成任务的智能体很重要。通过提示模型暴露其推理、考虑多个视角或退回到一般原则,我们可以显著增强它们在智能体系统中执行复杂认知任务的能力。

Action and Interaction Techniques

Intelligent agents possess the capability to actively engage with their environment, beyond generating text. This includes utilizing tools, executing external functions, and participating in iterative cycles of observation, reasoning, and action. This section examines prompting techniques designed to enable these active behaviors.

智能智能体具有主动与环境互动的能力,超越生成文本。这包括利用工具、执行外部函数以及参与观察、推理和行动的迭代循环。本节探讨旨在实现这些主动行为的提示技术。

Tool Use / Function Calling

A crucial ability for an agent is using external tools or calling functions to perform actions beyond its internal capabilities. These actions may include web searches, database access, sending emails, performing calculations, or interacting with external APIs. Effective prompting for tool use involves designing prompts that instruct the model on the appropriate timing and methodology for tool utilization.

智能体的关键能力是使用外部工具或调用函数来执行超出其内部能力的行动。这些行动可能包括网络搜索、数据库访问、发送电子邮件、执行计算或与外部API交互。有效的工具使用提示涉及设计提示,指导模型在适当的时机和方法上使用工具。

Modern language models often undergo fine-tuning for "function calling" or "tool use." This enables them to interpret descriptions of available tools, including their purpose and parameters. Upon receiving a user request, the model can determine the necessity of tool use, identify the appropriate tool, and format the required arguments for its invocation. The model does not execute the tool directly. Instead, it generates a structured output, typically in JSON format, specifying the tool and its parameters. An agentic system then processes this output, executes the tool, and provides the tool's result back to the model, integrating it into the ongoing interaction.

现代语言模型通常经过"函数调用"或"工具使用"的微调。这使它们能够解释可用工具的描述,包括其目的和参数。在收到用户请求后,模型可以确定工具使用的必要性,识别适当的工具,并格式化调用所需的参数。模型不直接执行工具。相反,它生成结构化输出,通常为JSON格式,指定工具及其参数。然后智能体系统处理此输出,执行工具,并将工具结果提供回模型,将其集成到正在进行的交互中。

  • Example:
    You have access to a weather tool that can get the current weather for a specified city. The tool is called 'get_current_weather' and takes a 'city' parameter (string).

    User: What's the weather like in London right now?

    • Expected Model Output (Function Call):
      {
        "tool_name": "get_current_weather",
        "parameters": {
          "city": "London"
        }
      }
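On the system side, the agent parses this structured output and runs the matching function. A minimal dispatcher sketch (the weather tool is stubbed out; a real one would call an API):

```python
import json

def get_current_weather(city: str) -> str:
    """Stub tool: a real implementation would query a weather API."""
    return f"It is sunny in {city}."

# Registry mapping tool names the model may emit to actual callables.
TOOLS = {"get_current_weather": get_current_weather}

def dispatch(model_output: str) -> str:
    """Execute the tool call described by the model's JSON output."""
    call = json.loads(model_output)
    tool = TOOLS[call["tool_name"]]    # look up the requested tool
    return tool(**call["parameters"])  # run it with the model's arguments
```

The string returned by `dispatch` is what the agentic system feeds back to the model as the tool result.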

ReAct (Reason & Act)

ReAct, short for Reason and Act, is a prompting paradigm that combines Chain of Thought-style reasoning with the ability to perform actions using tools in an interleaved manner. ReAct mimics how humans operate – we reason verbally and take actions to gather more information or make progress towards a goal.

ReAct,是Reason and Act的缩写,是一种提示范式,结合了思维链式推理与使用工具执行行动的能力,以交错方式进行。ReAct模仿人类操作方式——我们口头推理并采取行动来收集更多信息或朝着目标取得进展。

The ReAct pattern involves a loop:

  1. Thought: The model generates a thought process, explaining its current understanding and plan.
  2. Action: Based on the thought, the model decides to perform an action, often using a tool (e.g., Search, Calculator, API call). It outputs the name of the tool and the required input.
  3. Observation: The agentic system executes the tool and provides the result (Observation) back to the model.
  4. The loop continues with the model generating a new "Thought" based on the latest "Observation," leading to further "Actions" and "Observations" until the task is complete and the model outputs a "Final Answer."
  ReAct模式涉及一个循环:

  1. 思考:模型生成思考过程,解释其当前理解和计划。
  2. 行动:基于思考,模型决定执行行动,通常使用工具(例如,搜索、计算器、API调用)。它输出工具名称和所需输入。
  3. 观察:智能体系统执行工具并将结果(观察)提供回模型。
  4. 循环继续,模型基于最新的"观察"生成新的"思考",导致进一步的"行动"和"观察",直到任务完成,模型输出"最终答案"。

This interleaved process of thinking and acting allows the agent to dynamically gather information, react to tool outputs, and refine its approach, making it particularly effective for tasks requiring interaction with dynamic environments or external knowledge sources.

这种思考和行动的交错过程允许智能体动态收集信息,对工具输出做出反应,并改进其方法,使其特别适合需要与动态环境或外部知识源交互的任务。

  • Example (Illustrative ReAct Trace):
    User: What is the capital of France and its current population?

    Thought: The user is asking for two pieces of information: the capital of France and its population. I can use a search tool to find this information. I will start by confirming the capital, although I am likely to know it.
    Action: Search({"query": "capital of France"})
    Observation: Paris

    Thought: Okay, the capital is Paris. Now I need to find the population of Paris.
    Action: Search({"query": "current population of Paris"})
    Observation: The population of Paris was estimated to be around 2.1 million people in 2023.

    Thought: I have found both pieces of information. I can now provide the final answer to the user.
    Final Answer: The capital of France is Paris, and its estimated population in 2023 was around 2.1 million people.
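A trace like this can be driven by a small loop that alternates model turns with tool executions; `model`, the tool signatures, and the `Action: Tool({...})` text format here are illustrative stand-ins, not a standard protocol:

```python
import json
import re

def react_loop(model, tools, question, max_steps=5):
    """Run a Thought -> Action -> Observation loop until the model
    emits a Final Answer (or the step budget runs out)."""
    transcript = f"User: {question}\n"
    for _ in range(max_steps):
        turn = model(transcript)  # model produces its Thought/Action text
        transcript += turn + "\n"
        final = re.search(r"Final Answer:\s*(.+)", turn)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\((.*)\)", turn)
        if action:
            name, args = action.group(1), json.loads(action.group(2))
            observation = tools[name](**args)  # the system, not the model, runs the tool
            transcript += f"Observation: {observation}\n"
    return None  # budget exhausted without an answer
```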

These techniques are vital for building agents that can actively engage with the world, retrieve real-time information, and perform tasks that require interacting with external systems.

这些技术对于构建能够主动与世界互动、检索实时信息并执行需要与外部系统交互的任务的智能体很重要。

Advanced Techniques

Beyond the foundational, structural, and reasoning patterns, there are several other prompting techniques that can further enhance the capabilities and efficiency of agentic systems. These range from using AI to optimize prompts to incorporating external knowledge and tailoring responses based on user characteristics.

Automatic Prompt Engineering (APE)

Recognizing that crafting effective prompts can be a complex and iterative process, Automatic Prompt Engineering (APE) explores using language models themselves to generate, evaluate, and refine prompts. This method aims to automate the prompt writing process, potentially enhancing model performance without requiring extensive human effort in prompt design.

The general idea is to have a "meta-model" or a process that takes a task description and generates multiple candidate prompts. These prompts are then evaluated based on the quality of the output they produce on a given set of inputs (perhaps using metrics like BLEU or ROUGE, or human evaluation). The best-performing prompts can be selected, potentially refined further, and used for the target task. Using an LLM to generate variations of a user query for training a chatbot is an example of this.

  • Example (Conceptual): A developer provides a description: "I need a prompt that can extract the date and sender from an email." An APE system generates several candidate prompts. These are tested on sample emails, and the prompt that consistently extracts the correct information is selected.

Another powerful prompt optimization technique, notably promoted by the DSPy framework, involves treating prompts not as static text but as programmatic modules that can be automatically optimized. This approach moves beyond manual trial-and-error and into a more systematic, data-driven methodology.

另一种强大的提示优化技术,特别是由DSPy框架推广的技术,涉及将提示视为可自动优化的程序化模块,而不是静态文本。这种方法超越了手动试错,进入了更系统化、数据驱动的方法论。

The core of this technique relies on two key components:

这种技术的核心依赖于两个关键组件:

  1. A Goldset (or High-Quality Dataset): This is a representative set of high-quality input-and-output pairs. It serves as the "ground truth" that defines what a successful response looks like for a given task.
    黄金集(或高质量数据集): 这是一组具有代表性的高质量输入-输出对。它作为"基本事实",定义了对于给定任务成功响应应该是什么样子。
  2. An Objective Function (or Scoring Metric): This is a function that automatically evaluates the LLM's output against the corresponding "golden" output from the dataset. It returns a score indicating the quality, accuracy, or correctness of the response.
    目标函数(或评分指标): 这是一个函数,自动将LLM的输出与数据集中相应的"黄金"输出进行比较评估。它返回一个分数,指示响应的质量、准确性或正确性。

Using these components, an optimizer, such as a Bayesian optimizer, systematically refines the prompt. This process typically involves two main strategies, which can be used independently or in concert:

使用这些组件,优化器(如贝叶斯优化器)系统地优化提示。这个过程通常涉及两个主要策略,可以独立使用或协同使用:

  • Few-Shot Example Optimization: Instead of a developer manually selecting examples for a few-shot prompt, the optimizer programmatically samples different combinations of examples from the goldset. It then tests these combinations to identify the specific set of examples that most effectively guides the model toward generating the desired outputs.
    少样本示例优化: 不是由开发人员手动为少样本提示选择示例,而是优化器从黄金集中程序化地采样不同的示例组合。然后测试这些组合,以识别最能有效引导模型生成期望输出的特定示例集。
  • Instructional Prompt Optimization: In this approach, the optimizer automatically refines the prompt's core instructions. It uses an LLM as a "meta-model" to iteratively mutate and rephrase the prompt's text, adjusting the wording, tone, or structure, to discover which phrasing yields the highest scores from the objective function.
    指令提示优化: 在这种方法中,优化器自动优化提示的核心指令。它使用LLM作为"元模型"来迭代地变异和重新表述提示文本——调整措辞、语气或结构——以发现哪种表述从目标函数中获得最高分数。

The ultimate goal for both strategies is to maximize the scores from the objective function, effectively "training" the prompt to produce results that are consistently closer to the high-quality goldset. By combining these two approaches, the system can simultaneously optimize what instructions to give the model and which examples to show it, leading to a highly effective and robust prompt that is machine-optimized for the specific task.

两种策略的最终目标是最大化目标函数的分数,有效地"训练"提示以产生始终更接近高质量黄金集的结果。通过结合这两种方法,系统可以同时优化给模型的指令展示给模型的示例,从而产生一个高度有效且稳健的提示,该提示针对特定任务进行了机器优化。
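The goldset-plus-metric loop can be sketched without any framework; `run_model` and `score` are hypothetical hooks, and real optimizers such as DSPy's are far more sophisticated than this exhaustive comparison:

```python
def best_prompt(candidates, goldset, run_model, score):
    """Pick the candidate prompt with the highest mean score on the goldset.

    `run_model(prompt, x)` produces the model output for input x;
    `score(prediction, gold)` returns a number, higher meaning better.
    """
    def mean_score(prompt):
        return sum(score(run_model(prompt, x), y) for x, y in goldset) / len(goldset)
    return max(candidates, key=mean_score)
```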

Iterative Prompting / Refinement

This technique involves starting with a simple, basic prompt and then iteratively refining it based on the model's initial responses. If the model's output isn't quite right, you analyze the shortcomings and modify the prompt to address them. This is less about an automated process (like APE) and more about a human-driven iterative design loop.

  • Example:

    • Attempt 1: "Write a product description for a new type of coffee maker." (Result is too generic).
    • Attempt 2: "Write a product description for a new type of coffee maker. Highlight its speed and ease of cleaning." (Result is better, but lacks detail).
    • Attempt 3: "Write a product description for the 'SpeedClean Coffee Pro'. Emphasize its ability to brew a pot in under 2 minutes and its self-cleaning cycle. Target busy professionals." (Result is much closer to desired).

迭代提示/优化

这种技术涉及从一个简单的基础提示开始,然后根据模型的初始响应进行迭代优化。如果模型的输出不太正确,您可以分析不足之处并修改提示来解决这些问题。这更像是一个人工驱动的迭代设计循环,而不是自动化过程(如APE)。

  • 示例:

    • 尝试1: "为一种新型咖啡机写产品描述。"(结果太泛化)。
    • 尝试2: "为一种新型咖啡机写产品描述。突出其速度和易清洁性。"(结果更好,但缺乏细节)。
    • 尝试3: "为'SpeedClean Coffee Pro'写产品描述。强调其在2分钟内冲泡一壶咖啡的能力及其自清洁循环。目标受众为忙碌的专业人士。"(结果更接近期望)。

Providing Negative Examples

While the principle of "instructions over constraints" generally holds, providing negative examples can be helpful in certain situations, though they should be used with care. A negative example shows the model an input and an undesired output, or an input and an output that should not be generated. This helps clarify boundaries or prevent specific types of incorrect responses.

  • Example:
    Generate a list of popular tourist attractions in Paris. Do NOT include the Eiffel Tower.

    Example of what NOT to do:
    Input: List popular landmarks in Paris.
    Output: The Eiffel Tower, The Louvre, Notre Dame Cathedral.

提供负面示例

虽然"指令优于约束"的原则通常成立,但在某些情况下,提供负面示例可能是有帮助的,尽管需要谨慎使用。负面示例向模型展示一个输入和一个不期望的输出,或者一个输入和一个不应该生成的输出。这有助于澄清边界或防止特定类型的不正确响应。

  • 示例:
    生成巴黎热门旅游景点列表。不要包括埃菲尔铁塔。

    不应该做的示例:
    输入:列出巴黎的著名地标。
    输出:埃菲尔铁塔、卢浮宫、巴黎圣母院。

Using Analogies

Framing a task using an analogy can sometimes help the model understand the desired output or process by relating it to something familiar. This can be particularly useful for creative tasks or explaining complex roles.

  • Example:
    Act as a "data chef". Take the raw ingredients (data points) and prepare a "summary dish" (report) that highlights the key flavors (trends) for a business audience.

使用类比

使用类比来构建任务有时可以帮助模型通过将其与熟悉的事物联系起来来理解期望的输出或过程。这对于创造性任务或解释复杂角色特别有用。

  • 示例:
    扮演一个"数据厨师"。取原材料(数据点)并准备一个"总结菜肴"(报告),突出关键风味(趋势)给商业受众。

Factored Cognition / Decomposition

For very complex tasks, it can be effective to break down the overall goal into smaller, more manageable sub-tasks and prompt the model separately on each sub-task. The results from the sub-tasks are then combined to achieve the final outcome. This is related to prompt chaining and planning but emphasizes the deliberate decomposition of the problem.

  • Example: To write a research paper:

    • Prompt 1: "Generate a detailed outline for a paper on the impact of AI on the job market."
    • Prompt 2: "Write the introduction section based on this outline: [insert outline intro]."
    • Prompt 3: "Write the section on 'Impact on White-Collar Jobs' based on this outline: [insert outline section]." (Repeat for other sections).
    • Prompt N: "Combine these sections and write a conclusion."
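The chain of sub-prompts above can be orchestrated in code; `ask` is again a stand-in for an LLM call, and the prompt wording is only illustrative:

```python
def factored_paper(ask, topic):
    """Outline -> per-section drafts -> combined draft with conclusion."""
    outline = ask(f"Generate a detailed outline for a paper on {topic}.")
    # One sub-prompt per outline heading, each grounded in the full outline.
    sections = [
        ask(f"Write the section '{heading}' based on this outline:\n{outline}")
        for heading in outline.splitlines() if heading.strip()
    ]
    drafts = "\n\n".join(sections)
    return ask(f"Combine these sections and write a conclusion:\n{drafts}")
```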

分解认知/任务分解

对于非常复杂的任务,将整体目标分解为更小、更易管理的子任务,并分别对每个子任务进行提示可能是有效的。然后将子任务的结果组合起来以实现最终结果。这与提示链和规划相关,但强调对问题的有意分解。

  • 示例: 写一篇研究论文:

    • 提示1:"生成一篇关于AI对就业市场影响的论文的详细大纲。"
    • 提示2:"基于此大纲写引言部分:[插入大纲引言]。"
    • 提示3:"基于此大纲写'对白领工作的影响'部分:[插入大纲部分]。"(对其他部分重复)。
    • 提示N:"组合这些部分并写结论。"

Retrieval Augmented Generation (RAG)

RAG is a powerful technique that enhances language models by giving them access to external, up-to-date, or domain-specific information during the prompting process. When a user asks a question, the system first retrieves relevant documents or data from a knowledge base (e.g., a database, a set of documents, the web). This retrieved information is then included in the prompt as context, allowing the language model to generate a response grounded in that external knowledge. This mitigates issues like hallucination and provides access to information the model wasn't trained on or that is very recent. This is a key pattern for agentic systems that need to work with dynamic or proprietary information.

  • Example:

    • User Query: "What are the new features in the latest version of the Python library 'X'?"
    • System Action: Search a documentation database for "Python library X latest features".
    • Prompt to LLM: "Based on the following documentation snippets: [insert retrieved text], explain the new features in the latest version of Python library 'X'."
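A minimal retrieve-then-prompt step might look like this; the word-overlap ranking is a toy stand-in for the embedding-based search real RAG systems use:

```python
def build_rag_prompt(query, documents, top_k=2):
    """Rank documents by keyword overlap with the query and splice the
    best snippets into the prompt as grounding context."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    context = "\n".join(ranked[:top_k])
    return (
        "Based on the following documentation snippets:\n"
        f"{context}\n\n"
        f"Answer the question: {query}"
    )
```

The returned string is then sent to the language model, which answers grounded in the retrieved context rather than in its training data alone.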

检索增强生成(RAG)

RAG是一种强大的技术,通过在提示过程中为语言模型提供对外部、最新或领域特定信息的访问来增强它们。当用户提出问题时,系统首先从知识库(例如,数据库、一组文档、网络)中检索相关文档或数据。然后将检索到的信息作为上下文包含在提示中,允许语言模型基于该外部知识生成响应。这减轻了幻觉等问题,并提供了对模型未训练过或非常新的信息的访问。这是需要处理动态或专有信息的智能体系统的关键模式。

  • 示例:

    • 用户查询: "Python库'X'最新版本有哪些新功能?"
    • 系统操作: 在文档数据库中搜索"Python库X最新功能"。
    • 给LLM的提示: "基于以下文档片段:[插入检索到的文本],解释Python库'X'最新版本的新功能。"

Persona Pattern (User Persona)

While role prompting assigns a persona to the model, the Persona Pattern involves describing the user or the target audience for the model's output. This helps the model tailor its response in terms of language, complexity, tone, and the kind of information it provides.

  • Example:
    You are explaining quantum physics. The target audience is a high school student with no prior knowledge of the subject. Explain it simply and use analogies they might understand.

    Explain quantum physics: [Insert basic explanation request]

These advanced and supplementary techniques provide further tools for prompt engineers to optimize model behavior, integrate external information, and tailor interactions for specific users and tasks within agentic workflows.

Using Google Gems

Google's AI "Gems" (see Fig. 1) represent a user-configurable feature within its large language model architecture. Each "Gem" functions as a specialized instance of the core Gemini AI, tailored for specific, repeatable tasks. Users create a Gem by providing it with a set of explicit instructions, which establishes its operational parameters. This initial instruction set defines the Gem's designated purpose, response style, and knowledge domain. The underlying model is designed to consistently adhere to these pre-defined directives throughout a conversation.

This allows for the creation of highly specialized AI agents for focused applications. For example, a Gem can be configured to function as a code interpreter that only references specific programming libraries. Another could be instructed to analyze data sets, generating summaries without speculative commentary. A different Gem might serve as a translator adhering to a particular formal style guide. This process creates a persistent, task-specific context for the artificial intelligence.

Consequently, the user avoids the need to re-establish the same contextual information with each new query. This methodology reduces conversational redundancy and improves the efficiency of task execution. The resulting interactions are more focused, yielding outputs that are consistently aligned with the user's initial requirements. This framework allows for applying fine-grained, persistent user direction to a generalist AI model. Ultimately, Gems enable a shift from general-purpose interaction to specialized, pre-defined AI functionalities.

![][image1]
Fig.1: Example of Google Gem usage.

使用Google Gems

Google的AI"Gems"(见图1)代表了其大型语言模型架构中的一个用户可配置功能。每个"Gem"作为核心Gemini AI的专门实例,为特定、可重复的任务量身定制。用户通过提供一组明确的指令来创建Gem,这建立了其操作参数。这个初始指令集定义了Gem的指定目的、响应风格和知识领域。底层模型被设计为在整个对话中始终遵守这些预定义的指令。

这允许创建高度专业化的AI智能体用于专注的应用。例如,可以配置一个Gem作为代码解释器,仅引用特定的编程库。另一个可以指示分析数据集,生成总结而不包含推测性评论。不同的Gem可能作为翻译器,遵守特定的正式风格指南。这个过程为人工智能创建了一个持久的、特定于任务的上下文。

因此,用户避免了每次新查询都需要重新建立相同上下文信息的需要。这种方法减少了对话冗余,提高了任务执行的效率。由此产生的交互更加专注,产生的输出始终与用户的初始要求一致。这个框架允许将细粒度的、持久的用户指导应用于通用AI模型。最终,Gems实现了从通用交互到专门的、预定义的AI功能的转变。

![][image1]
图1:Google Gem使用示例。

  • Example:
    Analyze the following prompt for a language model and suggest ways to improve it to consistently extract the main topic and key entities (people, organizations, locations) from news articles. The current prompt sometimes misses entities or gets the main topic wrong.

    Existing Prompt:
    "Summarize the main points and list important names and places from this article: [insert article text]"

    Suggestions for Improvement:

In this example, we're using the LLM to critique and enhance another prompt. This meta-level interaction demonstrates the flexibility and power of these models, allowing us to build more effective agentic systems by first optimizing the fundamental instructions they receive. It's a fascinating loop where AI helps us talk better to AI.

Prompting for Specific Tasks

While the techniques discussed so far are broadly applicable, some tasks benefit from specific prompting considerations. These are particularly relevant in the realm of code and multimodal inputs.

Code Prompting

Language models, especially those trained on large code datasets, can be powerful assistants for developers. Prompting for code involves using LLMs to generate, explain, translate, or debug code. Various use cases exist:

  • Prompts for writing code: Asking the model to generate code snippets or functions based on a description of the desired functionality.

    • Example: "Write a Python function that takes a list of numbers and returns the average."
  • Prompts for explaining code: Providing a code snippet and asking the model to explain what it does, line by line or in a summary.

    • Example: "Explain the following JavaScript code snippet: [insert code]."
  • Prompts for translating code: Asking the model to translate code from one programming language to another.

    • Example: "Translate the following Java code to C++: [insert code]."
  • Prompts for debugging and reviewing code: Providing code that has an error or could be improved and asking the model to identify issues, suggest fixes, or provide refactoring suggestions.

    • Example: "The following Python code is giving a 'NameError'. What is wrong and how can I fix it? [insert code and traceback]."

Effective code prompting often requires providing sufficient context, specifying the desired language and version, and being clear about the functionality or issue.

Multimodal Prompting

While the focus of this appendix and much of current LLM interaction is text-based, the field is rapidly moving towards multimodal models that can process and generate information across different modalities (text, images, audio, video, etc.). Multimodal prompting involves using a combination of input formats, rather than text alone, to guide the model.

  • Example: Providing an image of a diagram and asking the model to explain the process shown in the diagram (Image Input + Text Prompt). Or providing an image and asking the model to generate a descriptive caption (Image Input + Text Prompt -> Text Output).

As multimodal capabilities become more sophisticated, prompting techniques will evolve to effectively leverage these combined inputs and outputs.

特定任务提示

虽然到目前为止讨论的技术广泛适用,但某些任务受益于特定的提示考虑。这些在代码和多模态输入领域尤其相关。

代码提示

语言模型,特别是那些在大型代码数据集上训练的模型,可以成为开发人员的强大助手。代码提示涉及使用LLM生成、解释、翻译或调试代码。存在各种用例:

  • 编写代码的提示: 要求模型根据所需功能的描述生成代码片段或函数。

    • 示例: "编写一个Python函数,接受数字列表并返回平均值。"
  • 解释代码的提示: 提供代码片段并要求模型解释其功能,逐行或总结。

    • 示例: "解释以下JavaScript代码片段:[插入代码]。"
  • 翻译代码的提示: 要求模型将代码从一种编程语言翻译到另一种。

    • 示例: "将以下Java代码翻译为C++:[插入代码]。"
  • 调试和审查代码的提示: 提供有错误或可以改进的代码,并要求模型识别问题、建议修复或提供重构建议。

    • 示例: "以下Python代码出现'NameError'。问题是什么以及如何修复?[插入代码和回溯]。"

有效的代码提示通常需要提供足够的上下文,指定所需的语言和版本,并明确功能或问题。

多模态提示

虽然本附录和当前大部分LLM交互的重点是基于文本的,但该领域正在迅速向能够跨不同模态(文本、图像、音频、视频等)处理和生成信息的多模态模型发展。多模态提示涉及使用输入组合来引导模型。这指的是使用多种输入格式而不仅仅是文本。

  • 示例: 提供图表图像并要求模型解释图中显示的过程(图像输入 + 文本提示)。或者提供图像并要求模型生成描述性标题(图像输入 + 文本提示 -> 文本输出)。

随着多模态能力变得更加复杂,提示技术将发展以有效利用这些组合输入和输出。

Best Practices and Experimentation

Becoming a skilled prompt engineer is an iterative process that involves continuous learning and experimentation. Several valuable best practices are worth reiterating and emphasizing:

  • Provide Examples: Providing one or few-shot examples is one of the most effective ways to guide the model.
  • Design with Simplicity: Keep your prompts concise, clear, and easy to understand. Avoid unnecessary jargon or overly complex phrasing.
  • Be Specific about the Output: Clearly define the desired format, length, style, and content of the model's response.
  • Use Instructions over Constraints: Focus on telling the model what you want it to do rather than what you don't want it to do.
  • Control the Max Token Length: Use model configurations or explicit prompt instructions to manage the length of the generated output.
  • Use Variables in Prompts: For prompts used in applications, use variables to make them dynamic and reusable, avoiding hardcoding specific values.
  • Experiment with Input Formats and Writing Styles: Try different ways of phrasing your prompt (question, statement, instruction) and experiment with different tones or styles to see what yields the best results.
  • For Few-Shot Prompting with Classification Tasks, Mix Up the Classes: Randomize the order of examples from different categories to prevent overfitting.
  • Adapt to Model Updates: Language models are constantly being updated. Be prepared to test your existing prompts on new model versions and adjust them to leverage new capabilities or maintain performance.
  • Experiment with Output Formats: Especially for non-creative tasks, experiment with requesting structured output like JSON or XML.
  • Experiment Together with Other Prompt Engineers: Collaborating with others can provide different perspectives and lead to discovering more effective prompts.
  • CoT Best Practices: Remember specific practices for Chain of Thought, such as placing the answer after the reasoning and setting temperature to 0 for tasks with a single correct answer.
  • Document the Various Prompt Attempts: This is crucial for tracking what works, what doesn't, and why. Maintain a structured record of your prompts, configurations, and results.
  • Save Prompts in Codebases: When integrating prompts into applications, store them in separate, well-organized files for easier maintenance and version control.
  • Rely on Automated Tests and Evaluation: For production systems, implement automated tests and evaluation procedures to monitor prompt performance and ensure generalization to new data.
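Two of the practices above — using variables in prompts and validating structured output — can be sketched together in Python. The template wording, function names, and JSON schema here are illustrative, not taken from any particular system:

```python
import json
from string import Template

# A reusable prompt template: variables keep the prompt dynamic
# instead of hardcoding specific values.
SUMMARIZE = Template(
    "You are a $role. Summarize the following $doc_type in at most "
    '$max_sentences sentences. Reply as JSON: {"summary": "...", "keywords": ["..."]}\n\n'
    "Text:\n$text"
)

def build_prompt(role: str, doc_type: str, max_sentences: int, text: str) -> str:
    # Template.substitute stringifies values, so ints are fine here.
    return SUMMARIZE.substitute(
        role=role, doc_type=doc_type, max_sentences=max_sentences, text=text
    )

def validate_reply(raw: str) -> dict:
    """Reject model replies that do not match the requested JSON shape."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(data.get("summary"), str) or not isinstance(
        data.get("keywords"), list
    ):
        raise ValueError("reply does not match the requested schema")
    return data
```

In an application, `validate_reply` would sit between the model call and any downstream code, so that a malformed reply fails fast instead of corrupting the workflow.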

Prompt engineering is a skill that improves with practice. By applying these principles and techniques, and by maintaining a systematic approach to experimentation and documentation, you can significantly enhance your ability to build effective agentic systems.
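The class-mixing practice for few-shot classification can also be made mechanical rather than manual. The helper below (an illustrative sketch, with invented example data) shuffles labeled examples before assembling the prompt, so no class ordering leaks into the demonstration:

```python
import random

def build_fewshot_prompt(examples, query, seed=0):
    """Build a few-shot classification prompt with class order randomized."""
    rng = random.Random(seed)   # fixed seed keeps runs reproducible
    shuffled = list(examples)
    rng.shuffle(shuffled)       # mix POSITIVE and NEGATIVE examples
    lines = ["Classify each review as POSITIVE or NEGATIVE.", ""]
    for text, label in shuffled:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("Loved every minute of it.", "POSITIVE"),
    ("A complete waste of time.", "NEGATIVE"),
    ("The cast was brilliant.", "POSITIVE"),
    ("Dull and far too long.", "NEGATIVE"),
]
prompt = build_fewshot_prompt(examples, "Surprisingly good.")
```

Ending the prompt with a bare `Sentiment:` nudges the model to complete the pattern with a single label.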


Conclusion

This appendix provides a comprehensive overview of prompting, reframing it as a disciplined engineering practice rather than a simple act of asking questions. Its central purpose is to demonstrate how to transform general-purpose language models into specialized, reliable, and highly capable tools for specific tasks. The journey begins with non-negotiable core principles like clarity, conciseness, and iterative experimentation, which are the bedrock of effective communication with AI. These principles are critical because they reduce the inherent ambiguity in natural language, helping to steer the model's probabilistic outputs toward a single, correct intention.

Building on this foundation, basic techniques such as zero-shot, one-shot, and few-shot prompting serve as the primary methods for demonstrating expected behavior through examples. These methods provide varying levels of contextual guidance, powerfully shaping the model's response style, tone, and format. Beyond just examples, structuring prompts with explicit roles, system-level instructions, and clear delimiters provides an essential architectural layer for fine-grained control over the model.

The importance of these techniques becomes paramount in the context of building autonomous agents, where they provide the control and reliability necessary for complex, multi-step operations. For an agent to create and execute a plan effectively, it must leverage advanced reasoning patterns such as Chain of Thought and Tree of Thoughts. These sophisticated methods force the model to externalize its logical steps, systematically decomposing a complex goal into a series of manageable subtasks. The operational reliability of an entire agentic system depends on the predictability of each component's output. This is precisely why requesting structured data (such as JSON) and validating it programmatically with tools like Pydantic is not merely a convenience but an absolute necessity for robust automation. Without this discipline, the agent's internal cognitive components cannot communicate reliably, leading to catastrophic failures in automated workflows. Ultimately, these structuring and reasoning techniques turn the model's probabilistic text generation into the deterministic, trustworthy cognitive engine of an agent.

Furthermore, these prompts grant the agent the crucial abilities to perceive its environment and act upon it, bridging the gap between digital reasoning and real-world interaction. Action-oriented frameworks such as ReAct and native function calling serve as the agent's hands, enabling it to use tools, query APIs, and manipulate data. Meanwhile, techniques like Retrieval-Augmented Generation (RAG), and the broader discipline of context engineering, act as the agent's senses: they proactively retrieve relevant, real-time information from external knowledge bases, ensuring the agent's decisions are grounded in current, factual reality. This critical capability keeps the agent from operating in a vacuum, where it would otherwise be limited to its static and potentially outdated training data. Mastering this full spectrum of prompting is therefore the defining skill for elevating a general-purpose language model from a simple text generator into a truly sophisticated agent, capable of executing complex tasks with autonomy, awareness, and intelligence.
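The Thought-Action-Observation cycle behind ReAct can be sketched in a few lines. Everything below is illustrative: `fake_model` is a scripted stand-in for a real LLM call, and the single `calculator` tool is a toy; a production loop would parse model output far more defensively:

```python
def calculator(expression: str) -> str:
    # Toy tool: evaluate a basic arithmetic expression, no builtins exposed.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_model(transcript: str) -> str:
    # Stand-in for an LLM: first emit a Thought + Action, then, once an
    # Observation is present in the transcript, emit the Final Answer.
    if "Observation:" not in transcript:
        return "Thought: I need to compute 17 * 23.\nAction: calculator[17 * 23]"
    return "Final Answer: 391"

def run_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_model(transcript)
        transcript += "\n" + reply
        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
        # Parse "Action: tool[input]", run the tool, append the Observation.
        action = reply.split("Action:")[1].strip()
        name, arg = action.split("[", 1)
        observation = TOOLS[name.strip()](arg.rstrip("]"))
        transcript += f"\nObservation: {observation}"
    return "no answer"
```

Swapping `fake_model` for a real model call (and growing the `TOOLS` registry) is all that separates this sketch from a minimal working ReAct agent.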

References

Here is a list of resources for further reading and deeper exploration of prompt engineering techniques:

  1. Prompt Engineering, www.kaggle.com/whitepaper-…
  2. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, arxiv.org/abs/2201.11…
  3. Self-Consistency Improves Chain of Thought Reasoning in Language Models, arxiv.org/pdf/2203.11…
  4. ReAct: Synergizing Reasoning and Acting in Language Models, arxiv.org/abs/2210.03…
  5. Tree of Thoughts: Deliberate Problem Solving with Large Language Models, arxiv.org/pdf/2305.10…
  6. Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models, arxiv.org/abs/2310.06…
  7. DSPy: Programming—not prompting—Foundation Models, github.com/stanfordnlp…
