Unleashing the Full Potential of LLMs: A Guide to Prompt Engineering

In recent years, language models have improved rapidly, with large language models (LLMs) such as GPT-3 and GPT-4 taking center stage. These models are popular because of their ability to perform a wide variety of tasks with incredible skill. At the same time, as their parameter counts grow (into the billions!), these models unpredictably gain new capabilities.

In this post, we will explore LLMs, the tasks they can perform, their shortcomings, and various prompt engineering strategies.

What are LLMs?

LLMs are neural networks trained on huge amounts of text data. The training process lets the model learn patterns in the text, including grammar, syntax, and word associations. The model uses these learned patterns to generate human-like text, making it well suited for natural language processing (NLP) tasks.

Which LLMs are available?

Several LLMs are available today, with GPT-4 being the most popular. Other models include LLaMA, PaLM, BERT, and T5. Each has its strengths and weaknesses; some are open, while others are closed and only usable through an API.

Shortcomings of LLMs

Despite their impressive performance, LLMs have several limitations. A significant drawback is their inability to reason beyond the information given in the prompt. In addition, LLMs can generate biased text depending on the data they were trained on. Controlling their output is also challenging, which is why prompt engineering strategies are needed to achieve the desired results.

Which tasks can they perform?

We can instruct an LLM to perform specific tasks for us by formatting the prompt for each case. Below is a list of those tasks, with a prompt template and an example for each.

Text summarization

LLMs can generate summaries of long texts, making it easier to understand and digest the content. We can ask for a summary of a text with the following prompt template:

Template

<Full text>
Summarize the text above: / Explain the text above in <N> sentences:

For example, we can ask it to summarize this passage with the following prompt:

Prompt

In a mechanical watch the watch's gears are turned by a spiral spring called a mainspring. In a manual watch energy is stored in the mainspring by turning a knob, the crown on the side of the watch. Then the energy from the mainspring powers the watch movement until it runs down, requiring the spring to be wound again.
A self-winding watch movement has a mechanism which winds the mainspring using the natural motions of the wearer's body. The watch contains an oscillating weight that turns on a pivot. The normal movements of the watch in the user's pocket (for a pocketwatch) or on the user's arm (for a wristwatch) cause the rotor to pivot on its staff, which is attached to a ratcheted winding mechanism. The motion of the watch is thereby translated into circular motion of the weight which, through a series of reverser and reducing gears, eventually winds the mainspring. There are many different designs for modern self-winding mechanisms. Some designs allow winding of the watch to take place while the weight swings in only one direction while other, more advanced, mechanisms have two ratchets and wind the mainspring during both clockwise and anti-clockwise weight motions.
Explain the text above in 2 sentences:

This gives the following output:

Output

A mechanical watch uses a mainspring to power the movement of the watch's gears. In a manual watch, the mainspring is wound by turning a knob, while a self-winding watch uses the natural motion of the wearer's body to wind the mainspring through an oscillating weight and ratcheted winding mechanism.
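
All of these templates can also be applied programmatically. Here is a minimal sketch, assuming Node 18+ (for the built-in fetch) and an OpenAI API key in the OPENAI_API_KEY environment variable; the summarize helper is our own name, and you can swap in any chat model:

// Minimal sketch: send the summarization template to the
// OpenAI chat completions endpoint and return the reply.
async function summarize(fullText, sentences = 2) {
  // Build the prompt following the template above.
  const prompt = `${fullText}\nExplain the text above in ${sentences} sentences:`;
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo', // assumption: any chat model works here
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  if (!response.ok) {
    throw new Error(`API error: ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

The same pattern works for every template in this article: build the prompt string, send it as a user message, and read the reply.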

Information extraction

LLMs can identify and extract specific pieces of information from large volumes of text, which is useful in fields such as journalism and business intelligence. The following template instructs the model to extract information:

Template

<Full text>
Mention X in the text above: / List X in the text above:

For example, we can ask it to extract the names of places from the following text:

Prompt

The earliest credible evidence for a successful design is the watch made by the Swiss watchmaker Abraham-Louis Perrelet, who lived in Le Locle. In late 1776 or early 1777, he invented a self-winding mechanism for pocket watches using an oscillating weight inside the watch that moved up and down. The Geneva Society of Arts, reporting on this watch in 1777, stated that 15 minutes walking was necessary to fully wind the watch.
List the names of places in the text above:

And the output is:

Output

Le Locle, Geneva.

Question answering

LLMs can also answer questions based on the information provided in the prompt. The following template can be used to instruct the LLM to answer a question:

Template

Answer the question based on the context below. <Other instructions>. Respond "Not sure" if not sure about the answer.
Context: <Context>
Q: <Question>
A:

For example, with the following prompt:

Prompt

Answer the question based on the context below. Answer in a single sentence. Respond "Not sure" if not sure about the answer.
Context: An automatic watch, also known as a self-winding watch or simply an automatic, is a mechanical watch where the natural motion of the wearer provides energy to wind the mainspring, making manual winding unnecessary if worn enough. It is distinguished from a manual watch in that a manual watch must have its mainspring wound by hand at regular intervals.
Q: Do automatic watches need to be manually wound?
A:

And ChatGPT's answer is:

Output

No, automatic watches do not need to be manually wound if worn enough because the natural motion of the wearer provides energy to wind the mainspring.

Text classification

LLMs can classify text into predefined categories, such as in sentiment analysis or spam detection. You can format a text classification instruction as follows:

Template

Classify the text into <List of choices>:
Text: <Full text>
Classification:

For example, we can try to classify a text according to whether its sentiment is positive, neutral, or negative:

Prompt

Classify the text into positive, neutral or negative:
Text: That shot selection was awful
Classification:

Output

Negative

Conversation

LLMs can generate human-like conversations, making them useful in chatbots and customer service applications. We can instruct the LLM to follow a conversation while acting in a specific way.

Template

The following is a conversation with <Actor>. The <Actor> should answer <Further instructions>.
Person: <Conversation>
<Actor>: <Response>
Person: <Conversation>
<Actor>:

The LLM will fill in the response for the type of actor we specified, and we can keep the conversation going by passing the whole exchange, with the new response appended, back to it (a code sketch of this loop follows the second example below). For example:

Prompt

The following is a conversation with a lawyer. The lawyer should answer simply and in layman terms.
Person: Hello, I would like to know how to dispute a parking ticket
Lawyer:

Output

Sure, I can help. The first step is to check the ticket for any errors or mistakes. If you find any, you can use those as grounds to dispute the ticket. If there are no mistakes, you can still dispute the ticket by providing evidence to support your case, such as photos or witness statements. You can also request a hearing to present your case in person.

Prompt

The following is a conversation with a lawyer. The lawyer should answer simply and in layman terms.
Person: Hello, I would like to know how to dispute a parking ticket
Lawyer: Sure, I can help. The first step is to check the ticket for any errors or mistakes. If you find any, you can use those as grounds to dispute the ticket. If there are no mistakes, you can still dispute the ticket by providing evidence to support your case, such as photos or witness statements. You can also request a hearing to present your case in person.
Person: How do I request a hearing?
Lawyer:

Output

To request a hearing, you'll need to follow the instructions on the back of the ticket or on the website provided. Usually, you'll need to submit a written request within a certain timeframe, along with any evidence or supporting documents you have. Once your request is received, you'll be notified of the date, time, and location of your hearing.
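
In code, keeping the conversation going is just a matter of accumulating the transcript and resending all of it on every turn. Here is a minimal sketch, assuming a complete() helper that sends a prompt to the model and returns its reply (like the summarize sketch earlier); the names are our own:

// Accumulate the conversation and resend the whole thing each turn.
const header = 'The following is a conversation with a lawyer. ' +
  'The lawyer should answer simply and in layman terms.\n';
let transcript = '';

async function ask(question) {
  transcript += `Person: ${question}\nLawyer:`;
  const reply = await complete(header + transcript); // assumed API wrapper
  transcript += ` ${reply}\n`;
  return reply;
}

// Each call sees the full history so far:
// await ask('Hello, I would like to know how to dispute a parking ticket');
// await ask('How do I request a hearing?');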

Code generation

LLMs can generate code, making it easier for developers to build complex software. We can write a code comment describing what we want the code to do, and the LLM will do its best to complete the task for us. The more complex the desired behavior, the more specific and descriptive we need to be in the prompt comment.

The template may vary depending on the programming language we want to generate, but I recommend roughly following this one:

Template

<Comment symbol of your language of choice (/* or ''' or #)>
Method: <Name of the method>
Language: <Programming language>
Description: <Description of what the method must do>
Parameters: <Parameters that the method will receive>
Context: <Further information needed to implement the method (URLs, databases, etc.)>
Output: <Output of the method>
<Comment symbol of your language of choice (/* or ''' or #)>

For example, if we wanted to fetch data from a paginated API, we could write the following comment:

Prompt

/* Method: fetchFirst5Pages
 * Language: Javascript
 * Description: Fetch the first 5 pages from this paginated API: http://example.com/pages/?n=x
 * Context: This API accepts the following URL parameters:
 *   n: the number of the page to fetch
 * Output: the content of the first 5 pages
 */

Output

Here's an example implementation of the fetchFirst5Pages method in JavaScript, assuming the use of the fetch function:

async function fetchFirst5Pages() {
  const baseUrl = 'http://example.com/pages/?n=';
  const pages = [];
  for (let i = 1; i <= 5; i++) {
    const url = baseUrl + i;
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Error fetching page ${i}`);
    }
    const pageData = await response.json();
    pages.push(pageData);
  }
  return pages;
}

This function uses a loop to fetch the first 5 pages from the API by appending the page number to the base URL. It checks if the response from the server is okay and throws an error if it's not. Then, it parses the JSON data from the response and adds it to an array of pages. Finally, it returns the array of pages.

Reasoning

LLMs can struggle with some basic tasks, such as arithmetic problems. However, if we ask the LLM to reason about its response, we can help it reach the correct answer. This is the template I recommend:

Template

Q: <Question>
A: Let's work this out in a step by step way to be sure we have the right answer. First, <Instructions>.

In this example, we can see how a simple arithmetic question fails if we ask it without reasoning:

Prompt

Answer the following question. Respond "Not sure" if not sure about the answer.
Q: The sum of prime numbers in the following list are a multiple of 3: [2, 4, 6, 7, 9, 14]
A: [Yes or No]

Output

No. The sum of prime numbers in the list [2, 4, 6, 7, 9, 14] is not a multiple of 3. Therefore, the answer is "No".

However, if we instruct the LLM to reason about its answer, we get a very different result:

Prompt

Q: The sum of prime numbers in the following list are a multiple of 3: [2, 4, 6, 7, 9, 14]
A: Let's work this out in a step by step way to be sure we have the right answer. First, we find the prime numbers in the list and add them together. Then we check whether the resulting number can be divided by 3.

Output

Sure, let's work this out step by step.
The prime numbers in the list are 2 and 7. So we can add them together:
2 + 7 = 9
The resulting number is 9. Now we need to check whether it is a multiple of 3. We can do this by checking if the remainder of 9 divided by 3 is equal to 0:
9 % 3 = 0
Since the remainder is 0, we know that 9 is indeed a multiple of 3.
Therefore, the answer is "Yes", the sum of prime numbers in the list [2, 4, 6, 7, 9, 14] is a multiple of 3.
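
The model's arithmetic happens to be right here, and it is easy to verify outside the model. A quick check of the same steps in JavaScript:

// Sum the primes in the list, then test divisibility by 3.
function isPrime(n) {
  if (n < 2) return false;
  for (let i = 2; i * i <= n; i++) {
    if (n % i === 0) return false;
  }
  return true;
}
const sum = [2, 4, 6, 7, 9, 14].filter(isPrime).reduce((a, b) => a + b, 0);
console.log(sum, sum % 3 === 0); // 9 true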

Strategies

While these templates can help you squeeze the potential of LLMs for your needs, they can still struggle with certain complex tasks. Several techniques have been developed to help LLMs reach a satisfactory answer. So far we have only used zero-shot prompting, which we will now define; beyond it, there are more sophisticated techniques at our disposal.

Zero-shot prompting

In every example in this article so far, we have used zero-shot prompting. Zero-shot prompting means passing the LLM a prompt containing only the question we want answered, with no examples of how it should respond in other cases. For instance, in one of the earlier examples we did this:

Prompt

Classify the text into positive, neutral or negative:
Text: That shot selection was awful
Classification:

Output

Negative

This may not be enough for the model to answer exactly the way we want. Let's see what alternatives we have.

Few-shot prompting

In contrast with zero-shot prompting, with few-shot prompting we give the LLM a few examples of how it should respond before asking the question. For example, suppose we want to classify the sentiment of a text, as in the earlier example, but we want the answer in a very specific format every time. We can give it a few examples beforehand, like this:

Prompt

Classify the text into positive, neutral or negative:
Text: Today the weather is fantastic
Classification: Pos
Text: The furniture is small.
Classification: Neu
Text: I don't like your attitude
Classification: Neg
Text: That shot selection was awful
Classification:

Output

Neg
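
If you use few-shot prompts often, it can help to assemble them from a list of labeled examples instead of writing them by hand. A small sketch (the function and parameter names are our own):

// Build a few-shot classification prompt from labeled examples
// plus the text we actually want classified.
function fewShotPrompt(instruction, examples, query) {
  const shots = examples
    .map((e) => `Text: ${e.text}\nClassification: ${e.label}`)
    .join('\n');
  return `${instruction}\n${shots}\nText: ${query}\nClassification:`;
}

const prompt = fewShotPrompt(
  'Classify the text into positive, neutral or negative:',
  [
    { text: 'Today the weather is fantastic', label: 'Pos' },
    { text: 'The furniture is small.', label: 'Neu' },
    { text: "I don't like your attitude", label: 'Neg' },
  ],
  'That shot selection was awful'
);
// prompt now matches the few-shot example above.

The same helper works for the chain-of-thought examples below: just put the full reasoning in the label field of each example.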

Chain-of-thought prompting

For more complex tasks, we can combine few-shot prompting with making the model reason about its answer. We can use the same example as before to see how reasoning and few-shot prompting affect the result:

Prompt

Q: The sum of prime numbers in the following list are a multiple of 3: [2, 4, 6, 7, 9, 14]
A: The prime numbers in the list are 2 and 7. Their sum is 9. 9 % 3 = 0. The answer is yes.
Q: The sum of prime numbers in the following list are a multiple of 3: [1, 2, 4, 7, 8, 10]
A: The prime numbers in the list are 2 and 7. Their sum is 9. 9 % 3 = 0. The answer is yes.
Q: The sum of prime numbers in the following list are a multiple of 3: [1, 2, 3, 4]
A: The prime numbers in the list are 2 and 3. Their sum is 5. 5 % 3 = 2. The answer is no.
Q: The sum of prime numbers in the following list are a multiple of 3: [5, 6, 7, 8]
A:

Output

The prime numbers in the list are 5 and 7. Their sum is 12. 12 % 3 = 0. The answer is yes.

Final notes

LLMs have revolutionized the field of natural language processing, but prompt engineering is essential to maximize their potential. By understanding the tasks LLMs can perform, their shortcomings, and the various prompt engineering strategies, developers can harness the power of LLMs to create innovative and effective solutions. More strategies and techniques will likely emerge in the near future, so keep an eye on further progress in this field to get the most out of LLMs. Also, as LLMs keep growing larger, with billions of additional parameters, it is likely that tasks we cannot even imagine right now will become possible. It is amazing to think about what these new tools will make possible and which use cases they will serve in the future.
