Chat completions Beta

Using the OpenAI Chat API, you can build your own applications with gpt-3.5-turbo and gpt-4 to do things like:

  • Draft an email or other piece of writing
  • Write Python code
  • Answer questions about a set of documents
  • Create conversational agents
  • Give your software a natural language interface
  • Tutor in a range of subjects
  • Translate languages
  • Simulate characters for video games and much more

This guide explains how to make an API call for chat-based language models and shares tips for getting good results. You can also experiment with the new chat format in the OpenAI Playground.

Introduction

Chat models take a series of messages as input, and return a model-generated message as output.

Although the chat format is designed to make multi-turn conversations easy, it’s just as useful for single-turn tasks without any conversations (such as those previously served by instruction following models like text-davinci-003).

An example API call looks as follows:

# Note: you need to be using OpenAI Python v0.27.0 for the code below to work
import openai

openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

The main input is the messages parameter. Messages must be an array of message objects, where each object has a role (either "system", "user", or "assistant") and content (the content of the message). Conversations can be as short as 1 message or fill many pages.

Typically, a conversation is formatted with a system message first, followed by alternating user and assistant messages.

The system message helps set the behavior of the assistant. In the example above, the assistant was instructed with "You are a helpful assistant."

gpt-3.5-turbo-0301 does not always pay strong attention to system messages. Future models will be trained to pay stronger attention to system messages.

The user messages help instruct the assistant. They can be generated by the end users of an application, or set by a developer as an instruction.

The assistant messages help store prior responses. They can also be written by a developer to help give examples of desired behavior.

Including the conversation history helps when user instructions refer to prior messages. In the example above, the user’s final question of "Where was it played?" only makes sense in the context of the prior messages about the World Series of 2020. Because the models have no memory of past requests, all relevant information must be supplied via the conversation. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way.
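One simple way to keep a conversation within the limit is to drop the oldest user and assistant messages while keeping the system message. A minimal sketch (the helper name and the message-count budget are illustrative; a production version would budget by tokens, as discussed under Managing tokens below):

```python
def truncate_conversation(messages, max_messages):
    """Drop the oldest user/assistant messages until at most max_messages
    remain, always keeping any system message at the front.
    Note: this budgets by message count for simplicity, not by tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_messages - len(system)
    return system + rest[-budget:] if budget > 0 else system

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
trimmed = truncate_conversation(history, 3)
```

Beware the trade-off noted above: any message dropped this way is entirely forgotten by the model.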

Response format

An example API response looks as follows:

{
 'id': 'chatcmpl-6p9XYPYSTTRi0xEviKjjilqrWU2Ve',
 'object': 'chat.completion',
 'created': 1677649420,
 'model': 'gpt-3.5-turbo',
 'usage': {'prompt_tokens': 56, 'completion_tokens': 31, 'total_tokens': 87},
 'choices': [
   {
    'message': {
      'role': 'assistant',
      'content': 'The 2020 World Series was played in Arlington, Texas at the Globe Life Field, which was the new home stadium for the Texas Rangers.'},
    'finish_reason': 'stop',
    'index': 0
   }
  ]
}

In Python, the assistant’s reply can be extracted with response['choices'][0]['message']['content'].

Every response will include a finish_reason. The possible values for finish_reason are:

  • stop: API returned complete model output
  • length: Incomplete model output due to max_tokens parameter or token limit
  • content_filter: Omitted content due to a flag from our content filters
  • null: API response still in progress or incomplete
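It is worth checking finish_reason before trusting the output. A small sketch (the helper name is ours; the response shape matches the example above):

```python
def extract_reply(response):
    """Return the assistant's reply, warning if it was truncated or filtered."""
    choice = response["choices"][0]
    reason = choice["finish_reason"]
    if reason == "length":
        print("Warning: output was cut off by max_tokens or the token limit.")
    elif reason == "content_filter":
        print("Warning: content was omitted by the content filter.")
    return choice["message"]["content"]

# Example response, abbreviated from the one shown above.
response = {
    "choices": [
        {
            "message": {"role": "assistant",
                        "content": "The 2020 World Series was played in Arlington, Texas."},
            "finish_reason": "stop",
            "index": 0,
        }
    ]
}
reply = extract_reply(response)
```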

Managing tokens

Language models read text in chunks called tokens. In English, a token can be as short as one character or as long as one word (e.g., a or apple), and in some languages tokens can be even shorter than one character or even longer than one word.

For example, the string "ChatGPT is great!" is encoded into six tokens: ["Chat", "G", "PT", " is", " great", "!"].

The total number of tokens in an API call affects:

  • How much your API call costs, as you pay per token
  • How long your API call takes, as writing more tokens takes more time
  • Whether your API call works at all, as total tokens must be below the model’s maximum limit (4096 tokens for gpt-3.5-turbo-0301)

Both input and output tokens count toward these quantities. For example, if your API call used 10 tokens in the message input and you received 20 tokens in the message output, you would be billed for 30 tokens.

To see how many tokens are used by an API call, check the usage field in the API response (e.g., response['usage']['total_tokens']).

Chat models like gpt-3.5-turbo and gpt-4 use tokens in the same way as other models, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation.

DEEP DIVE

Counting tokens for chat API calls

To see how many tokens are in a text string without making an API call, use OpenAI’s tiktoken Python library. Example code can be found in the OpenAI Cookbook’s guide on how to count tokens with tiktoken.

Each message passed to the API consumes the number of tokens in the content, role, and other fields, plus a few extra for behind-the-scenes formatting. This may change slightly in the future.

If a conversation has too many tokens to fit within a model’s maximum limit (e.g., more than 4096 tokens for gpt-3.5-turbo), you will have to truncate, omit, or otherwise shrink your text until it fits. Beware that if a message is removed from the messages input, the model will lose all knowledge of it.

Note too that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.

Instructing chat models

Best practices for instructing models may change from model version to version. The advice that follows applies to gpt-3.5-turbo-0301 and may not apply to future models.

Many conversations begin with a system message to gently instruct the assistant. For example, here is one of the system messages used for ChatGPT:

You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible. Knowledge cutoff: {knowledge_cutoff} Current date: {current_date}
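Placeholders in a template like this one can be filled in at request time, for example (the knowledge-cutoff value is illustrative):

```python
from datetime import date

system_template = (
    "You are ChatGPT, a large language model trained by OpenAI. "
    "Answer as concisely as possible. "
    "Knowledge cutoff: {knowledge_cutoff} Current date: {current_date}"
)
system_message = system_template.format(
    knowledge_cutoff="2021-09",  # illustrative value
    current_date=date.today().isoformat(),
)
```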

In general, gpt-3.5-turbo-0301 does not pay strong attention to the system message, and therefore important instructions are often better placed in a user message.

If the model isn’t generating the output you want, feel free to iterate and experiment with potential improvements. You can try approaches like:

  • Make your instruction more explicit
  • Specify the format you want the answer in
  • Ask the model to think step by step or debate pros and cons before settling on an answer

For more prompt engineering ideas, read the OpenAI Cookbook guide on techniques to improve reliability.

Beyond the system message, temperature and max_tokens are two of many options developers can use to influence the output of the chat models. For temperature, higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. For max_tokens, if you want to limit a response to a certain length, it can be set to an arbitrary number. Beware that this may cut off output mid-sentence: if you set max_tokens to 5, for example, the output will be truncated and the result will not make sense to users.
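For example, a request with a lower temperature and a response-length cap might look like this (the parameter values are illustrative; the create call itself is commented out because it requires an API key):

```python
request = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user",
         "content": "Summarize the 2020 World Series in one sentence."},
    ],
    "temperature": 0.2,  # lower = more focused and deterministic
    "max_tokens": 60,    # hard cap on reply length; too small cuts off mid-sentence
}
# response = openai.ChatCompletion.create(**request)  # requires openai v0.27.0 and an API key
```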

Chat vs Completions

Because gpt-3.5-turbo performs at a similar capability to text-davinci-003 but at 10% the price per token, we recommend gpt-3.5-turbo for most use cases.

For many developers, the transition is as simple as rewriting and retesting a prompt.

For example, if you translated English to French with the following completions prompt:

Translate the following English text to French: "{text}"

An equivalent chat conversation could look like:

[
  {"role": "system", "content": "You are a helpful assistant that translates English to French."},
  {"role": "user", "content": 'Translate the following English text to French: "{text}"'}
]

Or even just the user message:

[
  {"role": "user", "content": 'Translate the following English text to French: "{text}"'}
]
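Either form can be assembled programmatically by substituting the text to translate (a sketch; the helper function is ours):

```python
def translation_messages(text, include_system=True):
    """Build a chat message list for the English-to-French translation task."""
    user = {
        "role": "user",
        "content": f'Translate the following English text to French: "{text}"',
    }
    if include_system:
        return [
            {"role": "system",
             "content": "You are a helpful assistant that translates English to French."},
            user,
        ]
    return [user]  # the user-message-only variant

msgs = translation_messages("Hello, world!")
```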

FAQ

Is fine-tuning available for gpt-3.5-turbo?

No. As of Mar 1, 2023, you can only fine-tune base GPT-3 models. See the fine-tuning guide for more details on how to use fine-tuned models.

Do you store the data that is passed into the API?

As of March 1st, 2023, we retain your API data for 30 days but no longer use your data sent via the API to improve our models. Learn more in our data usage policy.

Adding a moderation layer

If you want to add a moderation layer to the outputs of the Chat API, you can follow our moderation guide to prevent content that violates OpenAI’s usage policies from being shown.
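As a sketch, a gating function over a moderation result might look like this (the response shape follows the Moderation endpoint's results[0].flagged field; the API call itself is omitted since it requires a key):

```python
def safe_to_show(moderation_response):
    """Return True if the moderation endpoint did not flag the content."""
    return not moderation_response["results"][0]["flagged"]

# Example of a simplified moderation response, as might be returned by
# openai.Moderation.create(input=chat_output); categories abbreviated.
sample = {"results": [{"flagged": False, "categories": {"hate": False}}]}
ok = safe_to_show(sample)
```

A real moderation layer would typically check both the user's input before the chat call and the model's output before display.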