The OpenAI API now supports Structured Outputs

Background

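On August 6, 2024, OpenAI shipped an update. For developers, the most important new feature is Structured Outputs: when you call the OpenAI API, you can now require the response to follow a JSON Schema strictly, so you no longer have to worry about fields coming back with the wrong type.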

JSON Schema vs JSON Mode

Most of us are used to JSON mode, which simply returns a JSON object such as this one:

{
  "name": "John Doe",
  "age": 25,
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "state": "NY",
    "postalCode": "10001"
  },
  "hobbies": ["reading", "running"]
}

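This is roughly what comes back, but there is no guarantee that it has the shape we wanted. We might want postalCode to be a five-digit code, or city to be restricted to a few enumerated values. So how do we constrain the JSON? That is the job of JSON Schema: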

{
  "$id": "https://example.com/complex-object.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Complex Object",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer",
      "minimum": 0
    },
    "address": {
      "type": "object",
      "properties": {
        "street": {
          "type": "string"
        },
        "city": {
          "type": "string"
        },
        "state": {
          "type": "string"
        },
        "postalCode": {
          "type": "string",
          "pattern": "\\d{5}"
        }
      },
      "required": ["street", "city", "state", "postalCode"]
    },
    "hobbies": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": ["name", "age"]
}

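As you can see, a JSON Schema adds many constraints, such as types, required fields, value patterns, and enumerated values. The JSON Schema specification lists the full set of keywords.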
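To make the idea of "constraining JSON" concrete, here is a minimal sketch of a validator written for this article. It handles only a small subset of keywords (type, enum, minimum, pattern, properties, required, items) and is not a replacement for a real JSON Schema library. The trimmed schema and the enum on city (New York, Boston) are assumptions added for illustration.

```python
import re

# Map JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms.

    Handles only a small subset of JSON Schema: type, enum, minimum,
    pattern, properties, required, and items.
    """
    errors = []
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(value, _TYPES[expected])
        # In Python, bool is a subclass of int, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: not one of {schema['enum']}")
    if "minimum" in schema and isinstance(value, (int, float)) and value < schema["minimum"]:
        errors.append(f"{path}: below minimum {schema['minimum']}")
    # JSON Schema patterns are unanchored, so search() is the matching call.
    if "pattern" in schema and isinstance(value, str) and not re.search(schema["pattern"], value):
        errors.append(f"{path}: does not match pattern {schema['pattern']}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required property missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


# Trimmed version of the schema above; the city enum is a hypothetical addition.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "enum": ["New York", "Boston"]},
                "postalCode": {"type": "string", "pattern": r"\d{5}"},
            },
            "required": ["city", "postalCode"],
        },
        "hobbies": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}

good = {"name": "John Doe", "age": 25,
        "address": {"city": "New York", "postalCode": "10001"},
        "hobbies": ["reading", "running"]}
bad = {"name": "John Doe", "age": -1,
       "address": {"city": "Paris", "postalCode": "1000"}}

print(validate(good, schema))  # []
for err in validate(bad, schema):
    print(err)
```

The second object fails on three counts: a negative age, a city outside the enum, and a postal code that is not five digits. Plain JSON mode cannot rule out any of these.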

How to use it

There are currently two ways to use it, each supported by a different set of models: function calling and response_format.

Function calling

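This one is simple: set strict: true in the function definition inside the request. It works on every model that supports function calling, including gpt-3.5-turbo-0613 and later.

For example: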

POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
    },
    {
      "role": "user",
      "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query",
        "description": "Execute a query.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "table_name": {
              "type": "string",
              "enum": ["orders"]
            }
          },
          "required": ["table_name"],
          "additionalProperties": false
        }
      }
    }
  ]
}

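Below is the result, that is, the parsed arguments of the function call. It includes columns, conditions, and order_by, which come from parts of the parameters schema that are not shown in the request above; [Object] is how the console abbreviated a nested value.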


{
  table_name: 'orders',
  columns: [
    'id',
    'status',
    'expected_delivery_date',
    'delivered_at',
    'shipped_at',
    'ordered_at'
  ],
  conditions: [
    { column: 'status', operator: '=', value: 'fulfilled' },
    {
      column: 'expected_delivery_date',
      operator: '>=',
      value: '2023-05-01'
    },
    {
      column: 'expected_delivery_date',
      operator: '<=',
      value: '2023-05-31'
    },
    { column: 'delivered_at', operator: '>', value: [Object] }
  ],
  order_by: 'asc'
}
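For readers who want to handle this on the client side, the following is a minimal sketch of building a strict tool definition and reading the arguments back out of a reply. It makes no network call: fake_response is a hand-written stand-in shaped like a Chat Completions reply, and the helper names make_strict_tool and extract_tool_arguments are made up for this example.

```python
import json


def make_strict_tool(name, description, parameters):
    """Wrap a JSON Schema in the tool format shown above, with strict mode on."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "strict": True,
            "parameters": parameters,
        },
    }


def extract_tool_arguments(response, expected_name):
    """Return the parsed arguments of the first tool call in a chat completion.

    `response` is the decoded JSON body of a /v1/chat/completions reply.
    The `arguments` field arrives as a JSON string, so it has to be parsed.
    """
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        raise ValueError("the model answered without calling a tool")
    call = tool_calls[0]["function"]
    if call["name"] != expected_name:
        raise ValueError(f"unexpected tool: {call['name']}")
    return json.loads(call["arguments"])


# Abbreviated schema: only table_name, as in the request above.
tool = make_strict_tool(
    "query",
    "Execute a query.",
    {
        "type": "object",
        "properties": {"table_name": {"type": "string", "enum": ["orders"]}},
        "required": ["table_name"],
        "additionalProperties": False,
    },
)

# Hand-written stand-in for a real API reply, used only to exercise the parser.
fake_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_example",
                "type": "function",
                "function": {"name": "query", "arguments": "{\"table_name\": \"orders\"}"},
            }],
        },
        "finish_reason": "tool_calls",
    }]
}

args = extract_tool_arguments(fake_response, "query")
print(args["table_name"])  # orders
```

With strict mode on, the parsed arguments should always match the schema, so the remaining checks are about whether the model called a tool at all and which one.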

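response_format mode

Add a response_format object to the request, set its type to json_schema, and set strict to true inside json_schema. This approach takes more work, and model support is very limited: only the newest GPT-4o models, such as gpt-4o-2024-08-06 (released together with this update) and gpt-4o-mini-2024-07-18.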

POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful math tutor."
    },
    {
      "role": "user",
      "content": "solve 8x + 31 = 2"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "math_response",
      "strict": true,
      "schema": {
        "type": "object",
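        "properties": {
          "steps": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "explanation": { "type": "string" },
                "output": { "type": "string" }
              },
              "required": ["explanation", "output"],
              "additionalProperties": false
            }
          },
          "final_answer": { "type": "string" }
        },
        "required": ["steps", "final_answer"],
        "additionalProperties": false
      }
    }
  }
}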

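Below is the returned list of solution steps. Note that this sample output works through 8x + 3 = 21 rather than the equation in the request above.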


[
  {
    explanation: 'First, isolate the term with the variable by subtracting 3 from both sides.',
    output: '8x + 3 - 3 = 21 - 3'
  },
  { explanation: 'This simplifies to 8x = 18.', output: '8x = 18' },
  {
    explanation: 'Next, solve for x by dividing both sides by 8.',
    output: 'x = 18 / 8'
  },
  {
    explanation: 'Simplify the right side by dividing 18 by 8.',
    output: 'x = 2.25'
  }
]
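On the client side, the reply still arrives as a JSON string in the message content. The sketch below shows one way to build the response_format object and decode the reply defensively, checking the refusal field and the length finish reason first. The helper names are made up, the steps and final_answer property names are assumptions for illustration, and fake_response is a hand-written stand-in rather than real API output.

```python
import json


def make_response_format(name, schema):
    """Build the response_format object used in the request above."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def parse_structured_reply(response):
    """Return the decoded JSON content, or raise if it cannot be trusted."""
    choice = response["choices"][0]
    message = choice["message"]
    # A refusal replaces the schema-conforming content.
    if message.get("refusal"):
        raise RuntimeError(f"model refused: {message['refusal']}")
    # A reply cut off at the token limit may be incomplete JSON.
    if choice.get("finish_reason") == "length":
        raise RuntimeError("output was cut off at the token limit")
    return json.loads(message["content"])


# Hypothetical schema: property names are assumptions for this example.
step_schema = {
    "type": "object",
    "properties": {
        "explanation": {"type": "string"},
        "output": {"type": "string"},
    },
    "required": ["explanation", "output"],
    "additionalProperties": False,
}
response_format = make_response_format("math_response", {
    "type": "object",
    "properties": {
        "steps": {"type": "array", "items": step_schema},
        "final_answer": {"type": "string"},
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
})

# Hand-written stand-in for a real API reply.
fake_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": json.dumps({
                "steps": [{"explanation": "This simplifies to 8x = 18.", "output": "8x = 18"}],
                "final_answer": "x = 2.25",
            }),
            "refusal": None,
        },
        "finish_reason": "stop",
    }]
}

reply = parse_structured_reply(fake_response)
for step in reply["steps"]:
    print(step["output"])  # 8x = 18
```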

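Limitations

So what is the cost?

  • Only a subset of JSON Schema is allowed: String, Number, Boolean, Object, Array, Enum, and anyOf. oneOf and allOf are not supported. In practice this is usually enough.
  • Every field must be required; none can be optional. This is only a matter of how you write the schema: an optional field can be expressed as a required field that also accepts null.
  • Nesting cannot go deeper than 5 levels, and a schema cannot have more than 100 properties.
  • Some type-specific keywords are not supported, for example minLength and maxLength on strings.
  • The first API response that uses a new schema incurs extra latency while the schema is processed; later requests hit a cache. The delay is typically under 10 seconds, but a complex schema can take up to a minute to preprocess.
  • The output is not guaranteed to be complete. If generation stops at the maximum token limit, the JSON can be cut off and fail to match the schema. Separately, if the model refuses the request, the message carries a refusal field instead of schema-conforming content.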
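Several of these limits can be checked before a schema is ever sent. The sketch below is a small linter written for this article, not an official tool. The way it counts nesting depth (object levels, root counted as 1) is an assumption, and the additionalProperties check reflects a strict-mode requirement that is not in the list above.

```python
UNSUPPORTED_KEYWORDS = {"oneOf", "allOf", "minLength", "maxLength"}
MAX_DEPTH = 5
MAX_PROPERTIES = 100


def lint_strict_schema(schema):
    """Return a list of problems that the limits above would flag."""
    problems = []
    total_properties = 0

    def walk(node, path, depth):
        nonlocal total_properties
        if not isinstance(node, dict):
            return
        for keyword in sorted(UNSUPPORTED_KEYWORDS.intersection(node)):
            problems.append(f"{path}: unsupported keyword {keyword}")
        if node.get("type") == "object":
            # Assumption: depth counts object levels, with the root as level 1.
            if depth > MAX_DEPTH:
                problems.append(f"{path}: nested deeper than {MAX_DEPTH} levels")
            props = node.get("properties", {})
            total_properties += len(props)
            missing = sorted(set(props) - set(node.get("required", [])))
            if missing:
                problems.append(f"{path}: not listed in required: {missing}")
            if node.get("additionalProperties") is not False:
                problems.append(f"{path}: additionalProperties must be false")
            for name, sub in props.items():
                walk(sub, f"{path}.{name}", depth + 1)
        if "items" in node:
            walk(node["items"], f"{path}[]", depth)
        for i, sub in enumerate(node.get("anyOf", [])):
            walk(sub, f"{path}.anyOf[{i}]", depth)

    walk(schema, "$", 1)
    if total_properties > MAX_PROPERTIES:
        problems.append(f"$: {total_properties} properties, limit is {MAX_PROPERTIES}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 50},
        # An "optional" field is written as required but nullable.
        "nickname": {"type": ["string", "null"]},
        "age": {"type": "integer"},
    },
    "required": ["name", "nickname"],
    "additionalProperties": False,
}

for problem in lint_strict_schema(schema):
    print(problem)
```

Here the linter reports that age is missing from required and that name uses maxLength. The nickname field shows the usual workaround for optional values: keep it required and allow null.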

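Controversy

Some argue that strictly constraining the output format hurts the model's reasoning ability (arxiv.org/abs/2408.02…): "we observe a significant decline in LLMs' reasoning abilities under format restrictions". Others argue that it actually improves performance (blog.dottxt.co/performance…).

In the end, it depends on your own use case.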