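Background
On August 6, 2024, OpenAI shipped an update whose most important feature for developers is Structured Outputs. When you call the OpenAI API you can now require the response to conform strictly to a JSON Schema you supply, instead of worrying about fields coming back with the wrong type.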
JSON Schema vs JSON Mode
Most of us are used to JSON mode, which simply returns a JSON object such as this one:
{
  "name": "John Doe",
  "age": 25,
  "address": {
    "street": "123 Main St",
    "city": "New York",
    "state": "NY",
    "postalCode": "10001"
  },
  "hobbies": ["reading", "running"]
}
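That is roughly what comes back, but nothing guarantees it has the shape we want. Suppose we want postalCode to match a five-digit pattern, or city to be limited to a few enumerated values. Constraining JSON in this way is exactly what JSON Schema is for: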
{
  "$id": "https://example.com/complex-object.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Complex Object",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer",
      "minimum": 0
    },
    "address": {
      "type": "object",
      "properties": {
        "street": {
          "type": "string"
        },
        "city": {
          "type": "string"
        },
        "state": {
          "type": "string"
        },
        "postalCode": {
          "type": "string",
          "pattern": "\\d{5}"
        }
      },
      "required": ["street", "city", "state", "postalCode"]
    },
    "hobbies": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": ["name", "age"]
}
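As you can see, JSON Schema can express many more constraints, such as types, required fields, patterns, and enumerated values. The full specification is available at json-schema.org.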
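To make concrete what "conforming to a schema" means, here is a minimal sketch of a validator written with only the Python standard library. It is an illustration, not a real JSON Schema implementation: the function name validate and the sample address_schema are made up for this post, and it only understands type, enum, minimum, pattern, required, properties, and items.

```python
import re

# Maps a JSON Schema type name to a Python check. bool is a subclass of int
# in Python, so it is excluded explicitly from the numeric types.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance conforms."""
    expected = schema.get("type")
    if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](instance):
        # No point checking further constraints on a value of the wrong type.
        return [f"{path}: expected {expected}"]

    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} is not one of {schema['enum']}")
    if "minimum" in schema and isinstance(instance, (int, float)):
        if instance < schema["minimum"]:
            errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
    if "pattern" in schema and isinstance(instance, str):
        # JSON Schema patterns are unanchored, hence re.search rather than re.fullmatch.
        if not re.search(schema["pattern"], instance):
            errors.append(f"{path}: does not match pattern {schema['pattern']}")

    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}.{key}: missing required property")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for index, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


# Hypothetical schema: city restricted to an enum, postalCode to exactly five digits.
address_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "enum": ["New York", "Boston"]},
        "postalCode": {"type": "string", "pattern": r"^\d{5}$"},
    },
    "required": ["city", "postalCode"],
}

print(validate({"city": "New York", "postalCode": "10001"}, address_schema))  # []
print(validate({"city": "Paris", "postalCode": 10001}, address_schema))
```

The second call reports two problems: a city outside the enum and a postalCode that is a number rather than a string. These are the kinds of mistakes plain JSON mode cannot rule out.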
How to use it
There are currently two modes, each supported by a different set of models: function calling and response_format.
Function calling
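This one is simple: set strict: true in the function definition. It works on every model that supports function calling, going back as far as gpt-3.5-turbo-0613.
Here is an example: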
POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
    },
    {
      "role": "user",
      "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query",
        "description": "Execute a query.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "table_name": {
              "type": "string",
              "enum": ["orders"]
            }
          },
          "required": ["table_name"],
          "additionalProperties": false
        }
      }
    }
  ]
}
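Here are the function-call arguments that come back (the schema above is abbreviated; the full one also defines columns, conditions, and order_by):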
{
  table_name: 'orders',
  columns: [
    'id',
    'status',
    'expected_delivery_date',
    'delivered_at',
    'shipped_at',
    'ordered_at'
  ],
  conditions: [
    { column: 'status', operator: '=', value: 'fulfilled' },
    {
      column: 'expected_delivery_date',
      operator: '>=',
      value: '2023-05-01'
    },
    {
      column: 'expected_delivery_date',
      operator: '<=',
      value: '2023-05-31'
    },
    { column: 'delivered_at', operator: '>', value: [Object] }
  ],
  order_by: 'asc'
}
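The same request can be assembled in Python. The sketch below only builds the payload and decodes a tool call; it does not send anything. The helper names make_strict_tool and parse_tool_arguments are invented for this post, and sample_tool_call is a hand-written stand-in for what the API returns under choices[0].message.tool_calls, where arguments arrives as a JSON string.

```python
import json


def make_strict_tool(name, description, properties):
    """Build a Chat Completions tool definition with strict mode switched on.

    Strict mode expects every property to be listed in "required" and
    "additionalProperties" to be false, so both are filled in here.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


def parse_tool_arguments(tool_call):
    """Decode the JSON string stored at tool_call["function"]["arguments"]."""
    return json.loads(tool_call["function"]["arguments"])


tool = make_strict_tool(
    "query",
    "Execute a query.",
    {"table_name": {"type": "string", "enum": ["orders"]}},
)

# Body for POST /v1/chat/completions; serialize with json.dumps before sending.
request_body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "look up all my orders"}],
    "tools": [tool],
}

# Hand-written stand-in for one entry of choices[0].message.tool_calls.
sample_tool_call = {
    "id": "call_hypothetical",
    "type": "function",
    "function": {"name": "query", "arguments": '{"table_name": "orders"}'},
}
print(parse_tool_arguments(sample_tool_call))
```

Deriving required from the property names keeps the two lists from drifting apart as the schema grows.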
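response_format mode
Add a response_format object to the request, set its type to json_schema, and set strict to true inside json_schema. This mode takes more work to set up, and model support is much narrower: only the newest GPT-4o models, such as gpt-4o-2024-08-06 (a new 08-06 snapshot released with this update) and gpt-4o-mini-2024-07-18.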
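POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful math tutor."
    },
    {
      "role": "user",
      "content": "solve 8x + 3 = 21"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "math_response",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "steps": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "explanation": { "type": "string" },
                "output": { "type": "string" }
              },
              "required": ["explanation", "output"],
              "additionalProperties": false
            }
          }
        },
        "required": ["steps"],
        "additionalProperties": false
      }
    }
  }
}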
Here are the solution steps that come back:
[
  {
    explanation: 'First, isolate the term with the variable by subtracting 3 from both sides.',
    output: '8x + 3 - 3 = 21 - 3'
  },
  { explanation: 'This simplifies to 8x = 18.', output: '8x = 18' },
  {
    explanation: 'Next, solve for x by dividing both sides by 8.',
    output: 'x = 18 / 8'
  },
  {
    explanation: 'Simplify the right side by dividing 18 by 8.',
    output: 'x = 2.25'
  }
]
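In Python, the response_format object and the handling of the reply might look like the sketch below. Nothing is sent over the network. The helper names make_response_format and read_structured_message are made up for this post, sample_choice is a hand-written stand-in for choices[0] of a Chat Completions response, and the steps schema is an assumption inferred from the output above.

```python
import json


def make_response_format(name, schema):
    """Wrap a JSON Schema in the response_format object used for strict output."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def read_structured_message(choice):
    """Return the parsed JSON content of one choice, or raise if it is unusable."""
    message = choice["message"]
    if message.get("refusal"):
        # The model declined to answer; content will not follow the schema.
        raise ValueError(f"model refused: {message['refusal']}")
    if choice.get("finish_reason") == "length":
        # Generation stopped at the token limit, so the JSON may be cut off.
        raise ValueError("output truncated before the JSON was complete")
    return json.loads(message["content"])


# Assumed schema: a list of steps, each with an explanation and an output.
steps_schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "explanation": {"type": "string"},
                    "output": {"type": "string"},
                },
                "required": ["explanation", "output"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["steps"],
    "additionalProperties": False,
}

request_body = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "solve 8x + 3 = 21"}],
    "response_format": make_response_format("math_response", steps_schema),
}

# Hand-written stand-in for choices[0] of a response.
sample_choice = {
    "finish_reason": "stop",
    "message": {
        "content": '{"steps": [{"explanation": "Divide by 8.", "output": "x = 2.25"}]}',
        "refusal": None,
    },
}
print(read_structured_message(sample_choice)["steps"][0]["output"])  # x = 2.25
```

Checking refusal and finish_reason before json.loads separates "the model would not answer" and "the answer was cut off" from a genuine parsing bug.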
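Limitations
So what does it cost?
- Only a subset of JSON Schema is supported: String, Number, Boolean, Object, Array, Enum, and anyOf. oneOf and allOf are not supported. In practice this is usually enough.
- Every field must be required; optional fields are not allowed. This is only a matter of how you write the schema: an optional field can be expressed as a union with null.
- Nesting cannot go deeper than 5 levels, and a schema cannot have more than 100 properties in total.
- Some type-specific keywords are not supported, for example minLength and maxLength on strings.
- The first API response for a new schema incurs extra latency while the schema is processed; after that it is cached. The delay is usually under 10 seconds, but a complex schema can take up to a minute to preprocess.
- The output can still be incomplete if generation hits the maximum token limit. Separately, when the model refuses a request, the message carries a refusal field instead of schema-conforming content.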
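Several of these limits can be checked before a schema is ever sent. The following is a rough pre-flight check under stated assumptions: lint_strict_schema is a hypothetical helper, the keyword list covers only the items named above, and the way nesting depth is counted (one level per nested object, root at level 1) is my own reading rather than a documented rule.

```python
# Keywords named in the list above as unsupported in strict mode.
UNSUPPORTED_KEYWORDS = {"oneOf", "allOf", "minLength", "maxLength"}


def lint_strict_schema(schema, max_depth=5, max_properties=100):
    """Return a list of likely strict-mode problems found in a schema."""
    problems = []
    total_properties = 0

    def walk(node, depth, path):
        nonlocal total_properties
        if not isinstance(node, dict):
            return
        for keyword in sorted(UNSUPPORTED_KEYWORDS.intersection(node)):
            problems.append(f"{path}: unsupported keyword {keyword}")
        if node.get("type") == "object":
            if depth > max_depth:
                problems.append(f"{path}: nested deeper than {max_depth} levels")
            properties = node.get("properties", {})
            total_properties += len(properties)
            missing = sorted(set(properties) - set(node.get("required", [])))
            if missing:
                problems.append(f"{path}: not listed in required: {missing}")
            for name, subschema in properties.items():
                walk(subschema, depth + 1, f"{path}.{name}")
        if "items" in node:
            walk(node["items"], depth, f"{path}[]")
        for index, subschema in enumerate(node.get("anyOf", [])):
            walk(subschema, depth, f"{path}.anyOf[{index}]")

    walk(schema, 1, "$")
    if total_properties > max_properties:
        problems.append(f"$: {total_properties} properties exceeds {max_properties}")
    return problems


# nickname is "optional" in spirit: it accepts null, but should still be in required.
draft_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},
        "nickname": {"type": ["string", "null"]},
    },
    "required": ["name"],
}
for problem in lint_strict_schema(draft_schema):
    print(problem)
```

For draft_schema the check flags two things: nickname is missing from required, and name uses minLength. Adding nickname to required while keeping its null union is the usual way to model an optional field.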
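Debate
Some argue that strictly constraining the output format hurts a model's reasoning ability (arxiv.org/abs/2408.02… : "we observe a significant decline in LLMs' reasoning abilities under format restrictions"). Others argue it actually improves performance (blog.dottxt.co/performance…).
Which one holds for you depends on your own use case.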