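# How to pass multimodal data directly to a model: an image description example

## Introduction
Models that can process multimodal data, such as images together with text, are becoming increasingly important in AI. This article shows how to pass multimodal input (an image plus a text prompt) directly to a model, using "describe the weather in this picture" as the running example. We use an OpenAI model and walk through how to implement each step in code.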

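## Main content

### 1. Preparing the input data
For multimodal input, a common approach is to send the image inline as a base64-encoded string. This article shows how to download an image and encode it that way.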
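
The encoding step can be sketched without any network access. The helper below is a minimal illustration, not part of any library: the name `to_data_url` and the default MIME type are assumptions made for this example. It wraps raw image bytes in the `data:<mime>;base64,<payload>` form that the complete example in this article passes as the image URL.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for a real image so the sketch runs offline.
sample_url = to_data_url(b"not a real image", "image/png")
print(sample_url[:22])
```

In real use, `image_bytes` would be the content of a downloaded or locally read image file, and `mime_type` should match the actual image format.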

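### 2. Creating the message object
We use the `HumanMessage` class to create a message whose content is a list of blocks: a text block with the prompt and an image block carrying the image URL.

### 3. Invoking the model
We pass the message to a `ChatOpenAI` model, which returns text describing the weather in the picture.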

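### 4. Using tool calling
Some multimodal models support tool calling. This article demonstrates how to bind a tool to the model and have the model call it, which is useful when you need structured output or want to handle more complex tasks.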
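
When a tool is bound, the interesting part of the response is the list of tool calls rather than free text. The sketch below assumes each tool call is a dictionary with `name` and `args` keys, which is the shape LangChain exposes on `response.tool_calls`. The helper name `first_tool_arg` and the sample data are hypothetical and written by hand for illustration; they are not real model output.

```python
from typing import Any, Optional


def first_tool_arg(tool_calls: list[dict[str, Any]], tool_name: str, arg: str) -> Optional[Any]:
    """Return `arg` from the first call to `tool_name`, or None if there is none."""
    for call in tool_calls:
        if call.get("name") == tool_name:
            return call.get("args", {}).get(arg)
    return None


# Hand-written sample shaped like one entry of `response.tool_calls`.
sample_calls = [
    {"name": "weather_tool", "args": {"weather": "sunny"}, "id": "call_123", "type": "tool_call"}
]
print(first_tool_arg(sample_calls, "weather_tool", "weather"))  # sunny
```

Returning `None` instead of raising keeps the caller in control, since a model may answer in plain text and produce no tool calls at all.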

## Code example
The following complete example shows how to pass multimodal data to a model and have it describe the weather in a picture.

```python
import base64

import httpx
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

# Image URL
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

# Initialize the model
model = ChatOpenAI(model="gpt-4o")

# Download the image and encode it as a base64 string
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

# Build the message: a text block plus the image as a base64 data URL
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
    ],
)

# Invoke the model and print its response
response = model.invoke([message])
print(response.content)
```

### Using an image URL directly

Instead of encoding the image as a base64 string, we can pass the image URL directly in the message object.

```python
# Build the message, passing the image URL as-is
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)

# Invoke the model and print its response
response = model.invoke([message])
print(response.content)
```

### Passing multiple images

A single message can carry several images, so we can ask the model a question about all of them.

```python
# Build the message with two image blocks
message = HumanMessage(
    content=[
        {"type": "text", "text": "are these two images the same?"},
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)

# Invoke the model and print its response
response = model.invoke([message])
print(response.content)
```

### Tool calling

We can bind a tool to the model and use tool calling to handle a specific task.

```python
from typing import Literal

from langchain_core.tools import tool


# Define the weather tool; the Literal type restricts the values the model may pass
@tool
def weather_tool(weather: Literal["sunny", "cloudy", "rainy"]) -> None:
    """Describe the weather"""
    pass


# Bind the tool to the model
model_with_tools = model.bind_tools([weather_tool])

# Build the message
message = HumanMessage(
    content=[
        {"type": "text", "text": "describe the weather in this image"},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)

# Invoke the model and print the tool calls it requested
response = model_with_tools.invoke([message])
print(response.tool_calls)
```

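## Common issues and solutions

### 1. Network restrictions

Because of network restrictions in some regions, developers may need to consider using an API proxy service to make access to the model API more stable.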

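### 2. Image format support

Not every model provider supports every image format. Before using a particular model, check that the image format you send is one that model accepts.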
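
One way to catch format problems early is to inspect the first bytes of the image before sending it. The sketch below is an illustration under assumptions: the function name `sniff_image_mime` is hypothetical, and the set of formats it recognizes is only an example, not a statement of what any provider accepts. Check your provider's documentation for the real list.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Guess the MIME type from the file signature, or return None if unknown."""
    for signature, mime in _SIGNATURES.items():
        if data.startswith(signature):
            return mime
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

The detected type can also be used in place of a hard-coded `image/jpeg` when building a base64 data URL, so the declared type matches the actual bytes.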

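### 3. Data preprocessing

Before passing multimodal data, make sure the image and text inputs are properly prepared, for example that the prompt is not empty and that every image is given as a reachable URL or a well-formed base64 data URL. Otherwise the model may be unable to parse the input.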
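
A lightweight check before invoking the model can surface malformed input early. The following is a minimal sketch, assuming content blocks use the `text` and `image_url` shapes shown in this article. The function name `validate_content_blocks` and the specific rules are illustrative choices, not requirements of any library.

```python
from typing import Any


def validate_content_blocks(blocks: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for i, block in enumerate(blocks):
        kind = block.get("type")
        if kind == "text":
            if not str(block.get("text", "")).strip():
                problems.append(f"block {i}: empty text")
        elif kind == "image_url":
            url = (block.get("image_url") or {}).get("url", "")
            # Accept plain web URLs or inline base64 image data URLs.
            if not url.startswith(("http://", "https://", "data:image/")):
                problems.append(f"block {i}: unsupported image URL")
        else:
            problems.append(f"block {i}: unknown type {kind!r}")
    return problems
```

Calling it on the `content` list before building the `HumanMessage` lets you fail fast with a clear message instead of relying on an API error.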

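## Summary

This article showed how to pass multimodal data directly to a model, using "describe the weather in this picture" as the example. We used OpenAI's GPT-4o model through `ChatOpenAI`, covered base64-encoded images, direct image URLs, multiple images, and tool calling, and discussed common issues and their solutions.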
