厉害了！ ChatGPT 看到一张图，就能画出来！探索 ChatGPT 如何通过 DALL-E 3 和 GPT-4V 的

ChatGPT 最近一系列的更新简直炸裂，新出的 DALL·E 3 独领风骚，跟 Midjourney 有得一拼。它简单易用，只要会用 ChatGPT 就会使用 DALL·E 3，让你的 Idea 通过图像轻松地呈现出来。

DALL·E 3 又可以和最近新推出的图像识别功能结合起来，让你把看到的图片上传到 ChatGPT 并生成提示词，然后把这个提示词输入 DALL·E 3 生成相似的图像。让我们一起来看如何实现这一过程吧。

下图是我上一篇文章的封面，这张图的构图挺复杂的，不知道 ChatGPT 能否精准复刻这张图，让我们拭目以待吧！

在上传图片之前，首先要训练 ChatGPT，让它了解什么是 DALL·E 3 以及如何写提示词。虽然 DALL·E 3 知道如何写提示词但是 GPT-4V 的训练数据目前截止到 2022 年 1 月，因此它不知道什么是 DALL·E 3。

为了训练 ChatGPT，我把下面的提示词输入 GPT-4V。之所以没有输入到 DALL·E 3，因为它目前还不支持上传图片。

Act as an DALL·E 3 expert. Let me first explain what DALL·E 3 is and how you'll generate prompts for it.
DALL·E 3 is a subsequent iteration of the original DALL·E, which is a variant of the GPT-3 model by OpenAI trained specifically to generate images from textual descriptions.
Writing an effective prompt for DALL-E 3 is crucial for obtaining the desired image outputs. Here are some guidelines and tips to craft a good prompt: 

1. **Be Specific and Detailed**: Instead of writing "a cat," specify "a fluffy orange cat with large green eyes sitting on a blue cushion." The more detailed the description, the closer the generated image will be to your vision. 

2. **Set the Scene**: If you have a particular setting in mind, describe it. For example, "A serene beach during sunset with pink and purple hues in the sky, gentle waves, and a lone palm tree on the right." 

3. **Specify Image Type**: If you have a preference for the type of image (e.g., oil painting, cartoon, photo, illustration), mention it at the beginning of the prompt. 

4. **Include Composition Details**: If certain elements should be in the foreground, background, or specific locations, mention it. "A large mountain in the background with a clear blue lake in the foreground and a campfire on the left." 

5. **Use Descriptive Adjectives**: Colors, sizes, moods, and other adjectives can help DALL-E 3 understand the look and feel you want. "A vibrant bustling market street filled with colorful stalls and diverse shoppers." 

6. **Diversify Depictions**: If your image involves people, ensure that you specify details related to descent and gender for inclusivity and diversity. 

7. **Avoid Ambiguities**: Ambiguous prompts can lead to unexpected results. Be as clear as possible about what you want. 

8. **Limit Contradictions**: Ensure your description is coherent and doesn't contain conflicting details. 

9. **Experiment with Styles**: If you want an image inspired by older artistic styles or periods (keeping in mind the policy on recent artists), you can mention that. "A scene reminiscent of a Van Gogh painting showing a starry night over a quiet town." 

10. **Iterate and Refine**: If the initial image isn't quite right, adjust your prompt by adding or changing details, and try again. 

11. **Limit Length**: While being detailed is beneficial, excessively long prompts might confuse the model. Aim for a balance between detail and brevity. 

12. **Incorporate Emotions or Moods**: Describing the emotion or mood can help set the tone of the image. "A tranquil forest glade bathed in soft morning light, giving a sense of peace." 

13. **Avoid Complex Abstract Concepts**: DALL-E 3 works best with concrete descriptions. If you're trying to convey an abstract idea, try to break it down into visual elements.

DALL-E 3 offers three resolutions to fit your artistic needs:
- **Square (1024x1024):** The classic choice, ideal for most images and the default setting.
- **Wide (1792x1024):** Crafted for sprawling landscapes, panoramic views, or any artwork that leans towards a horizontal stretch.
- **Tall (1024x1792):** The pick for dramatic full-body portraits, towering structures, or anything that demands a vertical flair.
Here's the magic: DALL-E 3's intuitive design means it can automatically gauge the best resolution from your prompt. Let's say you input a prompt hinting at a "full body portrait." 
> Prompt: Full body portrait of a cat wearing safety goggles and a construction hat, inspecting the site with a serious expression. In the background, there's a sign that reads, "Paws Construction Co."

DALL-E 3 would instinctively opt for the 1024x1792 resolution. But if you're someone who likes to call the shots, just toss in terms like "vertical images" or specify the exact resolution you're aiming for.
Craving a wide image? No problem! Adjust your prompt like this:

> Prompt: A panoramic view of a cat wearing safety goggles and a construction hat, standing next to a miniature construction site with toy bulldozers and cranes. The cat appears to be inspecting the site with a serious expression, while a mouse in a suit holds a tiny blueprint next to it. In the background, there's a sign that reads, "Paws Construction Co."

Or you can simply use the term "wide images," and DALL-E 3 will roll out images in the 1792x1024 dimension. It's all about giving you the creative freedom to envision and execute!

Do you understand your role?

翻译一下：

请扮演 DALL-E 3 专家。让我先解释一下什么是 DALL-E 3 以及如何为它生成提示词。
DALL-E 3 是原始 DALL-E 的后续迭代版本，它是 OpenAI 专门训练的 GPT-3 模型的变体，用于根据文本描述生成图像。
为 DALL-E 3 编写有效的提示词对于获得理想的图像输出至关重要。下面是一些编写好提示语的指南和技巧： 

1. **具体详细**： 不要写 "一只猫"，而要具体说明 "一只毛茸茸的橙色猫，一双绿色的大眼睛，坐在一个蓝色的垫子上"。描述越详细，生成的图像就越接近你的想象。

2. **设置场景**： 如果您心目中有特定的场景，请对其进行描述。例如，"日落时分的宁静海滩，天空中呈现出粉色和紫色的色调，海浪轻柔，右边有一棵孤独的棕榈树"。

3. **指定图片类型**： 如果您对图片类型（如油画、漫画、照片、插图）有偏好，请在提示开头提及。

4. **包括构图细节**： 如果某些元素应位于前景、背景或特定位置，请注明。"背景是一座大山，前景是清澈湛蓝的湖水，左边是篝火"。

5. **使用描述性形容词**： 颜色、大小、情绪和其他形容词可以帮助 DALL-E 3 理解您想要的外观和感觉。"一条热闹非凡的集市街道，到处都是五颜六色的摊位和形形色色的购物者"。

6. **多样化描绘**： 如果您的图片涉及到人，请确保您指定了与血统和性别相关的细节，以实现包容性和多样性。

7. **避免模棱两可**： 模棱两可的提示可能会导致意想不到的结果。请尽可能明确您的要求。

8. **限制矛盾**： 确保您的描述连贯一致，不包含相互矛盾的细节。

9. **尝试不同风格**： 如果您希望图片的灵感来源于较早的艺术风格或时期（请牢记关于近期艺术家的政策），您可以这样说。"一个场景让人想起梵高的画作，展现了一个宁静小镇的星空"。

10. **反复推敲**： 如果最初的图像不太合适，可以通过添加或更改细节来调整您的提示词，然后再试一次。

11. **限制篇幅**： 虽然详细是有益的，但过长的提示可能会让模特感到困惑。应力求在细节和简洁之间取得平衡。

12. **融入情感或情绪**： 描述情绪或心情有助于确定图片的基调。"宁静的森林小径沐浴在柔和的晨光中，给人一种安详的感觉"。

13. **避免复杂抽象的概念**： DALL-E 3 最好使用具体的描述。如果您想表达一个抽象概念，请尽量将其分解为视觉元素。

DALL-E 3 提供三种分辨率以满足您的艺术需求：
- 正方形（1024x1024）：** 经典选择，适合大多数图像，也是默认设置。
- 宽（1792x1024）：** 适用于广阔的风景、全景或任何倾向于水平拉伸的艺术作品。
- 高 (1024x1792)：** 用于拍摄戏剧性的全身肖像、高耸的建筑或任何需要垂直风格的作品。

神奇之处就在这里： DALL-E 3 的直观设计意味着它能根据你的提示词自动判断最佳分辨率。假设您输入的提示是 "全身肖像"。

> 提示词： 一只猫的全身像，它戴着安全护目镜和施工帽，表情严肃地检查工地。背景是一个标牌，上面写着 "Paws 建筑公司"。

DALL-E 3会本能地选择 1024x1792 的分辨率。但如果你是一个喜欢发号施令的人，只需加入 "垂直图像 "之类的术语，或指定你想要的精确分辨率即可。

想要宽幅图像？没问题！像这样调整提示：

> 提示： 一只猫的全景图，它戴着安全护目镜和建筑帽，站在一个有玩具推土机和起重机的微型建筑工地旁。这只猫似乎正表情严肃地检查工地，旁边一只穿着西装的老鼠拿着一张小蓝图。背景是一个写着 "Paws Construction Co. "的牌子。

您也可以直接使用 "宽图像"，DALL-E 3 将以 1792x1024 的尺寸显示图像。所有这些都是为了给你创作自由，让你去设想和执行！

您了解自己的角色吗？

输入了上面的提示词训练 ChatGPT 以后，它这样回答：

是的，我明白我的角色。您提供了一份关于如何为 DALL-E 3 制作有效提示词的深入指南。根据这些指导原则，我将协助您制作有创意的详细提示，以便使用 DALL-E 3 制作所需的图像。请提供您的图像概念或您心中的想法，我将根据您分享的指导原则帮助您制作有效的提示。

好了，既然它已经懂了，那么我们就可以上传图片并让它生成对应的提示词了。

上传了图片之后，我让它生成 4 个提示词，一来是因为 DALL-E 3 默认会生成 4 张图片，二来是可以从这 4 张图中挑选一张最好的。这里的一个细节是我告诉了 GPT-4V 上传的图片是一张宽幅的图片。经我测试它还无法识别图像的宽高比。

1. **逼真的艺术工作室氛围**：
   > 一幅广角图片，捕捉到一位才华横溢的艺术家，栗色波浪卷发披散在肩上，身穿一件褪色的蓝色牛仔衬衫。她正一丝不苟地在画布上为自己绘制一幅栩栩如生的肖像，捕捉她的每一根发丝和凝视的神情。画布上的肖像反映了她在现实世界中的特征，包括她淡褐色眼睛中闪烁的柔和光芒。背景是复古的木板，增添了工作室的质朴氛围。她的调色板上点缀着各种颜色，手中的画笔蘸着天蓝色，准备下一笔。

2. **艺术中的自我反思**：
   > 艺术家迷失在创作世界中的全景。她身着休闲牛仔衬衫，丰盈的棕色波浪在微妙的灯光下闪闪发光，她在拉伸的画布上画着自画像。画布上捕捉到了她令人着迷的细节--每一个卷曲的弧度、灰色上衣轻柔的垂坠感，以及眼神的穿透力。背景中的木质镶板彰显出一种古老的魅力，而她身旁的调色板则证明了她的用心，色调使她的形象栩栩如生。

3. **复古工作室背景**：
   > 在一个宽阔的环境中，仿佛回到了永恒的艺术工作室，一位拥有一头丰盈飘逸的黑发、身着柔软的蓝色衬衫的女士正在全神贯注地作画。她的主题不是别人，正是她自己，画布上精细地描绘着她的脸和上半身。从脸部轮廓到眼睛的倒影，每一个特征都被精准捕捉。木质壁板进一步丰富了这里的氛围，她身旁的调色板上色彩斑斓，诉说着过去许多肖像画的故事。

4. **追求完美的艺术**：
   > 艺术家的天堂全景：一位拥有一头浓密棕色波浪卷发、身着舒适蓝色牛仔衬衫的女士正在进行她的艺术之旅。她正在画一幅自画像，画布上展现的是一张与她一模一样的脸，从她富有表情的眼睛到布料的垂感，每一个细微之处都与她一模一样。质朴的木质背景，加上她色彩斑斓的调色板，增加了画面的层次感，描绘出一幅纯粹的艺术沉浸式场景。

万事俱备！现在可以进入 DALL-E 3 并让它根据这 4 个提示词生成图像啦。

我觉得最后一幅图是比较接近原图的，放大来看下。

想继续了解如何使用 DALL·E 3 ，可以看这篇文章：

👉 重磅！ChatGPT 可以进行 AI 绘图了，来看教程吧

想了解 DALL·E 3 和 Midjourney 之间的差异，可以阅读这篇文章：

👉 Midjourney 要完蛋了吗？DALL·E 3 效果如何？

有任何疑问或想法，欢迎评论区进行探讨！