Task Overview
In the previous task, we read through the baseline model line by line and learned how to use AI tools to study more efficiently. We also produced a stage-play comic strip and gained a first impression of Scepter WebUI, a zero-code text-to-image platform. Today we will dig into the basic principles of fine-tuning, walk through its main parameters, and introduce ComfyUI, a highly customizable text-to-image workflow tool.
Part 1: A First Look at ComfyUI and Its Applications
1.1 What is ComfyUI?
ComfyUI is a node-based graphical user interface for image generation. Its distinguishing feature is a modular design: the image-generation process is broken into small steps, each represented by a node. Nodes are wired together into a complete workflow, which users can customize to their own needs.
1.2 ComfyUI's Core Modules
- Model loaders: load the AI models.
- Prompt management: manage the text prompts that guide image generation.
- Samplers: drive the sampling (denoising) stage of image generation.
- Decoders: convert the generated latent data back into image format.
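These four modules map directly onto concrete node types in the Kolors workflow JSON shown later in this tutorial. As a quick reference (the grouping below is our own reading of that workflow, not an official taxonomy):

```python
# Mapping of ComfyUI's core modules to node types that appear in this
# tutorial's Kolors workflow JSON (node names taken from that JSON).
core_modules = {
    "model loader": ["DownloadAndLoadKolorsModel", "DownloadAndLoadChatGLM3", "VAELoader"],
    "prompt manager": ["KolorsTextEncode"],
    "sampler": ["KolorsSampler"],
    "decoder": ["VAEDecode"],
}
```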
1.3 Stable Diffusion Sampling Parameters
In Stable Diffusion, the sampling process is configured through the KSampler node's parameters:
- seed: the random seed that controls the initial noise.
- control_after_generate: how the seed changes after each generation.
- steps: the number of denoising iterations; more steps generally yield more refined results at the cost of longer generation time.
- cfg (Classifier-Free Guidance): how strongly the text prompt influences the final image; higher values follow the prompt's description more closely.
- denoise: the denoising strength, i.e. how much of the input latent is replaced by noise (1.0 starts from pure noise; lower values preserve more of the input, as in image-to-image).
- sampler_name and scheduler: together these select the specific denoising algorithm and noise schedule.
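To make these parameters concrete, here is a sketch of KSampler-style settings as a Python dict, in the shape they take in ComfyUI's API-format workflow JSON. The specific values are illustrative, not recommendations:

```python
# Illustrative KSampler inputs (hypothetical values, API-format shape).
ksampler_inputs = {
    "seed": 1234,                           # random seed controlling the initial noise
    "control_after_generate": "randomize",  # how the seed changes after each run
    "steps": 25,                            # denoising iterations: more = finer but slower
    "cfg": 7.0,                             # classifier-free guidance strength
    "sampler_name": "euler",                # denoising algorithm
    "scheduler": "normal",                  # noise schedule
    "denoise": 1.0,                         # 1.0 = pure noise (txt2img); <1.0 keeps input content
}
```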
1.4 Advantages of ComfyUI
- Modularity and flexibility: users build complex workflows by dragging and connecting modules.
- Visual interface: makes complex AI models and data flows easier to understand and operate.
- Multi-model support: works with many different generative models.
- Debugging and optimization: makes debugging the generation process much simpler.
- Openness and extensibility: as an open-source project, it is highly extensible.
- User-friendliness: even complex tasks can be carried out in a relatively simple way.
2. Quick Installation of ComfyUI
To install ComfyUI, first download the installation script along with the LoRA file fine-tuned earlier:
git lfs install
git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git
mv kolors_test_comfyui/* ./
rm -rf kolors_test_comfyui/
mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/
mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/
Next, change into the directory containing the installation files and run the installer. Finally, open the link printed by the installer to preview the interface.
3. A First Taste of ComfyUI Workflows
3.1 Sample Workflow without LoRA
The first step is to download the workflow script and load it into ComfyUI.
{
"last_node_id": 15,
"last_link_id": 18,
"nodes": [
{
"id": 11,
"type": "VAELoader",
"pos": [
1323,
240
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
12
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"sdxl.vae.safetensors"
]
},
{
"id": 10,
"type": "VAEDecode",
"pos": [
1368,
369
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 18
},
{
"name": "vae",
"type": "VAE",
"link": 12,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
13
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 14,
"type": "KolorsSampler",
"pos": [
1011,
371
],
"size": {
"0": 315,
"1": 222
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 16
},
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"link": 17
}
],
"outputs": [
{
"name": "latent",
"type": "LATENT",
"links": [
18
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsSampler"
},
"widgets_values": [
1024,
1024,
1000102404233412,
"fixed",
25,
5,
"EulerDiscreteScheduler"
]
},
{
"id": 6,
"type": "DownloadAndLoadKolorsModel",
"pos": [
201,
368
],
"size": {
"0": 315,
"1": 82
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
16
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadKolorsModel"
},
"widgets_values": [
"Kwai-Kolors/Kolors",
"fp16"
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
1366,
468
],
"size": [
535.4001724243165,
562.2001106262207
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 13
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 12,
"type": "KolorsTextEncode",
"pos": [
519,
529
],
"size": [
457.2893696934723,
225.28656056301645
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"link": 14,
"slot_index": 0
}
],
"outputs": [
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"links": [
17
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsTextEncode"
},
"widgets_values": [
"cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf |\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl",
"",
1
]
},
{
"id": 15,
"type": "Note",
"pos": [
200,
636
],
"size": [
273.5273818969726,
149.55464588512064
],
"flags": {},
"order": 2,
"mode": 0,
"properties": {
"text": ""
},
"widgets_values": [
"Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "DownloadAndLoadChatGLM3",
"pos": [
206,
522
],
"size": [
274.5334274291992,
58
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"links": [
14
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadChatGLM3"
},
"widgets_values": [
"fp16"
]
}
],
"links": [
[
12,
11,
0,
10,
1,
"VAE"
],
[
13,
10,
0,
3,
0,
"IMAGE"
],
[
14,
13,
0,
12,
0,
"CHATGLM3MODEL"
],
[
16,
6,
0,
14,
0,
"KOLORSMODEL"
],
[
17,
12,
0,
14,
1,
"KOLORS_EMBEDS"
],
[
18,
14,
0,
10,
0,
"LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1,
"offset": {
"0": -114.73954010009766,
"1": -139.79705810546875
}
}
},
"version": 0.4
}
The second step is to load the models and run the first image generation. The first run may need to download and load resources, so it can take a while.
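Besides loading workflows through the web UI, a running ComfyUI server can also be driven programmatically. A minimal sketch, assuming a default local install on port 8188 and a workflow exported in API format via "Save (API Format)" (note: that format differs from the UI-format JSON above):

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8188"  # assumption: default local ComfyUI address

def build_prompt_payload(api_workflow: dict) -> bytes:
    # ComfyUI's /prompt endpoint expects the workflow under the "prompt" key
    return json.dumps({"prompt": api_workflow}).encode("utf-8")

def queue_workflow(api_workflow: dict) -> bytes:
    # POST the API-format workflow; ComfyUI queues it for execution
    req = urllib.request.Request(
        f"{SERVER}/prompt",
        data=build_prompt_payload(api_workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```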
3.2 Sample Workflow with LoRA
The LoRA file is the one we fine-tuned in Task 1. You can substitute another LoRA file at this location:
/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt
{
"last_node_id": 16,
"last_link_id": 20,
"nodes": [
{
"id": 11,
"type": "VAELoader",
"pos": [
1323,
240
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
12
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"sdxl.vae.safetensors"
]
},
{
"id": 10,
"type": "VAEDecode",
"pos": [
1368,
369
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 18
},
{
"name": "vae",
"type": "VAE",
"link": 12,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
13
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 15,
"type": "Note",
"pos": [
200,
636
],
"size": {
"0": 273.5273742675781,
"1": 149.5546417236328
},
"flags": {},
"order": 1,
"mode": 0,
"properties": {
"text": ""
},
"widgets_values": [
"Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "DownloadAndLoadChatGLM3",
"pos": [
206,
522
],
"size": {
"0": 274.5334167480469,
"1": 58
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"links": [
14
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadChatGLM3"
},
"widgets_values": [
"fp16"
]
},
{
"id": 6,
"type": "DownloadAndLoadKolorsModel",
"pos": [
201,
368
],
"size": {
"0": 315,
"1": 82
},
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
19
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadKolorsModel"
},
"widgets_values": [
"Kwai-Kolors/Kolors",
"fp16"
]
},
{
"id": 12,
"type": "KolorsTextEncode",
"pos": [
519,
529
],
"size": {
"0": 457.28936767578125,
"1": 225.28656005859375
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"link": 14,
"slot_index": 0
}
],
"outputs": [
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"links": [
17
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsTextEncode"
},
"widgets_values": [
"现实，长发，少女，阳光下背景",
"",
1
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
1366,
469
],
"size": {
"0": 535.400146484375,
"1": 562.2001342773438
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 13
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 16,
"type": "LoadKolorsLoRA",
"pos": [
606,
368
],
"size": {
"0": 317.4000244140625,
"1": 82
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 19
}
],
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
20
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "LoadKolorsLoRA"
},
"widgets_values": [
"/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt",
2
]
},
{
"id": 14,
"type": "KolorsSampler",
"pos": [
1011,
371
],
"size": {
"0": 315,
"1": 266
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 20
},
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"link": 17
},
{
"name": "latent",
"type": "LATENT",
"link": null
}
],
"outputs": [
{
"name": "latent",
"type": "LATENT",
"links": [
18
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsSampler"
},
"widgets_values": [
1024,
1024,
0,
"fixed",
25,
5,
"EulerDiscreteScheduler",
1
]
}
],
"links": [
[
12,
11,
0,
10,
1,
"VAE"
],
[
13,
10,
0,
3,
0,
"IMAGE"
],
[
14,
13,
0,
12,
0,
"CHATGLM3MODEL"
],
[
17,
12,
0,
14,
1,
"KOLORS_EMBEDS"
],
[
18,
14,
0,
10,
0,
"LATENT"
],
[
19,
6,
0,
16,
0,
"KOLORSMODEL"
],
[
20,
16,
0,
14,
0,
"KOLORSMODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.2100000000000002,
"offset": {
"0": -183.91309381910426,
"1": -202.11110769225016
}
}
},
"version": 0.4
}
4. Resources for Self-Study
- ModelScope community: use ComfyUI on ModelScope and have fun with AIGC!
- ComfyUI official repository: ComfyUI GitHub
- Official examples: ComfyUI Examples
- Basic workflow examples: ComfyUI Workflows and wyrde-comfyui-workflows
- Workflow-sharing site: ComfyWorkflows
- GitHub repository: ComfyUI-Workflows-ZHO
Part 2: LoRA Fine-Tuning
1. Introduction to LoRA
LoRA (Low-Rank Adaptation) is an efficient fine-tuning technique that inserts low-rank matrices into key layers of a pretrained model. These matrices span a much lower-dimensional parameter space, so the model can be adapted without changing its overall structure. During training, only the newly added low-rank matrices are updated, while the vast majority of the original weights stay frozen.
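The idea can be sketched in a few lines of plain Python. Sizes and values below are illustrative; real implementations use tensors, and one factor gets a small random initialization while the other starts at zero:

```python
# Minimal sketch of LoRA: the frozen pretrained weight W is augmented by a
# trainable low-rank product B @ A, scaled by alpha / rank.
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

d, rank, alpha = 64, 16, 4.0

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen (stand-in values)
A = [[0.01] * d for _ in range(rank)]   # trainable factor (randomly initialized in practice)
B = [[0.0] * rank for _ in range(d)]    # zero-initialized, so W_adapted == W before training

delta = matmul(B, A)                    # an update of rank at most 16
W_adapted = [[w + (alpha / rank) * dv for w, dv in zip(rw, rd)]
             for rw, rd in zip(W, delta)]

trainable = rank * (d + d)              # 16 * 128 = 2048 LoRA parameters are trained
frozen = d * d                          # 4096 original parameters stay untouched
```

Because B starts at zero, the adapted weight equals the original at the start of training, and only the 2048 low-rank parameters (versus 4096 frozen ones here) ever receive gradients.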
2. LoRA in Detail
2.1 The Fine-Tuning Code from Task 2
import os
cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \
--pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
--pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
--pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
--lora_rank 16 \
--lora_alpha 4.0 \
--dataset_path data/lora_dataset_processed \
--output_path ./models \
--max_epochs 1 \
--center_crop \
--use_gradient_checkpointing \
--precision "16-mixed"
"""
os.system(cmd.strip())
2.2 Parameter Details
- pretrained_unet_path: path to the pretrained UNet model.
- pretrained_text_encoder_path: path to the pretrained text encoder.
- pretrained_fp16_vae_path: path to the pretrained VAE model.
- lora_rank: the rank of the LoRA matrices, which trades off model capacity against parameter count.
- lora_alpha: the LoRA alpha value, which controls the scaling strength of the fine-tuned update.
- dataset_path: path to the training dataset.
- output_path: where to save the model after training.
- max_epochs: maximum number of training epochs.
- center_crop: enable center cropping for image preprocessing.
- use_gradient_checkpointing: enable gradient checkpointing to save GPU memory.
- precision: training precision; "16-mixed" means mixed 16-bit precision (mixed precision, not pure half precision).
2.3 How the UNet, VAE, and Text Encoder Cooperate
- UNet: generates the image from the input noise, conditioned on the text.
- VAE: maps input data into a latent space and decodes sampled latents back into new images.
- Text encoder: converts the text input into vector representations the model can understand.
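The data flow among the three can be sketched as a toy, runnable loop. All three components below are simple stand-ins, not the real models; the point is only to show who hands what to whom:

```python
import math
import random

random.seed(0)

def text_encoder(prompt):
    # text -> fixed-size conditioning vector (stand-in: character codes)
    vec = [0.0] * 8
    for i, ch in enumerate(prompt):
        vec[i % 8] += ord(ch) / 1000.0
    return vec

def unet(latents, t, cond):
    # predict the noise in the latents, conditioned on the text (stand-in)
    c = sum(cond) / len(cond)
    return [0.1 * z + 0.01 * c for z in latents]

def vae_decode(latents):
    # map latents back to pixel-like values in (0, 1) (stand-in)
    return [1.0 / (1.0 + math.exp(-z)) for z in latents]

def generate(prompt, steps=25):
    cond = text_encoder(prompt)                        # 1) encode the text
    latents = [random.gauss(0, 1) for _ in range(8)]   # 2) start from latent noise
    for t in reversed(range(steps)):
        noise_pred = unet(latents, t, cond)            # 3) UNet predicts noise
        latents = [z - n for z, n in zip(latents, noise_pred)]  # 4) denoise step
    return vae_decode(latents)                         # 5) VAE decodes to an image

img = generate("a cat in sunlight", steps=5)
```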
Part 3: How to Prepare a High-Quality Dataset
Choosing the right dataset matters a great deal. Key dimensions to consider:
- Clarify your needs and goals:
  - Application scenario: determine where the model will be used.
  - Data type: what kinds of images do you need?
  - Data volume: estimate how much data is required.
- Organize your dataset sources:
  - Public data platforms: e.g. ModelScope, ImageNet, Open Images, Flickr.
  - APIs or web crawlers: collect images from stock-image sites.
  - Data synthesis: generate synthetic data with graphics engines or specialized software.
  - Data augmentation: for smaller datasets, augment via rotation, flipping, scaling, and so on.
  - Purchase or commission: for domain-specific applications, buy datasets from reliable vendors.
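As an illustration of the augmentation idea above, flips and rotations can be sketched on a tiny image represented as nested lists (library-agnostic; a real pipeline would use an imaging library):

```python
# Simple augmentation sketch: one source image yields several variants.
def hflip(img):
    # horizontal flip: reverse each row
    return [list(reversed(row)) for row in img]

def rotate90(img):
    # rotate 90 degrees clockwise: reverse rows, then transpose
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    # original plus three transformed variants for a small dataset
    return [img, hflip(img), rotate90(img), hflip(rotate90(img))]

tiny = [[1, 2],
        [3, 4]]
variants = augment(tiny)  # 4 variants from 1 image
```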
Congratulations on completing the whole tutorial! We hope it has been helpful.