Windows Stable Diffusion Fine-Tuning: LoRA Training


Introduction

LoRA stands for Low-Rank Adaptation of Large Language Models. It was developed by Microsoft researchers to address the cost of fine-tuning large language models. For example, GPT-3 has 175 billion parameters; to make it useful in a specific domain it needs fine-tuning, but fine-tuning GPT-3 directly is far too expensive and cumbersome. LoRA's approach is to freeze the pretrained model weights and inject trainable layers into each Transformer block (Transformer is the "T" in GPT). Because no gradients need to be recomputed for the original model weights, the amount of training computation is greatly reduced. Research has found that LoRA's fine-tuning quality is comparable to full-model fine-tuning. As an analogy, a LoRA is like a small model riding on top of the large model, or a plug-in for it. Depending on GPU performance, training typically takes one to several hours; the process is colloquially known as "alchemy" (炼丹)!
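The core idea can be sketched in a few lines of NumPy (an illustrative toy, not the sd-scripts implementation): the pretrained weight W stays frozen, while a low-rank product A @ B (rank r much smaller than the layer dimensions) is trained and added to the layer's output.

```python
import numpy as np

# Toy LoRA layer: y = x @ W + (alpha / r) * (x @ A @ B)
# W is frozen; only A and B (the low-rank adapters) would be trained.
d_in, d_out, r, alpha = 64, 64, 4, 4

rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))      # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01   # trainable, small random init
B = np.zeros((r, d_out))                # trainable, zero init

def lora_forward(x):
    # The adapter adds only d_in*r + r*d_out trainable parameters,
    # versus d_in*d_out for full fine-tuning.
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(1, d_in))
y = lora_forward(x)
print(y.shape)  # (1, 64)
# With B initialized to zero, the adapted layer initially matches the frozen one:
print(np.allclose(y, x @ W))  # True
```

Zero-initializing B is the standard trick that makes the adapted model start out identical to the pretrained one, so training begins from the frozen model's behavior.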

环境准备

训练脚本

  1. sd-scripts GitHub - kohya-ss/sd-scripts provides scripts for several training methods under Stable Diffusion:
  • DreamBooth training, covering the U-Net and Text Encoder
  • Fine-tuning (native training), covering the U-Net and Text Encoder
  • LoRA training
  • Textual Inversion training
  • Image generation
  • Model conversion, supporting Stable Diffusion 1.x and 2.x, and the ckpt, safetensors, and diffusers formats
  2. kohya_ss GitHub - bmaltais/kohya_ss provides a Gradio-based graphical interface wrapping sd-scripts.
  3. sd-webui GitHub - AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI — sd-webui's built-in LoRA training feature, which also appears to wrap sd-scripts; it only supports SD 1.x.
  4. lora-scripts github.com/Akegarasu/l… a wrapper around sd-scripts by 秋叶杏.

Installation

Here I use kohya_ss, which comes with a graphical interface.

  • Source code
git clone --depth 1 https://github.com/bmaltais/kohya_ss.git
  • Dependencies

Visual Studio 2015, 2017, 2019, and 2022 redistributable

Run the setup script from the source tree:

.\setup.bat
Do you want to uninstall previous versions of torch and associated files before installing? Usefull if you are upgrading from torch 1.12.1 to torch 2.0.0 or if you are downgrading from torch 2.0.0 to torch 1.12.1.
[1] - Yes
[2] - No (recommanded for most)
Enter your choice (1 or 2): 2
Please choose the version of torch you want to install:
[1] - v1 (torch 1.12.1) (Recommended)
[2] - v2 (torch 2.0.0) (Experimental)
Enter your choice (1 or 2): 1
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, https://download.pytorch.org/whl/cu116


Optional cuDNN

This step is optional but can improve the learning speed for NVIDIA 30X0/40X0 owners. It allows for larger training batch size and faster training speed.

Due to the file size, the DLLs needed for cuDNN 8.6 cannot be hosted on GitHub. You can download them here for a speed boost in sample generation (almost 50% on a 4090 GPU).

To install, simply unzip the directory and place the cudnn_windows folder in the root of this repo.

Run the following commands to install:

.\venv\Scripts\activate

python .\tools\cudann_1.8_install.py

Once the commands have completed successfully you should be ready to use the new version.

  • Configure accelerate
 (lora_py_3106) D:\workspace\project\stablediffusion\kohya_ss>accelerate config
------------------------------------------------------------------------------------------------------------------------------------In which compute environment are you running?
This machine
------------------------------------------------------------------------------------------------------------------------------------Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU is available)? [yes/NO]:NO
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all
------------------------------------------------------------------------------------------------------------------------------------Do you wish to use FP16 or BF16 (mixed precision)?
fp16
accelerate configuration saved at C:\Users\leoli/.cache\huggingface\accelerate\default_config.yaml

Running the GUI

The two scripts to launch the GUI on Windows are gui.ps1 and gui.bat in the root directory. You can use whichever script you prefer.

To launch the Gradio UI, run the script in a terminal with the desired command line arguments, for example:

gui.ps1 --listen 127.0.0.1 --server_port 7860 --inbrowser --share

or

gui.bat --listen 127.0.0.1 --server_port 7860 --inbrowser --share

System Information:
System: Windows, Release: 10, Version: 10.0.22621, Machine: AMD64, Processor: Intel64 Family 6 Model 154 Stepping 3, GenuineIntel

Python Information:
Version: 3.10.6, Implementation: CPython, Compiler: MSC v.1916 64 bit (AMD64)

Virtual Environment Information:
Path: D:\workspace\project\stablediffusion\kohya_ss\venv

GPU Information:
Name: NVIDIA GeForce RTX 3060, VRAM: 12288 MiB

Validating that requirements are satisfied.
All requirements satisfied.
Load CSS...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Dataset Preparation

According to sd-scripts/train_README-zh.md at main · kohya-ss/sd-scripts · GitHub, training data can be specified in several ways, depending on factors such as the amount of training data, the training target, and whether captions (image descriptions) are available:

  1. DreamBooth, class + identifier method (regularization images can be used). The training target is associated with a specific word (the identifier). No captions need to be prepared. This is convenient when learning a particular character, for example, but because every element of the training data — hairstyle, clothing, background, and so on — becomes tied to the identifier, you may find at generation time that you cannot, say, change the outfit.

  2. DreamBooth, caption method (regularization images can be used). A text file recording a caption for each image is prepared for training. By recording image details in the captions (e.g., "character A in white clothes", "character A in red clothes"), the character can be separated from the other elements, and the model can be expected to learn the character more precisely.

  3. Fine-tuning method (regularization images cannot be used).

If you want to train LoRA or Textual Inversion without preparing caption files, the DreamBooth class + identifier method is recommended. If you can prepare captions, the DreamBooth caption method is better. If you have a large amount of training data and are not using regularization images, consider the fine-tuning method.

The same applies to DreamBooth training itself, except that the fine-tuning method cannot be used. For fine-tuning (native training), only the fine-tuning method can be used.
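For the class + identifier method, kohya's scripts expect training images in subfolders named `<repeats>_<identifier> <class>`. The sketch below builds such a layout in a temporary directory; names like `10_sks girl` are illustrative (`sks` is a commonly used rare-token identifier, not something mandated by sd-scripts), and the trailing caption file shows what the caption method would add.

```python
import tempfile
from pathlib import Path

# Directory layout expected by the DreamBooth class+identifier method:
#   train_data_dir/
#     10_sks girl/   <- 10 repeats per image, identifier "sks", class "girl"
#   reg_data_dir/
#     1_girl/        <- regularization images for the class
root = Path(tempfile.mkdtemp())
train_dir = root / "train_data_dir" / "10_sks girl"
reg_dir = root / "reg_data_dir" / "1_girl"
train_dir.mkdir(parents=True)
reg_dir.mkdir(parents=True)

# For the caption method, a .txt file with the same stem sits next to each image:
(train_dir / "0001.png").touch()
(train_dir / "0001.txt").write_text("sks girl wearing a white dress, white background")

print(sorted(p.name for p in train_dir.iterdir()))  # ['0001.png', '0001.txt']
```

The leading number in the folder name controls how many times each image is repeated per epoch, which is how kohya balances small training sets against regularization images.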

Model Testing

LoRA is supported via train_network.py, with the different types distinguished by name:

  1. LoRA-LierLa: (LoRA for Linear Layers) LoRA for Linear layers and Conv2d layers with a 1x1 kernel.
  2. LoRA-C3Lier: (LoRA for Convolutional layers with a 3x3 kernel and Linear layers) In addition to 1., LoRA for Conv2d layers with a 3x3 kernel.

LoRA-LierLa is the LoRA type that train_network.py supports by default; LoRA-LierLa networks can be used both by sd-webui's built-in LoRA feature and by the extension provided by kohya-ss.

LoRA-C3Lier networks can only be used via the extension provided by kohya-ss.

Extension: GitHub - kohya-ss/sd-webui-additional-networks
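At training time, the C3Lier variant is selected by passing extra network arguments to train_network.py. The snippet below only assembles the command line (it does not launch training); the `conv_dim`/`conv_alpha` keys under `--network_args` follow the sd-scripts documentation, while the model and dataset paths and the hyperparameter values are placeholders.

```python
# Sketch: assemble a train_network.py invocation for LoRA-C3Lier.
# Paths and hyperparameter values are illustrative placeholders.
args = [
    "accelerate", "launch", "train_network.py",
    "--pretrained_model_name_or_path", "model.safetensors",  # placeholder
    "--train_data_dir", "train_data_dir",                    # placeholder
    "--output_dir", "output",
    "--network_module", "networks.lora",
    "--network_dim", "16",     # rank for Linear / 1x1-Conv LoRA
    "--network_alpha", "8",
    # These two extend LoRA to 3x3 Conv2d layers (LoRA-C3Lier);
    # omitting them yields the default LoRA-LierLa.
    "--network_args", "conv_dim=8", "conv_alpha=4",
]
print(" ".join(args))
```

Passing the assembled list to `subprocess.run(args)` from inside the activated venv would start training; keeping it as a list avoids shell-quoting problems with the space in folder names like `10_sks girl`.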