A Detailed Guide: Training a LoRA with diffusers to Generate Commercial Product Images, So You Can Produce a Product Shot in Seconds!

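Introduction

🤔 I've recently been teaching myself the principles and practice of AI image generation, and the useful material online feels scattered. I also never wrote things up right after each fine-tuning run, so every time I had to start over, which is an inefficient way to learn. Hence this blog post on how to train a LoRA from code. Most of the tutorials I read train LoRA through stable diffusion webui, but AI algorithm developers should still know how to train one locally from code ⛽️.

📑 After a few rough attempts at LoRA fine-tuning, I found that a great many factors affect the result. They fall into four groups: data, the pretrained (base) model, training parameters, and inference parameters.

Data: how the images are preprocessed and how they are captioned.
Pretrained (base) model: this matters a lot! Options include sdv1-5, sdv2-1, sdxl and so on, and the models that community experts have fine-tuned and shared on model sites can also serve as the pretrained model.
Training parameters: for now I still use the officially provided training parameters and mainly adjust training_steps; the results are decent.
Inference parameters: these are more involved; you need to experiment several times yourself and write down the settings that work well.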


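⛱️ Today I'll only record a fairly general, rough workflow for LoRA fine-tuning, and I'll add more experience from practice later. We'll walk through the workflow using perfume product images as the example~

1. LoRA in brief (skim quickly)

LoRA is a lightweight fine-tuning technique: it adds trainable low-rank matrices alongside the pretrained model's weights instead of modifying the original weights, so the model can be adapted quickly. Its advantages are:

Compute efficiency: only a small number of extra parameters are updated, which greatly reduces memory use and compute cost.
Generalization: LoRA can help the model generalize to new tasks, especially when data is limited.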
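To make the "small number of extra parameters" point concrete, here is a minimal sketch that counts trainable parameters for a single linear layer. It assumes the usual LoRA formulation, where a frozen weight W of shape (d_out, d_in) receives an additive update B·A, with B of shape (d_out, r) and A of shape (r, d_in). The layer size 1280 and the rank 4 are illustrative choices of mine, not values taken from this training run.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters updated when fine-tuning the whole weight matrix W."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative numbers only: a 1280x1280 projection with rank 4.
full = full_param_count(1280, 1280)
lora = lora_param_count(1280, 1280, rank=4)
print(f"full fine-tuning: {full} params, LoRA: {lora} params")
```

For a square layer the LoRA count grows linearly with the layer width while the full count grows quadratically, which is where the savings come from.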

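2. Preparation 📖

For this run we mainly use the train_text_to_image_lora_sdxl.py script from the diffusers library and fine-tune with sdxl as the pretrained model. Download links:

Pretrained model download: click here
Code link: click here

Training takes about 24 GB of GPU memory, so a GPU is a must~ For the base environment, pick versions that suit your own machine: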

PyTorch
Transformers
Diffusers
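Before starting a run it can help to check what is actually installed and how much GPU memory is visible. The sketch below is one way to do that with the standard library plus a lazy torch import; the helper names installed_versions and cuda_status are my own, not part of any of these libraries.

```python
from importlib import metadata


def installed_versions(packages=("torch", "transformers", "diffusers")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_status():
    """Report whether PyTorch can see a CUDA GPU, and its total memory in GB."""
    try:
        import torch  # imported lazily so the version check works without it
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA GPU visible to PyTorch"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GB"


print(installed_versions())
print(cuda_status())
```

If the reported memory is well below the 24 GB mentioned above, the run will probably need a smaller resolution or other memory-saving settings.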

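3. Dataset preparation 📊

To train a model for product image generation, you need a high-quality dataset of product images. The dataset should contain high-resolution images across the product categories you care about. Make sure it has been cleaned and captioned so the model can learn the features of each product.

Here we take the Hermès perfume "尼罗河花园" (Un Jardin sur le Nil) as the example and train a product-image LoRA.

1️⃣ Data collection

First download perfume images from the web, pick 8 of them, and crop these 8 to squares. Besides the images, the dataset also needs a metadata.jsonl file that stores the captions.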
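The square crop can be scripted instead of done by hand. Below is a minimal sketch that center-crops an image to its largest square. It assumes Pillow is installed, and the resize to 1024 is my own addition to match the --resolution=1024 used later; the post itself only says the images were cropped to squares. A center crop can also cut off an off-center bottle, so it is worth eyeballing the results.

```python
def center_square_box(width: int, height: int):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def crop_to_square(src_path: str, dst_path: str, size: int = 1024) -> None:
    """Center-crop one image to a square and resize it to size x size."""
    from PIL import Image  # Pillow, assumed to be installed

    with Image.open(src_path) as img:
        img = img.convert("RGB")
        square = img.crop(center_square_box(*img.size))
        square.resize((size, size), Image.LANCZOS).save(dst_path)


print(center_square_box(1200, 800))  # (200, 0, 1000, 800)
```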

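2️⃣ Captioning

You can caption the images with a BLIP model, or write the captions by hand (if there are only a few images).

Step 1: Generate captions with make_captions.py from the BLIP link.

BLIP model download link

Step 2: In each generated txt file, change the identifier (trigger) word to niluohe-garden perfume

Step 3: Generate metadata.jsonl. Each line is a dictionary with the keys file_name and caption.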
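Step 3 is easy to script. The sketch below builds metadata.jsonl from the caption files, using the keys file_name and caption that the training command refers to. It assumes each caption sits next to its image as a .txt file with the same stem (for example 01.png and 01.txt); that layout and the function name build_metadata are my assumptions, so adjust them to however your caption files are actually named.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_metadata(data_dir: str) -> int:
    """Write data_dir/metadata.jsonl from image/caption pairs; return the row count."""
    root = Path(data_dir)
    rows = []
    for image_path in sorted(root.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_path = image_path.with_suffix(".txt")
        if not caption_path.exists():
            continue  # skip images that have no caption file
        caption = caption_path.read_text(encoding="utf-8").strip()
        rows.append({"file_name": image_path.name, "caption": caption})

    # One JSON object per line, e.g. {"file_name": "01.png", "caption": "..."}
    with open(root / "metadata.jsonl", "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)
```

Calling build_metadata("data") would write data/metadata.jsonl, matching the --train_data_dir='data' used in the training command.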

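4. Fine-tuning the LoRA 🔥

Use the train_text_to_image_lora_sdxl.py script from the diffusers library with the default training command. Note that the last two lines set image_column and caption_column, which correspond to the keys of the dictionaries in metadata.jsonl above.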

python train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path='stabilityai_stable-diffusion-xl-base-1.0/' \
  --train_data_dir='data' \
  --resolution=1024 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=10 --checkpointing_steps=500 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --seed=42 \
  --output_dir="sd-ni-luo-he-model-lora-sdxl" \
  --validation_prompt="niluohe-garden perfume" \
  --image_column "file_name" \
  --caption_column "caption"

Training finishes in about half an hour~
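It is worth working out how many optimizer steps this command actually runs, since that is the main knob being tuned. The sketch below does the arithmetic for 8 images, batch size 1, and 10 epochs. It assumes a single GPU and no gradient accumulation, neither of which is stated in the command above, and the helper name total_optimizer_steps is hypothetical.

```python
import math


def total_optimizer_steps(num_images, batch_size, epochs, grad_accum=1, num_gpus=1):
    """Optimizer steps for a run where every epoch sees each image once."""
    batches_per_epoch = math.ceil(num_images / (batch_size * num_gpus))
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


steps = total_optimizer_steps(num_images=8, batch_size=1, epochs=10)
checkpointing_steps = 500
print(steps)                         # 80
print(steps // checkpointing_steps)  # 0 intermediate checkpoints
```

Under those assumptions the run is only 80 steps long, so --checkpointing_steps=500 never triggers during training. If you want intermediate checkpoints to compare, a smaller value would be needed.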

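5. Results ☘️

The official diffusers documentation gives three ways to run inference with a LoRA; see here for details

🔶 The first loads the LoRA into both the UNet and the Text Encoder;

🔶 The second loads the LoRA into the UNet only;

🔶 The third adjusts the LoRA weight (scale) applied to different parts of the model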

from diffusers import DiffusionPipeline
import torch

model_path = "sd-ni-luo-he-model-lora-sdxl"
pipe = DiffusionPipeline.from_pretrained("stabilityai_stable-diffusion-xl-base-1.0/", torch_dtype=torch.float16)
pipe.to("cuda")


prompt = "a bottle of niluohe-garden perfume, high quality, 8k"

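# Way 1: load the LoRA into both the UNet and the text encoders
# pipe.load_lora_weights(model_path)

# Way 2: load the LoRA into the UNet only
# pipe.unet.load_attn_procs(model_path, weight_name="pytorch_lora_weights.safetensors")

# Way 3: load the LoRA as a named adapter, then set per-block scales

pipe.load_lora_weights(model_path, adapter_name="my_adapter")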
scales = {
    "text_encoder": 0.5,
    "text_encoder_2": 0.5,  # only usable if pipe has a 2nd text encoder
    "unet": {
        "down": 0.9,  # all transformers in the down-part will use scale 0.9
        # "mid"  # in this example "mid" is not given, therefore all transformers in the mid part will use the default scale 1.0
        "up": {
            "block_0": 0.6,  # all 3 transformers in the 0th block in the up-part will use scale 0.6
            "block_1": [0.4, 0.8, 1.0],  # the 3 transformers in the 1st block in the up-part will use scales 0.4, 0.8 and 1.0 respectively
        }
    }
}
pipe.set_adapters("my_adapter", scales)
for i in range(4):
    # generate one image per iteration from the text prompt
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7).images[0]

    image.save(f"perfume_{i}.png")