Segment Anything for Stable Diffusion WebUI


What Is Segment Anything

Official site: Segment Anything | Meta AI (segment-anything.com)

Source: GitHub - facebookresearch/segment-anything: The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Segment Anything introduces a new image segmentation model. It accepts points or boxes as guiding prompts, and it can also generate masks for every object in an image. Trained on 11 million images and 1.1 billion masks, it shows strong zero-shot transfer to new image distributions and new segmentation tasks.

  • Generate a mask guided by prompts (a concrete box-prompt sketch follows this list):
from segment_anything import SamPredictor, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
predictor = SamPredictor(sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)
  • Generate masks for all objects in an image:
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(<your_image>)
  • A Python script that generates masks from the command line:
python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type <model_type> --input <image_or_folder> --output <path/to/output>
  • Web demo: the demo directory contains a simple single-page React app.
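The <input_prompts> placeholder in the first snippet can be points, point labels, or a box. As a concrete illustration, here is a minimal box-prompt sketch; the image path, checkpoint path, and box coordinates are placeholders to adapt:

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the ViT-H variant and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an HWC uint8 RGB image.
image = cv2.cvtColor(cv2.imread("dog.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A box prompt in XYXY pixel coordinates; SAM masks the object inside it.
box = np.array([100, 100, 400, 400])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)  # (1, H, W) boolean mask and its quality score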

The Segment Anything Extension for SD WebUI

Extension repository: GitHub - continue-revolution/sd-webui-segment-anything: Segment Anything for Stable Diffusion WebUI

The extension connects Segment Anything with Stable Diffusion WebUI, ControlNet, and GroundingDINO. It enhances ControlNet's semantic segmentation and provides automatic image annotation and LoRA/LyCORIS training-set creation.

Models

After downloading, a SAM model can be placed in either of two directories: {sd-webui}/models/sam or {sd-webui-segment-anything}/models/sam. Meta AI's SAM checkpoints come in three sizes: ViT-H (sam_vit_h_4b8939.pth, 2.56 GB), ViT-L (sam_vit_l_0b3195.pth, 1.25 GB), and ViT-B (sam_vit_b_01ec64.pth, 375 MB).
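The checkpoints can be fetched straight from Meta's public download URLs. A minimal sketch (the URL is the official one from the segment-anything README; the target directory is an assumption, adjust it to your own install):

import os
import urllib.request

# Official ViT-H checkpoint URL from the segment-anything README.
url = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"

# Assumed extension model directory, relative to the WebUI root.
target_dir = "extensions/sd-webui-segment-anything/models/sam"
os.makedirs(target_dir, exist_ok=True)
urllib.request.urlretrieve(url, os.path.join(target_dir, os.path.basename(url)))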

Using Segment Anything in SD WebUI

Basic segmentation

Once the sd-webui-segment-anything extension is installed, expand the Segment Anything panel.

  • Pick the model to use in the SAM Model dropdown.
  • Upload an image in the image area. Left-click to place points on the objects you want masked; right-click to place points on objects you want excluded from the mask.
  • Click Preview Segmentation to generate the segmentation previews; by default three candidates are produced (the sketch after this list shows where the three come from).
  • Pick the one you want under Choose your favorite mask.
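The three candidates come from SAM's multimask output: for a single ambiguous click, the model proposes masks at three granularities, each with a predicted quality score. A minimal standalone sketch of the same behavior outside the UI (image path, click coordinates, and checkpoint path are placeholders):

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB))

# Label 1 = "include this object" (left click in the extension's UI),
# label 0 = "exclude this object" (right click).
point_coords = np.array([[250, 300], [400, 120]])
point_labels = np.array([1, 0])

# multimask_output=True returns three candidate masks plus quality scores,
# which is what Preview Segmentation displays.
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
best = masks[np.argmax(scores)]  # or pick your favorite of the three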


Semantic segmentation

Semantic (text-prompted) segmentation relies on GroundingDINO: you type a text prompt, GroundingDINO finds the matching objects, and SAM masks them. The extension installs GroundingDINO on demand; its model registry and install logic look like this:

import os
from modules import scripts  # provided by SD WebUI

dino_model_dir = os.path.join(scripts.basedir(), "models/grounding-dino")
dino_model_list = ["GroundingDINO_SwinT_OGC (694MB)", "GroundingDINO_SwinB (938MB)"]
dino_model_info = {
    "GroundingDINO_SwinT_OGC (694MB)": {
        "checkpoint": "groundingdino_swint_ogc.pth",
        "config": os.path.join(dino_model_dir, "GroundingDINO_SwinT_OGC.py"),
        "url": "https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth",
    },
    "GroundingDINO_SwinB (938MB)": {
        "checkpoint": "groundingdino_swinb_cogcoor.pth",
        "config": os.path.join(dino_model_dir, "GroundingDINO_SwinB.cfg.py"),
        "url": "https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swinb_cogcoor.pth"
    },
}

dino_install_issue_text = "submit an issue to https://github.com/IDEA-Research/Grounded-Segment-Anything/issues."


def install_goundingdino():
    import launch
    if launch.is_installed("groundingdino"):
        return True
    try:
        launch.run_pip(
            f"install git+https://github.com/IDEA-Research/GroundingDINO",
            f"sd-webui-segment-anything requirement: groundingdino")
        print("GroundingDINO install success.")
        return True
    except Exception:
        import traceback
        print(traceback.print_exc())
        print(f"GroundingDINO install failed. Please {dino_install_issue_text}")
        return False

If github.com is unreachable from your network, the install fails with the error below; routing through a proxy usually helps. (The stray None near the end of this log comes from print(traceback.print_exc()) in the install code above: traceback.print_exc() prints the traceback itself and returns None, and that None is then printed.)

RuntimeError: Couldn't install sd-webui-segment-anything requirement: groundingdino.
Command: "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\Scripts\python.exe" -m pip install git+https://github.com/IDEA-Research/GroundingDINO --prefer-binary
Error code: 1
stdout: Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting git+https://github.com/IDEA-Research/GroundingDINO
  Cloning https://github.com/IDEA-Research/GroundingDINO to c:\users\leoli\appdata\local\temp\pip-req-build-xlhx4a4j

stderr:   Running command git clone --filter=blob:none --quiet https://github.com/IDEA-Research/GroundingDINO 'C:\Users\leoli\AppData\Local\Temp\pip-req-build-xlhx4a4j'
  fatal: unable to access 'https://github.com/IDEA-Research/GroundingDINO/': Recv failure: Connection was reset
  error: subprocess-exited-with-error

  git clone --filter=blob:none --quiet https://github.com/IDEA-Research/GroundingDINO 'C:\Users\leoli\AppData\Local\Temp\pip-req-build-xlhx4a4j' did not run successfully.
  exit code: 128

  See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

git clone --filter=blob:none --quiet https://github.com/IDEA-Research/GroundingDINO 'C:\Users\leoli\AppData\Local\Temp\pip-req-build-xlhx4a4j' did not run successfully.
exit code: 128

See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

None
GroundingDINO install failed. Please submit an issue to https://github.com/IDEA-Research/Grounded-Segment-Anything/issues.
  • GroundingDINO depends on a BERT model to encode the text prompt, so on first use it downloads bert-base-uncased into the Hugging Face hub cache directory ${User}\.cache\huggingface\hub:
Installing sd-webui-segment-anything requirement: groundingdino
GroundingDINO install success.
Running GroundingDINO Inference
Initializing GroundingDINO GroundingDINO_SwinT_OGC (694MB)
final text_encoder_type: bert-base-uncased
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 6.78kB/s]
Downloading (…)lve/main/config.json: 100%|███████████████████████████████████████████████████| 570/570 [00:00<?, ?B/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████████████████████████████████████| 232k/232k [00:00<00:00, 262kB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████████████████████████████████████| 466k/466k [00:01<00:00, 460kB/s]
Downloading model.safetensors: 100%|████████████████████████████████████████████████| 440M/440M [16:10<00:00, 454kB/s]
  • GroundingDINO's detection result: each object to be masked is outlined with a red box.
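Under the hood this is roughly the pipeline the extension runs: GroundingDINO turns the text prompt into boxes, and SAM turns each box into a mask. A minimal standalone sketch outside the WebUI, assuming the groundingdino package is installed; all file paths are placeholders:

import torch
from torchvision.ops import box_convert
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import SamPredictor, sam_model_registry

# Config/checkpoint paths are placeholders; see dino_model_info above.
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("dog.jpg")  # RGB array + preprocessed tensor

# Text-prompted detection: "dog" becomes zero or more bounding boxes.
boxes, logits, phrases = predict(
    model=dino, image=image, caption="dog",
    box_threshold=0.35, text_threshold=0.25,
)

# GroundingDINO returns normalized cxcywh boxes; convert to pixel XYXY.
h, w, _ = image_source.shape
boxes_xyxy = box_convert(boxes * torch.tensor([w, h, w, h]),
                         in_fmt="cxcywh", out_fmt="xyxy").numpy()

# Feed the first detected box to SAM as a box prompt (assumes >= 1 detection).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
predictor.set_image(image_source)
masks, _, _ = predictor.predict(box=boxes_xyxy[0], multimask_output=False)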

Using the mask with ControlNet

Once a mask image has been generated, it can be used for inpainting (partial redraw) in SD WebUI.

  • In Settings, enable the ControlNet option that allows other extensions to control ControlNet.
  • Pick one of the three masks (they are numbered starting from 0) and check the Copy to Inpaint Upload & ControlNet Inpainting option. (A manual alternative is sketched after these steps.)


  • Enable ControlNet and choose a preprocessor and model.

    • v1.0: the preprocessors seg_ufade20k, seg_ofade20k, and seg_ofcoco come from the ControlNet annotators. I strongly recommend seg_ofade20k or seg_ofcoco, since they perform far better than seg_ufade20k. Model: control_sd15_seg.
    • v1.1: preprocessor inpaint_global_harmonious, model control_v11p_sd15_inpaint.
  • Choose whether to repaint the masked area or the unmasked area.


  • Click the Generate button.
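If the copy-to-inpaint checkbox is unavailable, the chosen mask can also be exported by hand and uploaded on the Inpaint upload tab. A minimal sketch, assuming masks is the (3, H, W) boolean array returned by SamPredictor.predict and index 0 is the mask picked in the UI:

import numpy as np
from PIL import Image

# Inpaint masks are expected to be white where the image should be
# repainted and black elsewhere.
chosen = masks[0]  # the index picked in the UI, counted from 0
Image.fromarray(chosen.astype(np.uint8) * 255).save("inpaint_mask.png")

On my machine, clicking Generate produced the following log: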
Loading model from cache: control_sd15_seg [fef5e48e]
Loading preprocessor: oneformer_ade20k
Downloading: "https://huggingface.co/lllyasviel/Annotators/resolve/main/250_16_swin_l_oneformer_ade20k_160k.pth" to D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\models\oneformer\250_16_swin_l_oneformer_ade20k_160k.pth

Error running process: D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions\sd-webui-controlnet\scripts\controlnet.py
Loading model from cache: control_sd15_seg [fef5e48e]
Loading preprocessor: oneformer_coco
Downloading: "https://huggingface.co/lllyasviel/Annotators/resolve/main/150_16_swin_l_oneformer_coco_100ep.pth" to D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\models\oneformer\150_16_swin_l_oneformer_coco_100ep.pth

  7%|█████                                                                        | 59.5M/906M [00:11<04:43, 3.13MB/s]
Loading model from cache: control_sd15_seg [fef5e48e]
Loading preprocessor: oneformer_ade20k
Downloading: "https://huggingface.co/lllyasviel/Annotators/resolve/main/250_16_swin_l_oneformer_ade20k_160k.pth" to D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\models\oneformer\250_16_swin_l_oneformer_ade20k_160k.pth

 36%|████████████████████████████▏                                                 | 327M/906M [00:51<01:20, 7.56MB/s]
 Loading config D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions\sd-webui-controlnet\annotator\oneformer\configs/ade20k/oneformer_swin_large_IN21k_384_bs16_160k.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Loading config D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions\sd-webui-controlnet\annotator\oneformer\configs/ade20k\Base-ADE20K-UnifiedSegmentation.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Error completing request
Arguments: ('task(kg8umwmgczfx1xn)', 2, 'best quality,masterpiece,illustration, an extremely delicate and beautiful,extremely detailed,CG,unity,8k wallpaper,', '', [], None, None, {'image': <PIL.Image.Image image mode=RGBA size=1280x1280 at 0x2B913EF2740>, 'mask': <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1280x1280 at 0x2B913EF1EA0>}, None, None, None, None, 20, 0, 4, 0, 1, False, False, 1, 1, 7, 1.5, 0.75, -1.0, -1.0, 0, 0, 0, False, 512, 512, 0, 1, 32, 1, '', '', '', [], 0, True, False, 1, False, False, False, 1.1, 1.5, 100, 0.7, False, False, True, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <scripts.external_code.ControlNetUnit object at 0x000002B8ECC7A050>, True, False, 0, <PIL.Image.Image image mode=RGBA size=1280x1280 at 0x2B913EF2710>, [{'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmph_ofl_bd.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmph_ofl_bd.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpdc0iyrjc.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpdc0iyrjc.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpc7r2khan.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpc7r2khan.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpt0edro13.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpt0edro13.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpel5j068y.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpel5j068y.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpkahp437p.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpkahp437p.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp2gni45dl.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp2gni45dl.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpy8cqxps6.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpy8cqxps6.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp832amwg6.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp832amwg6.png', 'is_file': True}], 0, False, [], [], False, 0, 1, False, False, 0, None, [], -2, False, [], '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', 0, '', 0, '', True, False, False, False, 0, None, False, 50) {}
Traceback (most recent call last):
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\img2img.py", line 172, in img2img
    processed = process_images(p)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\processing.py", line 503, in process_images
    res = process_images_inner(p)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\processing.py", line 594, in process_images_inner
    p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\processing.py", line 1056, in init
    self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
    h = self.encoder(x)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 523, in forward
    hs = [self.conv_in(x)]
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions-builtin\Lora\lora.py", line 319, in lora_Conv2d_forward
    return torch.nn.Conv2d_forward_before_lora(self, input)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
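
The RuntimeError at the end is a plain device mismatch: the input tensor is on the GPU (torch.cuda.FloatTensor) while the convolution weights are still on the CPU (torch.FloatTensor). Judging from the traceback, it happens while the VAE encoder (first_stage_model) processes the image, which typically means part of the model stayed on the CPU, for example because of low-VRAM offload settings. A minimal reproduction of the same error class:

import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3)  # weights live on the CPU
x = torch.randn(1, 3, 64, 64).cuda()         # input lives on the GPU
conv(x)  # RuntimeError: Input type (torch.cuda.FloatTensor) and weight type
         # (torch.FloatTensor) should be the same

Moving the module and the input onto the same device (conv.cuda() or x.cpu()) makes the error disappear.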