What is Segment Anything
Official site: Segment Anything | Meta AI (segment-anything.com)
Segment Anything provides a new image segmentation model (SAM). The model takes points or boxes as guiding prompts and can also generate masks for every object in an image. It was trained on 11 million images and 1.1 billion masks, giving it strong zero-shot transfer to new image distributions and segmentation tasks.
- Generate masks guided by a prompt:
from segment_anything import SamPredictor, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
predictor = SamPredictor(sam)
predictor.set_image(<your_image>)
masks, _, _ = predictor.predict(<input_prompts>)
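For instance, a box prompt can be passed directly to predict. A minimal sketch, assuming the official ViT-H checkpoint has been downloaded; the image file and box coordinates are illustrative placeholders:
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # path to the downloaded checkpoint
predictor = SamPredictor(sam)

# SamPredictor expects an RGB uint8 HxWxC array; "example.jpg" is a placeholder.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single box prompt in XYXY pixel coordinates (illustrative values).
masks, scores, _ = predictor.predict(box=np.array([100, 80, 420, 360]), multimask_output=False)
print(masks.shape)  # (1, H, W) boolean mask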
- Generate masks for all objects in an image:
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry
sam = sam_model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(<your_image>)
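The generator returns a list of dicts, one per mask; the record keys below follow the official segment-anything README, while the checkpoint and image paths are placeholders. A minimal sketch of inspecting the output:
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder checkpoint path
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)  # placeholder image
masks = mask_generator.generate(image)

# Each record holds one binary mask plus quality and geometry metadata.
for m in sorted(masks, key=lambda m: m["area"], reverse=True):
    segmentation = m["segmentation"]  # HxW boolean array
    print(m["area"], m["bbox"], m["predicted_iou"], m["stability_score"])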
- Generate masks from the command line with the provided Python script:
python scripts/amg.py --checkpoint <path/to/checkpoint> --model-type <model_type> --input <image_or_folder> --output <path/to/output>
- Web demo: the demo directory contains a simple single-page React app.
The Segment Anything extension for SD WebUI
Extension repository: GitHub - continue-revolution/sd-webui-segment-anything: Segment Anything for Stable Diffusion WebUI
The extension ties Segment Anything together with Stable Diffusion WebUI, ControlNet, and GroundingDINO. It enhances ControlNet semantic segmentation and provides automatic image annotation and LoRA/LyCORIS training-set creation.
Models
After downloading, a SAM checkpoint can be placed in either of the two model directories the extension checks, for example ${sd-webui-segment-anything}/models/sam. Meta AI publishes three official SAM checkpoints (ViT-B, ViT-L and ViT-H).
Using Segment Anything in SD WebUI
Regular segmentation
Once the sd-webui-segment-anything extension is installed, expand the Segment Anything panel.
- Choose the model to use under SAM Model.
- Load an image into the image field. Left-click to add points on objects you want masked; right-click to add points on regions to exclude (see the sketch after this list).
- Click Preview Segmentation to generate the segmentation; by default three candidate masks are produced.
- Pick the one to use under Choose your favorite mask.
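Under the hood, the left and right clicks map to SAM's positive and negative point prompts, and the three candidate masks come from multimask_output=True. A minimal sketch of the equivalent direct segment_anything call (this is not the extension's own code; coordinates and paths are illustrative):
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # placeholder path
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB))

# Left clicks -> label 1 (include), right clicks -> label 0 (exclude).
point_coords = np.array([[220, 140], [400, 300]])  # illustrative pixel coordinates
point_labels = np.array([1, 0])

masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,  # three candidate masks, like the extension's preview
)
best_mask = masks[int(np.argmax(scores))]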
Semantic segmentation
- Check Enable GroundingDINO.
- Choose a GroundingDINO Model; it is downloaded from Hugging Face automatically, or you can download it in advance and place it in the ${sd-webui-segment-anything}/models/grounding-dino directory.
- Enter the detection prompt in GroundingDINO Detection Prompt.
- On the first run, the GroundingDINO Python package and model are downloaded and installed automatically; the relevant extension code is shown below:
dino_model_dir = os.path.join(scripts.basedir(), "models/grounding-dino")
dino_model_list = ["GroundingDINO_SwinT_OGC (694MB)", "GroundingDINO_SwinB (938MB)"]
dino_model_info = {
    "GroundingDINO_SwinT_OGC (694MB)": {
        "checkpoint": "groundingdino_swint_ogc.pth",
        "config": os.path.join(dino_model_dir, "GroundingDINO_SwinT_OGC.py"),
        "url": "https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth",
    },
    "GroundingDINO_SwinB (938MB)": {
        "checkpoint": "groundingdino_swinb_cogcoor.pth",
        "config": os.path.join(dino_model_dir, "GroundingDINO_SwinB.cfg.py"),
        "url": "https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swinb_cogcoor.pth",
    },
}
dino_install_issue_text = "submit an issue to https://github.com/IDEA-Research/Grounded-Segment-Anything/issues."

def install_goundingdino():
    import launch
    if launch.is_installed("groundingdino"):
        return True
    try:
        launch.run_pip(
            f"install git+https://github.com/IDEA-Research/GroundingDINO",
            f"sd-webui-segment-anything requirement: groundingdino")
        print("GroundingDINO install success.")
        return True
    except Exception:
        import traceback
        print(traceback.print_exc())
        print(f"GroundingDINO install failed. Please {dino_install_issue_text}")
        return False
If github.com cannot be reached, the message below appears; using a proxy may help.
RuntimeError: Couldn't install sd-webui-segment-anything requirement: groundingdino.
Command: "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\Scripts\python.exe" -m pip install git+https://github.com/IDEA-Research/GroundingDINO --prefer-binary
Error code: 1
stdout: Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting git+https://github.com/IDEA-Research/GroundingDINO
Cloning https://github.com/IDEA-Research/GroundingDINO to c:\users\leoli\appdata\local\temp\pip-req-build-xlhx4a4j
stderr: Running command git clone --filter=blob:none --quiet https://github.com/IDEA-Research/GroundingDINO 'C:\Users\leoli\AppData\Local\Temp\pip-req-build-xlhx4a4j'
fatal: unable to access 'https://github.com/IDEA-Research/GroundingDINO/': Recv failure: Connection was reset
error: subprocess-exited-with-error
git clone --filter=blob:none --quiet https://github.com/IDEA-Research/GroundingDINO 'C:\Users\leoli\AppData\Local\Temp\pip-req-build-xlhx4a4j' did not run successfully.
exit code: 128
See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
git clone --filter=blob:none --quiet https://github.com/IDEA-Research/GroundingDINO 'C:\Users\leoli\AppData\Local\Temp\pip-req-build-xlhx4a4j' did not run successfully.
exit code: 128
See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
None
GroundingDINO install failed. Please submit an issue to https://github.com/IDEA-Research/Grounded-Segment-Anything/issues.
- GroundingDINO relies on a BERT model to encode the detection prompt, so bert-base-uncased is downloaded into the Hugging Face hub cache directory ${User}\.cache\huggingface\hub
Installing sd-webui-segment-anything requirement: groundingdino
GroundingDINO install success.
Running GroundingDINO Inference
Initializing GroundingDINO GroundingDINO_SwinT_OGC (694MB)
final text_encoder_type: bert-base-uncased
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████| 28.0/28.0 [00:00<00:00, 6.78kB/s]
Downloading (…)lve/main/config.json: 100%|███████████████████████████████████████████████████| 570/570 [00:00<?, ?B/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████████████████████████████████████| 232k/232k [00:00<00:00, 262kB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████████████████████████████████████| 466k/466k [00:01<00:00, 460kB/s]
Downloading model.safetensors: 100%|████████████████████████████████████████████████| 440M/440M [16:10<00:00, 454kB/s]
- GroundingDINO output: the objects matching the detection prompt are marked with red bounding boxes (see the sketch below).
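Conceptually, the semantic-segmentation flow is: the text prompt goes to GroundingDINO, which produces the red bounding boxes, and each box is then passed to SAM as a box prompt. A rough sketch of that flow against the plain segment_anything API; detect_with_groundingdino is a hypothetical stand-in for the extension's detector call, not a real function:
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

def segment_by_text(image_rgb, text_prompt, detect_with_groundingdino,
                    sam_checkpoint="sam_vit_h_4b8939.pth"):
    """Text prompt -> GroundingDINO boxes -> one SAM mask per box (illustrative only)."""
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # RGB uint8 HxWxC array
    boxes = detect_with_groundingdino(image_rgb, text_prompt)  # stand-in: Nx4 XYXY pixel boxes
    masks = []
    for box in boxes:
        m, _, _ = predictor.predict(box=np.asarray(box), multimask_output=False)
        masks.append(m[0])  # m has shape (1, H, W) when multimask_output=False
    return masks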
Applying the mask in ControlNet
Once the mask image has been generated, it can be used for inpainting in SD WebUI.
- In Settings, enable the ControlNet option that allows other extensions to control it.
- Pick one of the three masks (indices start at 0) and check the copy to inpaint upload & ControlNet option.
- Enable ControlNet and choose a preprocessor and model:
- v1.0: preprocessors seg_ufade20k, seg_ofade20k and seg_ofcoco from the ControlNet annotators. I strongly recommend seg_ofade20k or seg_ofcoco, since they perform far better than seg_ufade20k. Model: control_sd15_seg
- v1.1: preprocessor inpaint_global_harmonious, model control_v11p_sd15_inpaint
- Choose whether to inpaint the masked area or the unmasked area.
- Click Generate.
Loading model from cache: control_sd15_seg [fef5e48e]
Loading preprocessor: oneformer_ade20k
Downloading: "https://huggingface.co/lllyasviel/Annotators/resolve/main/250_16_swin_l_oneformer_ade20k_160k.pth" to D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\models\oneformer\250_16_swin_l_oneformer_ade20k_160k.pth
Error running process: D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions\sd-webui-controlnet\scripts\controlnet.py
Loading model from cache: control_sd15_seg [fef5e48e]
Loading preprocessor: oneformer_coco
Downloading: "https://huggingface.co/lllyasviel/Annotators/resolve/main/150_16_swin_l_oneformer_coco_100ep.pth" to D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\models\oneformer\150_16_swin_l_oneformer_coco_100ep.pth
7%|█████ | 59.5M/906M [00:11<04:43, 3.13MB/s]
Loading model from cache: control_sd15_seg [fef5e48e]
Loading preprocessor: oneformer_ade20k
Downloading: "https://huggingface.co/lllyasviel/Annotators/resolve/main/250_16_swin_l_oneformer_ade20k_160k.pth" to D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\models\oneformer\250_16_swin_l_oneformer_ade20k_160k.pth
36%|████████████████████████████▏ | 327M/906M [00:51<01:20, 7.56MB/s]
Loading config D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions\sd-webui-controlnet\annotator\oneformer\configs/ade20k/oneformer_swin_large_IN21k_384_bs16_160k.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Loading config D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions\sd-webui-controlnet\annotator\oneformer\configs/ade20k\Base-ADE20K-UnifiedSegmentation.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
Error completing request
Arguments: ('task(kg8umwmgczfx1xn)', 2, 'best quality,masterpiece,illustration, an extremely delicate and beautiful,extremely detailed,CG,unity,8k wallpaper,', '', [], None, None, {'image': <PIL.Image.Image image mode=RGBA size=1280x1280 at 0x2B913EF2740>, 'mask': <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=1280x1280 at 0x2B913EF1EA0>}, None, None, None, None, 20, 0, 4, 0, 1, False, False, 1, 1, 7, 1.5, 0.75, -1.0, -1.0, 0, 0, 0, False, 512, 512, 0, 1, 32, 1, '', '', '', [], 0, True, False, 1, False, False, False, 1.1, 1.5, 100, 0.7, False, False, True, False, False, 0, 'Gustavosta/MagicPrompt-Stable-Diffusion', '', False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <scripts.external_code.ControlNetUnit object at 0x000002B8ECC7A050>, True, False, 0, <PIL.Image.Image image mode=RGBA size=1280x1280 at 0x2B913EF2710>, [{'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmph_ofl_bd.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmph_ofl_bd.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpdc0iyrjc.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpdc0iyrjc.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpc7r2khan.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpc7r2khan.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpt0edro13.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpt0edro13.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpel5j068y.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpel5j068y.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpkahp437p.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpkahp437p.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp2gni45dl.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp2gni45dl.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpy8cqxps6.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmpy8cqxps6.png', 'is_file': True}, {'name': 'C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp832amwg6.png', 'data': 'http://127.0.0.1:7860/file=C:\\Users\\leoli\\AppData\\Local\\Temp\\tmp832amwg6.png', 'is_file': True}], 0, False, [], [], False, 0, 1, False, False, 0, None, [], -2, False, [], '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', 0, '', 0, '', True, False, False, False, 0, None, False, 50) {}
Traceback (most recent call last):
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\call_queue.py", line 56, in f
res = list(func(*args, **kwargs))
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\img2img.py", line 172, in img2img
processed = process_images(p)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\processing.py", line 503, in process_images
res = process_images_inner(p)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\processing.py", line 594, in process_images_inner
p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\processing.py", line 1056, in init
self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\sd_hijack_utils.py", line 17, in <lambda>
setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\modules\sd_hijack_utils.py", line 28, in __call__
return self.__orig_func(*args, **kwargs)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 830, in encode_first_stage
return self.first_stage_model.encode(x)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 83, in encode
h = self.encoder(x)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 523, in forward
hs = [self.conv_in(x)]
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\extensions-builtin\Lora\lora.py", line 319, in lora_Conv2d_forward
return torch.nn.Conv2d_forward_before_lora(self, input)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "D:\workspace\project\stablediffusion\sdwebui\sdwebui-src\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same