Image Captioning with BLIP

from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests

device = "cpu"
# Point from_pretrained at the local folder containing the downloaded model files
processor = BlipProcessor.from_pretrained("/home/xxx/.cache/huggingface/hub/models--Salesforce--blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("/home/xxx/.cache/huggingface/hub/models--Salesforce--blip-image-captioning-large").to(device)

# Fetch a test image (from the COCO val2017 set) and make sure it is RGB
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
img = Image.open(requests.get(url, stream=True, timeout=10).raw).convert("RGB")

# Preprocess the image into the pixel tensors the model expects
inputs = processor(img, return_tensors="pt").to(device)

# Generate caption token ids, then decode them back to text
out = model.generate(**inputs)
output = processor.decode(out[0], skip_special_tokens=True)
print(output)
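The snippet above pins inference to the CPU. If a GPU may be available, a common pattern is to pick the device at runtime instead of hardcoding it; a minimal sketch, independent of the BLIP code itself:

```python
import torch

# Use CUDA when a GPU is visible to PyTorch, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```

The resulting `device` string is then passed to `.to(device)` on both the model and the processed inputs, exactly as in the code above.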

Download the model here: https://huggingface.co/Salesforce/blip-image-captioning-large/tree/main. For PyTorch inference you need at least the five files shown in the screenshot; the folder they are saved into is the path passed to from_pretrained.

(screenshot: the required model files)

Example:

(example input image)

Model output: two cats laying on a couch with remote controls on the back

(example input image: 67d2b5beabf15812bee28956abe7648.jpg)

Model output: a close up of a table with numbers and symbols on it