from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests

device = "cpu"

# Load the processor and model from a local folder; you can also pass the
# Hub id "Salesforce/blip-image-captioning-large" to download automatically.
processor = BlipProcessor.from_pretrained("/home/xxx/.cache/huggingface/hub/models--Salesforce--blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("/home/xxx/.cache/huggingface/hub/models--Salesforce--blip-image-captioning-large").to(device)

# Fetch a sample COCO image and make sure it is in RGB mode.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
img = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image, generate a caption, and decode it back to text.
inputs = processor(img, return_tensors="pt").to(device)
out = model.generate(**inputs)
output = processor.decode(out[0], skip_special_tokens=True)
print(output)
Download the model here: https://huggingface.co/Salesforce/blip-image-captioning-large/tree/main. For PyTorch inference, download at least the 5 files shown in the screenshot; the path passed to from_pretrained is the folder they are saved in.
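If you prefer to script the download instead of clicking through the web page, the huggingface_hub library can fetch files from the repository into the local cache. This is a minimal sketch, assuming huggingface_hub is installed (pip install huggingface_hub) and huggingface.co is reachable; for brevity it fetches only config.json, but for real inference you would widen allow_patterns to include the weights and tokenizer files as well.

```python
# Sketch: fetch files from the BLIP repo with huggingface_hub.
# Assumption: `pip install huggingface_hub` and network access to huggingface.co.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Salesforce/blip-image-captioning-large",
    # Only config.json here to keep the sketch light; for inference, also
    # include e.g. "*.bin", "*.txt" and the tokenizer/preprocessor JSON files.
    allow_patterns=["config.json"],
)
print(local_dir)  # this directory can be passed to from_pretrained(...)
```

The returned directory is the resolved snapshot folder inside the cache, which is exactly the kind of path from_pretrained expects.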
Example:
Model output: two cats laying on a couch with remote controls on the back
Model output: a close up of a table with numbers and symbols on it
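A note on what the processor actually does before generation: it resizes and normalizes the input image into a fixed-size tensor that the vision encoder expects. A minimal sketch to inspect that tensor, assuming transformers, torch, and Pillow are installed and the checkpoint's preprocessor config can be fetched from the Hub (a synthetic image stands in for a real photo, so no image download is needed):

```python
# Sketch: inspect the tensor the BLIP processor feeds to the model.
from PIL import Image
from transformers import BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")

# A synthetic RGB image stands in for a real photo here.
img = Image.new("RGB", (640, 480), color=(120, 60, 200))
inputs = processor(img, return_tensors="pt")

# The image is resized/normalized to the checkpoint's expected square
# resolution: shape is (batch, channels, height, width).
print(inputs["pixel_values"].shape)
```

Whatever size the original image is, the model always sees the same fixed resolution, which is why captions are insensitive to the input's aspect ratio.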