Image Captioning with BLIP

from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import requests

device = "cpu"
# Point from_pretrained at the local folder containing the downloaded model files
processor = BlipProcessor.from_pretrained("/home/xxx/.cache/huggingface/hub/models--Salesforce--blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("/home/xxx/.cache/huggingface/hub/models--Salesforce--blip-image-captioning-large").to(device)

# Fetch a test image (from the COCO val2017 set) and make sure it is RGB
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
img = Image.open(requests.get(url, stream=True, timeout=10).raw).convert("RGB")

# Preprocess the image into the pixel tensors the model expects
inputs = processor(img, return_tensors="pt").to(device)

# Generate caption token ids, then decode them back to text
out = model.generate(**inputs)
output = processor.decode(out[0], skip_special_tokens=True)
print(output)
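The snippet above pins inference to the CPU. If a GPU may be available, a common pattern is to pick the device at runtime instead of hardcoding it; a minimal sketch, independent of the BLIP code itself:

```python
import torch

# Use CUDA when a GPU is visible to PyTorch, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```

The resulting `device` string is then passed to `.to(device)` on both the model and the processed inputs, exactly as in the code above.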

Download the model here: https://huggingface.co/Salesforce/blip-image-captioning-large/tree/main. For PyTorch inference you need at least the five files shown in the screenshot; the folder they are saved into is the path passed to from_pretrained.

(screenshot: the required model files)

Example:

(example input image)

Model output: two cats laying on a couch with remote controls on the back

(example input image: 67d2b5beabf15812bee28956abe7648.jpg)

Model output: a close up of a table with numbers and symbols on it