Qwen VL 72B: From Download to Invocation

Goal: deploy the Qwen VL 72B model with vLLM and call it with curl.

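Follow this tutorial to prepare the Docker image.

You can also use vLLM's official Docker image directly, but I ran into some problems when starting it and did not dig into them.

Write a bash script that starts the prepared Docker image: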

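#!/bin/bash
# Start the prepared image in the background with all GPUs visible.
# /host/model/cache on the host is mounted as the container's cache directory,
# so downloaded model weights persist across container restarts.
docker run --gpus all \
    -v /host/model/cache:/root/.cache \
    --env "HF_ENDPOINT=https://hf-mirror.com" \
    -p 8000:8000 \
    -itd \
    --ipc=host \
    my_cuda:12.4 /bin/bash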

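Once the container is running, enter it (for example with docker exec -it <container-id> /bin/bash) and start the service directly with the vllm command: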

vllm serve Qwen/Qwen2-VL-72B-Instruct --port 8000 --tensor-parallel-size 8

A single GPU cannot run the 72B model, so here it runs across 8 GPUs. Running on 4 GPUs also works if you set --tensor-parallel-size 4.
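The GPU count can be sanity-checked with some rough arithmetic. The sketch below estimates only the weight footprint per GPU, assuming 72 billion parameters stored at 2 bytes each (16-bit weights, which is my assumption about the checkpoint). It ignores the KV cache, activations, and framework overhead, so real usage is noticeably higher. The function name weights_gb_per_gpu is made up for illustration.

```shell
#!/bin/bash
# Rough weight-only memory per GPU under tensor parallelism, in GB.
# Arguments: parameters in billions, bytes per parameter, tensor-parallel size.
# Assumption: weights are split evenly across GPUs; KV cache and activations are NOT counted.
weights_gb_per_gpu() {
    local params_billion=$1 bytes_per_param=$2 tp_size=$3
    echo $(( params_billion * bytes_per_param / tp_size ))
}

weights_gb_per_gpu 72 2 1   # 144: far more than a single GPU offers
weights_gb_per_gpu 72 2 8   # 18
weights_gb_per_gpu 72 2 4   # 36
```

By this estimate a 4-GPU setup needs cards with considerably more than 36 GB each, since the remaining memory has to hold the KV cache and activations.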

API verification:

Verify with the curl command
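Loading a model of this size can take a long time, and requests sent before the server is listening simply fail. A minimal sketch of a readiness check follows, assuming the server exposes the OpenAI-compatible /v1/models endpoint on port 8000. The helper name wait_for_server and the retry numbers are my own choices, not part of vLLM.

```shell
#!/bin/bash
# Poll a URL until it responds without an HTTP error, or give up.
# Arguments: URL, maximum number of attempts, seconds to sleep between attempts.
wait_for_server() {
    local url=$1 max_attempts=$2 delay=$3 attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        # -f makes curl exit non-zero on HTTP error statuses;
        # --max-time bounds how long a single attempt may take.
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example (assumed values): retry every 10 seconds, 180 times at most.
# wait_for_server http://localhost:8000/v1/models 180 10 && echo "server is ready"
```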

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2-VL-72B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
                {"type": "text", "text": "What is the text in the illustrate?"}
            ]}
        ]
    }'
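To try other images or questions without editing the JSON by hand, the request body can be generated by a small function. This is only a sketch: build_payload is a hypothetical helper, and it interpolates its arguments as-is, so they must not contain double quotes or backslashes.

```shell
#!/bin/bash
# Print a chat-completions request body for one image URL and one question.
# Arguments: image URL, question text.
# Assumption: neither argument contains characters that need JSON escaping.
build_payload() {
    local image_url=$1 question=$2
    cat <<EOF
{
  "model": "Qwen/Qwen2-VL-72B-Instruct",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
      {"type": "image_url", "image_url": {"url": "${image_url}"}},
      {"type": "text", "text": "${question}"}
    ]}
  ]
}
EOF
}

# Usage (needs the running server); -d @- makes curl read the body from stdin:
# build_payload "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png" "Describe this image." \
#     | curl -s http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d @-
```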

Verify under load with the wrk command

Two files are needed: request.lua and a test.sh script.

request.lua

-- Every request wrk sends is a POST with the same body as the curl example.
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.body = [[
{
    "model": "Qwen/Qwen2-VL-72B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
            {"type": "text", "text": "What is the text in the illustrate?"}
        ]}
    ]
}
]]

test.sh

#!/bin/bash
# 16 threads, 20 open connections, running for 7200 seconds (2 hours).
wrk -t16 -c20 -d7200s -s request.lua http://localhost:8000/v1/chat/completions
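Before committing to a two-hour run, a short smoke run with a longer timeout can confirm that request.lua is accepted by the server. As far as I know wrk records a timeout when a response takes longer than 2 seconds by default, which a 72B vision model can easily exceed, so the sketch below passes --timeout explicitly. The 120s value and the helper name wrk_args are assumptions for illustration.

```shell
#!/bin/bash
# Print wrk arguments for a run against the chat-completions endpoint.
# Arguments: threads, connections, duration (for example 30s or 2h).
# Assumption: 120s is a generous enough per-request timeout for this model.
wrk_args() {
    local threads=$1 connections=$2 duration=$3
    printf -- '-t%s -c%s -d%s --timeout 120s --latency -s request.lua %s\n' \
        "$threads" "$connections" "$duration" \
        "http://localhost:8000/v1/chat/completions"
}

# Short smoke run first, then the full run from test.sh with the same timeout:
# wrk $(wrk_args 2 4 30s)
# wrk $(wrk_args 16 20 7200s)
```

The --latency flag adds a latency distribution to the report, which is usually more informative than the average for long generation requests.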