vLLM Usage Guide


To use vLLM, it is best to `pip install vllm` into a fresh environment; otherwise the xformers installation is likely to fail, because each xformers release supports only a specific CUDA/PyTorch combination.
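A minimal sketch of the fresh-environment approach described above. The environment name `vllm-env` and the Python version are just examples, not anything the post prescribes:

```shell
# Create an isolated environment so pip can resolve a matching
# torch/xformers/CUDA combination from scratch
# (environment name and Python version are illustrative).
conda create -n vllm-env python=3.11 -y
conda activate vllm-env

# Let pip pull in the torch and xformers versions that this vLLM
# release pins, instead of reusing a preinstalled torch.
pip install vllm

# Verify which attention kernels xformers actually loaded:
python -m xformers.info
```

The point of the new environment is that pip installs the exact torch build vLLM pins, rather than trying to build xformers against whatever torch is already present.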

Failure case: starting from torch 2.1.0+cu118, I commented out vLLM's torch version requirement, and after that xformers would never install correctly. Having installed flash-attn 2 first may also have contributed.

The `python -m xformers.info` output of one successful install (machine CUDA version 11.8):

```
xFormers 0.0.22
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattF@v2.3.0:        available
memory_efficient_attention.flshattB@v2.3.0:        available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         unavailable
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
pytorch.version:                                   2.0.1+cu117
pytorch.cuda:                                      available
gpu.compute_capability:                            8.0
gpu.name:                                          NVIDIA A800 80GB PCIe
build.info:                                        available
build.cuda_version:                                1108
build.python_version:                              3.11.5
build.torch_version:                               2.0.1+cu118
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.22
build.nvcc_version:                                11.8.89
source.privacy:                                    open source
```

Note that `pytorch.version` (2.0.1+cu117) and `build.torch_version` (2.0.1+cu118) differ, yet everything still works.

Further notes

A model served with FastChat + vLLM on an A800 occupies about 72 GB of GPU memory, regardless of whether the model is 7B or 13B.
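The 72 GB figure matches vLLM's memory model: the engine pre-allocates a fixed fraction of GPU memory (the `gpu_memory_utilization` engine argument, default 0.9) for weights plus the KV cache, independent of model size. A minimal sketch of the arithmetic, assuming the A800's 80 GB capacity:

```python
# vLLM reserves gpu_memory_utilization * total GPU memory up front
# (weights + pre-allocated KV-cache pool), so the footprint depends on
# this fraction, not on whether the model is 7B or 13B.
gpu_total_gb = 80          # NVIDIA A800 80GB
gpu_memory_utilization = 0.9  # vLLM's default

reserved_gb = gpu_total_gb * gpu_memory_utilization
print(reserved_gb)  # 72.0 -- matches the observed usage

# To reserve less, pass a lower fraction when building the engine, e.g.:
# from vllm import LLM
# llm = LLM(model="meta-llama/Llama-2-7b-hf", gpu_memory_utilization=0.5)
```

Lowering the fraction shrinks the KV-cache pool (and therefore the maximum concurrent batch/sequence capacity), not the model itself.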