To use vLLM,
it is best to pip install vllm into a fresh environment;
otherwise the xformers install tends to fail, because each xformers release supports a different CUDA version.
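One way to see why a mismatched pair fails is to compare the torch wheel's CUDA tag against what the target xformers wheel was built for. A minimal sketch, where the compatibility table is an illustrative assumption drawn only from this note's own data (xformers 0.0.22 built against torch 2.0.1 with cu117/cu118), not an official mapping:

```python
# Illustrative compatibility table, assumed from the xformers.info dump below;
# not an official xformers release matrix.
XFORMERS_TORCH_CUDA = {
    "0.0.22": ("2.0.1", {"cu117", "cu118"}),
}

def compatible(xformers_version: str, torch_version: str) -> bool:
    # Split e.g. "2.0.1+cu118" into the torch base version and the CUDA tag.
    base, _, tag = torch_version.partition("+")
    want_torch, want_tags = XFORMERS_TORCH_CUDA[xformers_version]
    return base == want_torch and tag in want_tags

print(compatible("0.0.22", "2.0.1+cu118"))  # True
print(compatible("0.0.22", "2.1.0+cu118"))  # False: torch 2.1.0 is too new for this wheel
```

This mirrors the failure case below: starting from torch 2.1.0 while targeting a wheel built for 2.0.1.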
Failure case:
I started from torch 2.1.0+cu118 and commented out vLLM's torch version requirement, but installing xformers kept failing. Having installed flash-attn 2 first may also have been part of the problem.
The xformers.info output of a successful install (machine CUDA version 11.8):
xFormers 0.0.22
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@v2.3.0: available
memory_efficient_attention.flshattB@v2.3.0: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.0.1+cu117
pytorch.cuda: available
gpu.compute_capability: 8.0
gpu.name: NVIDIA A800 80GB PCIe
build.info: available
build.cuda_version: 1108
build.python_version: 3.11.5
build.torch_version: 2.0.1+cu118
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.22
build.nvcc_version: 11.8.89
source.privacy: open source
Note that pytorch.version (2.0.1+cu117) and build.torch_version (2.0.1+cu118) differ, yet it still works.
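That mismatch is easy to spot programmatically by parsing the info dump into key/value pairs. A minimal sketch; the parse helper and the truncated INFO sample are my own, not part of xformers:

```python
# Truncated sample of `python -m xformers.info` output, taken from the dump above.
INFO = """\
pytorch.version: 2.0.1+cu117
build.torch_version: 2.0.1+cu118
build.cuda_version: 1108
"""

def parse_info(text: str) -> dict:
    # Each line is "key: value"; split on the first colon only.
    pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
    return {k.strip(): v.strip() for k, v in pairs}

info = parse_info(INFO)
print(info["pytorch.version"] == info["build.torch_version"])  # False: runtime and build torch differ
```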
Extension
A model launched with fschat's vLLM worker occupies 72 GB of GPU memory on an A800 machine, regardless of whether the model is 7B or 13B.
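This flat footprint is consistent with vLLM preallocating a fixed fraction of total GPU memory for weights plus KV cache, controlled by the gpu_memory_utilization engine argument (default 0.9): 0.9 × 80 GB = 72 GB on an A800 80GB. A sketch of the arithmetic; the helper function is mine, only the parameter name comes from vLLM:

```python
# vLLM reserves roughly total_gpu_mem * gpu_memory_utilization up front,
# independent of model size, which would explain the constant 72 GB.
def reserved_gib(total_gib: float, gpu_memory_utilization: float = 0.9) -> float:
    return total_gib * gpu_memory_utilization

print(reserved_gib(80.0))       # 72.0 -> matches the observed usage on an A800 80GB
print(reserved_gib(80.0, 0.5))  # 40.0 -> pass a smaller gpu_memory_utilization to shrink it
```

Lowering gpu_memory_utilization when constructing the engine reduces the reservation, at the cost of a smaller KV cache.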