Installing vLLM in WSL on Windows 10
- Goal: try out vLLM for serving large language models on a personal Windows 10 machine; however, vLLM officially runs only on Linux.
- Approach: use the Windows Subsystem for Linux (WSL), preferably WSL2, since WSL2 runs a real Linux kernel and its fuller feature support avoids a lot of trouble; see the official documentation for the required Windows build and the WSL2 installation steps (a short sketch also follows this list).
- Test setup: Windows 10 + RTX 3070 Ti
- Notes:
    - WSL distros are installed on drive C by default; if C is low on space, this can later lead to read/write failures;
    - prefer installing the distro from the Microsoft Store to avoid compatibility issues with some other builds;
    - keep only one version of cuda-toolkit installed; multiple versions can cause conflicts.
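If WSL2 is not set up yet, the basic steps look roughly like the following (run in PowerShell as administrator; Ubuntu-22.04 is only an example distro, see Microsoft's WSL documentation for details):
wsl --install -d Ubuntu-22.04
wsl --set-default-version 2
wsl -l -v    # confirm the distro is running under WSL version 2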
Install CUDA
On the Windows host only the regular NVIDIA GPU driver is needed (the CUDA download page's "Download Installer for Windows 10 x86_64" option); inside WSL, install the toolkit from the "Linux > WSL-Ubuntu > 2.0 > x86_64" repository, which intentionally ships without a driver:
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-4-local_12.4.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
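If another CUDA toolkit version is already present (see the note above about keeping only one), it can be listed and removed first; a sketch, with cuda-toolkit-12-3 standing in for whatever stale version shows up:
apt list --installed 2>/dev/null | grep cuda-toolkit
sudo apt-get remove --purge -y cuda-toolkit-12-3
sudo apt-get autoremove -y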
Configure environment variables
ls /usr/local/cuda*    # confirm where the toolkit landed, e.g. /usr/local/cuda-12.4
nano ~/.bashrc
# append the following lines to ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
source ~/.bashrc
nvidia-smi       # the GPU should be visible through the Windows driver
nvcc --version   # the toolkit should be found on PATH
$CUDA_HOME/bin/nvcc --version
python -c "import torch;print([torch.__version__,torch.cuda.is_available(),torch.cuda.device_count(),torch.cuda.current_device()])"    # run this check once PyTorch is installed (it comes in with vLLM below)
Install Miniconda and create the vllm environment
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.bfsu.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --set show_channel_urls yes
conda create -n vllm python=3.12 -y
conda activate vllm
pip install vllm
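A quick sanity check that the environment now has vLLM and a CUDA-enabled PyTorch (run inside the vllm environment):
python -c "import vllm; print(vllm.__version__)"
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"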
Test vLLM
import os
from vllm import LLM, SamplingParams
from modelscope import snapshot_download

# fetch the model from ModelScope on first run; it is cached under ~/models afterwards
model_path = snapshot_download("tclf90/deepseek-r1-distill-qwen-7b-gptq-int4",
                               cache_dir=os.path.expanduser("~/models"))
model = LLM(model=model_path, gpu_memory_utilization=0.95, max_model_len=8192)
sampling_params = SamplingParams(temperature=0.8, top_p=0.9)
convs = model.generate("Hello, who are you?", sampling_params)
print(convs[0].outputs[0].text)
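For chat-tuned models, recent vLLM versions also provide LLM.chat, which applies the model's chat template before generating; a minimal sketch reusing model and sampling_params from the snippet above (the API is version-dependent, so treat this as an assumption):
messages = [{"role": "user", "content": "Hello, who are you?"}]
chat_outs = model.chat(messages, sampling_params)  # returns a list of RequestOutput, like generate()
print(chat_outs[0].outputs[0].text)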
Start vLLM as a service
vllm serve ~/models/tclf90/deepseek-r1-distill-qwen-7b-gptq-int4
Unless --served-model-name is given, the model is registered under the same path string that was passed to vllm serve; query http://localhost:8000/v1/models for the exact name and use it in the "model" field:
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "<name reported by /v1/models, i.e. the model path above>",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
    }'
Change the served model name
vllm serve ~/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
--gpu-memory-utilization 0.95 \
--max-model-len 2048 \
--served-model-name deepseek-1.5b
curl http://localhost:8000/v1/models    # the model is now listed as deepseek-1.5b
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "deepseek-1.5b", "messages": [{"role": "user", "content": "hi"}]}'
Troubleshooting
ERROR 04-21 19:50:20 [core.py:387]     CUTLASS_FP8_SUPPORTED = cutlass_fp8_supported()
ERROR 04-21 19:50:20 [core.py:387]                             ^^^^^^^^^^^^^^^^^^^^^^^
...
ERROR 04-21 19:50:20 [core.py:387] vllm.third_party.pynvml.NVMLError_NotSupported: Not Supported
This error is raised while vLLM probes FP8 support through NVML (some NVML queries are not supported under WSL). Checking the GPU's compute capability helps rule FP8 out: FP8 kernels require compute capability 8.9 or newer, whereas the RTX 3070 Ti reports 8.6:
nvidia-smi --query-gpu=compute_cap --format=csv
Move the WSL distro to drive D
wsl --export <distro name> D:\wsl-backup.tar
wsl --unregister <distro name>
wsl --import <distro name> D:\WSL\Ubuntu D:\wsl-backup.tar --version 2
# an imported distro logs in as root by default; restore the normal default user inside the distro (replace newuser with your username)
echo -e "[user]\ndefault=newuser" | sudo tee -a /etc/wsl.conf