Error notes for running vLLM in a Linux container under WSL + Docker Desktop: RuntimeError("UVA

RuntimeError("UVA is not available")

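As the screenshot below shows, this error occurs because the is_pin_memory_available function returns False outright whenever in_wsl() is true. As a temporary workaround, change that False to True.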
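The logic described above can be sketched roughly as follows. This is a simplified stand-in, not vLLM's actual source: only the names `in_wsl` and `is_pin_memory_available` come from the notes above, while the `uname_text` and `allow_on_wsl` parameters and the "microsoft" substring check are assumptions made for illustration.

```python
import platform
from typing import Optional


def in_wsl(uname_text: Optional[str] = None) -> bool:
    """Guess whether we are running under WSL.

    Assumption: WSL kernels report "microsoft" somewhere in their uname
    string. `uname_text` is a hypothetical parameter so the check can be
    exercised without a real WSL machine.
    """
    if uname_text is None:
        uname_text = " ".join(platform.uname())
    return "microsoft" in uname_text.lower()


def is_pin_memory_available(uname_text: Optional[str] = None,
                            allow_on_wsl: bool = False) -> bool:
    """Simplified stand-in for the check described above.

    With allow_on_wsl=False this mirrors the behaviour that leads to the
    error: pinned memory is reported as unavailable on WSL. Setting
    allow_on_wsl=True mirrors the temporary False -> True edit.
    """
    if in_wsl(uname_text):
        return allow_on_wsl
    return True


if __name__ == "__main__":
    print("in_wsl:", in_wsl())
    print("pin memory (unpatched):", is_pin_memory_available())
    print("pin memory (patched):  ", is_pin_memory_available(allow_on_wsl=True))
```

Flipping the return value only bypasses the check. Whether pinned memory then behaves correctly depends on the WSL and driver setup, so this is best treated as a temporary workaround.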

RuntimeError: CUDA graphs must be captured on a non-default stream. (However, after capture, it's ok to replay them on the default stream.)

The error is shown in the screenshot below:

This happens because the CUDA graph capture runs on the default stream. As a temporary workaround, apply the following change:

diff --git a/vllm/compilation/cuda_graph.py b/vllm/compilation/cuda_graph.py
index 13e88448c..0293d2e5c 100644
--- a/vllm/compilation/cuda_graph.py
+++ b/vllm/compilation/cuda_graph.py
@@ -292,10 +292,18 @@ class CUDAGraphWrapper:
                 get_offloader().sync_prev_onload()

                 # mind-exploding: carefully manage the reference and memory.
+                # Ensure current stream is not the default stream.
+                # In WSL2, spawn subprocesses may not have TLS stream
+                # initialized before CUDA graph capture is triggered.
+                _capture_stream = current_stream()
+                if _capture_stream.stream_id == torch.cuda.default_stream().stream_id:
+                    import torch as _torch
+                    _torch.cuda.set_stream(_torch.cuda.Stream())
+                    _capture_stream = current_stream()
                 with torch.cuda.graph(
                     cudagraph,
                     pool=self.graph_pool,
-                    stream=current_stream(),
+                    stream=_capture_stream,
                 ):
                     # `output` is managed by pytorch's cudagraph pool
                     output = self.runnable(*args, **kwargs)
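The idea behind the patch can be tried outside vLLM with a small standalone sketch. It uses only PyTorch's public `torch.cuda` API rather than vLLM's own `current_stream()` helper, and the function names `ensure_side_stream` and `capture_and_replay` are hypothetical. It assumes a CUDA-capable GPU is visible inside the container.

```python
import torch


def ensure_side_stream() -> "torch.cuda.Stream":
    """Return the current stream, switching to a new one if it is the default."""
    current = torch.cuda.current_stream()
    if current == torch.cuda.default_stream():
        # Graph capture is rejected on the default stream, so make a
        # freshly created side stream the current stream for this thread.
        torch.cuda.set_stream(torch.cuda.Stream())
        current = torch.cuda.current_stream()
    return current


def capture_and_replay() -> torch.Tensor:
    """Capture a tiny graph on a non-default stream, then replay it."""
    stream = ensure_side_stream()
    static_in = torch.zeros(4, device="cuda")

    # Warm-up run outside the graph so one-time initialisation is not captured.
    _ = static_in * 2 + 1
    torch.cuda.synchronize()

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph, stream=stream):
        static_out = static_in * 2 + 1

    # Replay reads whatever is currently stored in the captured input buffer.
    static_in.fill_(3.0)
    graph.replay()
    torch.cuda.synchronize()
    return static_out


if __name__ == "__main__":
    if torch.cuda.is_available():
        print(capture_and_replay())
    else:
        print("CUDA is not available; nothing to capture.")
```

If this script runs cleanly inside the container while vLLM still fails, the problem is more likely in how the worker process sets up its current stream than in the CUDA installation itself.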