Installing the NVIDIA Container Toolkit on Ubuntu


  1. Configure the repository

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    

If you need more advanced features, you can enable the experimental packages:

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

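Update the package list:

sudo apt-get update

Install the toolkit. The version is pinned to 1.17.8-1, the latest release at the time of writing: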

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
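
To confirm that apt installed the pinned version, the installed package versions can be compared against NVIDIA_CONTAINER_TOOLKIT_VERSION. The following is a minimal sketch for the four packages listed above. check_pinned is a hypothetical helper name, not something shipped with the toolkit.

```shell
# check_pinned is a hypothetical helper, not part of the toolkit.
# It reads "package version" lines on stdin and fails if any version
# differs from the expected one, or if no lines were read at all.
check_pinned() {
    local expected="$1" rc=0 seen=0 pkg ver
    while read -r pkg ver; do
        seen=1
        if [ "$ver" != "$expected" ]; then
            echo "mismatch: $pkg is $ver, expected $expected" >&2
            rc=1
        fi
    done
    [ "$seen" -eq 1 ] || rc=1
    return "$rc"
}

# Usage (on the host, after the install step):
#   dpkg-query -W -f='${Package} ${Version}\n' \
#       nvidia-container-toolkit nvidia-container-toolkit-base \
#       libnvidia-container-tools libnvidia-container1 \
#     | check_pinned "$NVIDIA_CONTAINER_TOOLKIT_VERSION"
```

The helper only compares strings, so it reports a mismatch for any package that a later apt-get upgrade has moved off the pinned version.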

  2. Enable the NVIDIA Container Toolkit in the container runtime

    Configure the container runtime by using the nvidia-ctk command:

    sudo nvidia-ctk runtime configure --runtime=containerd
    

    The nvidia-ctk command modifies the /etc/containerd/config.toml file on the host so that containerd can use the NVIDIA Container Runtime. Restart containerd to apply the change:

    sudo systemctl restart containerd
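
To confirm that the configuration file references the NVIDIA runtime, a quick check like the one below can help. It is a minimal sketch: has_nvidia_runtime is a hypothetical helper, and it assumes the runtime binary is named nvidia-container-runtime. The exact TOML key layout depends on the containerd config version, so the check only looks for the binary name.

```shell
# has_nvidia_runtime is a hypothetical helper, not part of nvidia-ctk.
# Returns 0 if the config mentions the NVIDIA runtime binary,
# 1 if it does not, and 2 if the file cannot be read.
has_nvidia_runtime() {
    local config="${1:-/etc/containerd/config.toml}"
    [ -r "$config" ] || return 2
    grep -q 'nvidia-container-runtime' "$config"
}

# Usage:
#   has_nvidia_runtime && echo "NVIDIA runtime is configured"
```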
    

Test
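
One way to smoke-test the setup without Kubernetes is to run nvidia-smi in a throwaway container through ctr. This is a sketch under assumptions that the original does not state: ctr is available, the host already has a working NVIDIA driver, the runtime binary lives at /usr/bin/nvidia-container-runtime, and the ubuntu:22.04 image can be pulled. gpu_smoke_test and DRY_RUN are hypothetical names.

```shell
# gpu_smoke_test is a hypothetical helper. Set DRY_RUN=1 to print the
# commands instead of executing them.
gpu_smoke_test() {
    local image="${1:-docker.io/library/ubuntu:22.04}"
    local run="${DRY_RUN:+echo}"
    # Pull the image, then run nvidia-smi with the NVIDIA runtime as the
    # runc-compatible binary and all GPUs made visible to the container.
    $run sudo ctr image pull "$image" &&
    $run sudo ctr run --rm \
        --runc-binary=/usr/bin/nvidia-container-runtime \
        --env NVIDIA_VISIBLE_DEVICES=all \
        "$image" gpu-smoke-test nvidia-smi
}

# Usage:
#   DRY_RUN=1 gpu_smoke_test   # show what would run
#   gpu_smoke_test             # actually run it
```

If the toolkit is working, nvidia-smi should list the driver version and the GPUs visible to the container.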

Deploy the GPU Operator
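
Since the container toolkit is already installed on the host, the operator can be told not to deploy its own copy. The sketch below follows the Helm-based flow in NVIDIA's getting-started guide (reference 3). It assumes helm and a reachable Kubernetes cluster, and it also assumes the NVIDIA driver is preinstalled on the nodes; drop the driver.enabled=false line if it is not. deploy_gpu_operator and DRY_RUN are hypothetical names.

```shell
# deploy_gpu_operator is a hypothetical helper. Set DRY_RUN=1 to print
# the helm commands instead of executing them.
deploy_gpu_operator() {
    local run="${DRY_RUN:+echo}"
    $run helm repo add nvidia https://helm.ngc.nvidia.com/nvidia &&
    $run helm repo update &&
    $run helm install --wait --generate-name \
        -n gpu-operator --create-namespace \
        nvidia/gpu-operator \
        --set driver.enabled=false \
        --set toolkit.enabled=false
}

# Usage:
#   DRY_RUN=1 deploy_gpu_operator   # review the commands first
#   deploy_gpu_operator
```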


References:

1.  <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>
2.  <https://github.com/NVIDIA/nvidia-container-toolkit?tab=readme-ov-file>
3.  GPU Operator deployment: <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#common-deployment-scenarios>