NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

When will this pandemic finally end???

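Problem: After a month of quarantine in my dorm, I booted into Ubuntu and updated the system with sudo apt update. (Strictly speaking, apt update only refreshes the package lists. It is the upgrade step, such as sudo apt upgrade, that actually installs a newer kernel.) When I then ran a CUDA program, it could no longer find the driver, and nvidia-smi reported: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.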

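Great, I assumed the driver had vanished again (honestly, I've gotten used to reinstalling it). I pulled out my trusty old driver installer, but it told me this driver version was already installed and asked whether I really wanted to reinstall it. Hmm, this clearly wasn't that simple, so I chose not to reinstall. After searching through the pitfalls others had already run into, I finally figured it out.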

Solution

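Cause: I had updated the system kernel, but the NVIDIA kernel module had been built for the previous kernel version (the driver still "remembered" the old kernel), so the two no longer matched. The module has to be compiled against each kernel it runs on.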

Check the kernel version currently in use:

uname -a

My output:

Linux a504-701 5.4.0-107-generic #121~18.04.1-Ubuntu SMP Thu Mar 24 17:21:33 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
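To confirm the mismatch before rebuilding anything, you can check whether an NVIDIA module exists for the running kernel at all. This is a minimal sketch, assuming the modules live under /lib/modules/<release>/ (the DKMS log further down installs them into /lib/modules/5.4.0-107-generic/updates/dkms/). The function name has_nvidia_module is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether nvidia.ko exists for a given kernel release.
# Assumption: modules are installed somewhere under <modules_root>/<release>/,
# e.g. updates/dkms/ as in the dkms output for 5.4.0-107-generic.
has_nvidia_module() {
  local release="$1"
  local modules_root="${2:-/lib/modules}"   # overridable, e.g. for testing
  # Search the whole release tree (not only updates/dkms) and stop at the
  # first match; errors for a missing directory are silenced.
  if find "$modules_root/$release" -name 'nvidia.ko*' -print -quit 2>/dev/null | grep -q .; then
    echo "found"
  else
    echo "missing"
  fi
}

# Usage: check the kernel that is currently running.
# has_nvidia_module "$(uname -r)"
```

If this prints "missing" for the running kernel while an older kernel still has the module, the kernel/driver mismatch described above is the likely cause.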

List the installed kernels:

dpkg --get-selections |grep linux-image
# Output:
# linux-image-5.4.0-100-generic			install
# linux-image-5.4.0-104-generic			install
# linux-image-5.4.0-107-generic			install
# linux-image-5.4.0-84-generic			deinstall
# linux-image-5.4.0-92-generic			deinstall
# linux-image-generic-hwe-18.04			install
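The dpkg listing mixes kernel images that are installed with ones marked deinstall (selected for removal). The following is a minimal sketch that assumes the two-column "package, state" format shown above. It keeps only the linux-image packages still selected for install. The function name installed_kernels is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `dpkg --get-selections` output on stdin and
# print only linux-image packages whose selection state is "install".
# awk's default field splitting handles the tab-separated columns.
installed_kernels() {
  awk '$1 ~ /^linux-image-/ && $2 == "install" { print $1 }'
}

# Usage:
# dpkg --get-selections | installed_kernels
```

Note that the metapackage linux-image-generic-hwe-18.04 also matches, since it is itself a linux-image package.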

Then run the following commands:

sudo apt-get install dkms # install DKMS
sudo dkms install -m nvidia -v 470.103.01 # 470.103.01 is the driver version; replace it with your own
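If you are unsure which version to pass to -v, the DKMS source directory is named after it (/usr/src/nvidia-470.103.01 in this case, as the install log confirms). This is a sketch under the assumption that exactly one nvidia-<version> directory exists under /usr/src. The helper name nvidia_dkms_version is hypothetical. The usage line adds dkms's -k option to target the running kernel explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the driver version from the DKMS source
# directory name, e.g. /usr/src/nvidia-470.103.01 -> 470.103.01.
# Assumption: exactly one nvidia-<version> directory exists under src_root;
# if there are several, only the first match (in glob order) is printed.
nvidia_dkms_version() {
  local src_root="${1:-/usr/src}"
  local dir name
  for dir in "$src_root"/nvidia-*; do
    [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
    name="${dir##*/}"           # strip the leading path
    echo "${name#nvidia-}"      # strip the "nvidia-" prefix
    return 0
  done
  return 1                      # no nvidia-* source directory found
}

# Usage (requires root): rebuild and install the module for the running kernel.
# ver=$(nvidia_dkms_version) && sudo dkms install -m nvidia -v "$ver" -k "$(uname -r)"
```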

Output:

Creating symlink /var/lib/dkms/nvidia/470.103.01/source ->
                 /usr/src/nvidia-470.103.01

DKMS: add completed.

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
'make' -j32 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-107-generic IGNORE_CC_MISMATCH='' modules.......
cleaning build area...

DKMS: build completed.

nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-107-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-107-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-107-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-107-generic/updates/dkms/

nvidia-peermem.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-107-generic/updates/dkms/

depmod...

DKMS: install completed.

Afterwards, nvidia-smi detected the driver again, and the problem was solved.