
TLDR

I'm trying to get nvidia-smi working again. It worked fine until I installed cuda-toolkit, and uninstalling cuda-toolkit didn't help. How can I restore nvidia-smi? It currently fails with:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.


More details

I have a GeForce RTX 2070 in my laptop running Ubuntu 18.04 and had successfully installed its driver from the official runfile NVIDIA-Linux-x86_64-470.63.01.run. Here is the output of nvidia-smi after that installation:

[Screenshot: nvidia-smi output after installing the 470.63.01 driver]

Next, I installed cuda-toolkit from the official runfile cuda_11.4.2_470.57.02_linux.run, making sure to deselect the driver component. Here's the terminal window right after the installation completed:

[Screenshot: terminal output at the end of the CUDA toolkit installation]

Immediately afterwards, running nvidia-smi gave:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Since installing cuda-toolkit presumably "broke" nvidia-smi, I uninstalled it by running cuda-uninstaller from /usr/local/cuda-11.4/bin, as the text printed at the end of the installation instructed.

Unfortunately, that didn't help: nvidia-smi still fails. I'm installing from the official NVIDIA runfiles because I previously had trouble installing the driver from the Ubuntu repositories but got it working with the official runfile, so I figured I'd try the same approach for cuda-toolkit.

How can I get nvidia-smi working again?

Outputs of some commands, if relevant

  • which nvidia-smi : /usr/bin/nvidia-smi
  • mokutil --sb-state : SecureBoot disabled
  • nvidia-settings :
    • ERROR: NVIDIA driver is not loaded
    • ERROR: Unable to load info from any available system
  • ls /sys/firmware/efi/ :
    • config_table efivars esrt fw_platform_size fw_vendor runtime runtime-map systab vars
  • lspci -k | grep -EA2 'VGA|3D' :

00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ae
Kernel driver in use: i915

01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2070 Mobile / Max-Q Refresh] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device 12ae
Kernel modules: nvidiafb, nouveau

  • cat /etc/modprobe.d/blacklist-nouveau.conf :

blacklist nouveau
blacklist vga16b
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
blacklist amd76_edac
alias nouveau off
alias lbm-nouveau off
options nouveau modeset=0

  • cat /proc/version :

    • Linux version 5.4.0-84-generic (buildd@lcy01-amd64-007) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #94~18.04.1-Ubuntu SMP Thu Aug 26 23:17:46 UTC 2021
  • sudo lshw -c video : (NVIDIA display is "unclaimed", but this is how it should be)

    • [Screenshot: output of sudo lshw -c video]

  • dkms status : no output
  • lsmod | grep nvidia :
    • i2c_nvidia_gpu 16384 0
  • echo $XDG_SESSION_TYPE : x11
  • whereis nvidia :
    • nvidia: /usr/lib/x86_64-linux-gnu/nvidia /usr/lib/nvidia /usr/share/nvidia /usr/src/nvidia-470.63.01/nvidia
  • grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*:

/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
/etc/modprobe.d/blacklist-nouveau.conf:blacklist nvidiafb
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf:# generated by nvidia-installer
/lib/modprobe.d/nvidia-runtimepm.conf:options nvidia "NVreg_DynamicPowerManagement=0x02"
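For anyone re-checking this state, a minimal read-only bash sketch can test the same things in one go: whether the `nvidia` module itself is loaded (the `i2c_nvidia_gpu` module in the `lsmod` output above is a separate module), whether an `nvidia.ko` exists for the running kernel, and whether DKMS has anything registered. It assumes bash and the standard `modinfo`/`dkms` tools, and it changes nothing on the system:

```shell
#!/usr/bin/env bash
# Read-only diagnostic sketch: makes no changes to the system.

check_nvidia_module() {
  # /proc/modules lists loaded kernel modules, one per line, name first.
  # Match "nvidia " exactly so i2c_nvidia_gpu and nvidia_drm don't count.
  if grep -q '^nvidia ' /proc/modules 2>/dev/null; then
    echo "nvidia module: loaded"
  else
    echo "nvidia module: NOT loaded"
  fi

  # modinfo looks under /lib/modules/$(uname -r) and fails if no nvidia.ko is there.
  if modinfo nvidia >/dev/null 2>&1; then
    echo "nvidia.ko for kernel $(uname -r): found"
  else
    echo "nvidia.ko for kernel $(uname -r): not found"
  fi

  # Empty output means DKMS has no modules registered, so nothing is rebuilt on kernel updates.
  if command -v dkms >/dev/null 2>&1; then
    local status
    status="$(dkms status 2>/dev/null)"
    echo "dkms status: ${status:-<empty>}"
  else
    echo "dkms: not installed"
  fi
}

check_nvidia_module
```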

Posts / Questions I've already looked at:


2 Answers


I purged everything NVIDIA-related, ran sudo ubuntu-drivers autoinstall, and then rebooted with sudo reboot. After that, nvidia-smi works fine.

[Screenshot: working nvidia-smi output after the reinstall]

So the solution appears to have been reinstalling the NVIDIA driver.
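The answer doesn't list the exact purge commands, so the following is only a sketch of those steps under stated assumptions: the apt pattern `'^nvidia-.*'` and the `nvidia-uninstall` step (for the runfile-installed driver from the question) are assumptions, not part of the answer. By default it only prints the commands:

```shell
#!/usr/bin/env bash
# Sketch of the "purge, reinstall, reboot" fix from this answer.
# Assumptions (not from the answer): the apt purge pattern below, and the
# nvidia-uninstall step for a driver that was installed from the .run file.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 runs it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Remove a runfile-installed driver, if its uninstaller is on PATH.
if command -v nvidia-uninstall >/dev/null 2>&1; then
  run sudo nvidia-uninstall
fi

# 2. Purge NVIDIA packages installed through apt (pattern is an assumption).
run sudo apt-get purge -y '^nvidia-.*'
run sudo apt-get autoremove -y

# 3. Install the driver that ubuntu-drivers recommends for this hardware.
run sudo ubuntu-drivers autoinstall

# 4. Reboot so the freshly installed kernel module gets loaded.
run sudo reboot
```

To actually execute the steps, run it with `DRY_RUN=0`, e.g. `DRY_RUN=0 bash reinstall-nvidia.sh` (the filename is hypothetical). The dry-run default is deliberate, since purging packages and rebooting are hard to undo.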


In my case, neither disabling Secure Boot, switching with prime-select, nor rebooting helped.

I had to install the nvidia-dkms package with sudo apt install nvidia-dkms-YOUR-VERSION. This step is missing from the recommended installation method in the official documentation (https://ubuntu.com/server/docs/nvidia-drivers-installation); the nvidia-dkms package is mentioned only under the manual installation method.
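Here `YOUR-VERSION` is presumably the driver branch, i.e. the same number as the installed `nvidia-driver-NNN` package (for example 470). A sketch that derives the package name from the installed driver, assuming the driver came from Ubuntu's apt packages; the helper functions are hypothetical, and the script only prints the `apt install` command unless `DRY_RUN=0` is set:

```shell
#!/usr/bin/env bash
# Sketch: install the nvidia-dkms package matching the apt-installed driver.
# Assumes the driver package follows Ubuntu's nvidia-driver-<branch> naming.
# DRY_RUN=1 (the default) only prints the apt command; DRY_RUN=0 runs it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: nvidia-driver-470 -> nvidia-dkms-470,
# nvidia-driver-535-server -> nvidia-dkms-535-server.
dkms_package_for() {
  local driver_pkg="$1"
  echo "nvidia-dkms-${driver_pkg#nvidia-driver-}"
}

# Hypothetical helper: print the first installed ("ii") nvidia-driver-* package.
installed_driver_pkg() {
  dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'nvidia-driver-*' 2>/dev/null \
    | awk '$1 == "ii" { print $2; exit }'
}

main() {
  local drv dkms_pkg
  drv="$(installed_driver_pkg || true)"
  if [[ -z "$drv" ]]; then
    echo "No nvidia-driver-* package installed via apt; nothing to do." >&2
    return 0
  fi
  dkms_pkg="$(dkms_package_for "$drv")"
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "+ sudo apt install $dkms_pkg"
  else
    sudo apt install "$dkms_pkg"
  fi
}

main "$@"
```

After the dkms package is installed, `dkms status` should list the nvidia module, and a reboot loads it.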