82

I just installed CUDA on a notebook like this:

sudo apt-get install cuda

As described here.

The compilation works just fine, but when I try to run the program I get the following error:

CUDA error at file.cu:128 code=35(cudaErrorInsufficientDriver) "cudaStreamCreate(&(stream[i]))" 
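
As far as I understand, error 35 (cudaErrorInsufficientDriver) means the CUDA runtime could not find a kernel driver that is at least as new as it requires, or could not find a driver at all. A minimal sketch of the standard commands for comparing the loaded driver with the toolkit:

cat /proc/driver/nvidia/version   # this file is missing if no NVIDIA kernel module is loaded
nvidia-smi                        # reports the driver version, if it can reach the driver
nvcc --version                    # reports the CUDA toolkit version (8.0 here)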

My nvcc version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Graphics card info:

lspci | egrep 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation Skylake Integrated Graphics (rev 06)
02:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)

I also installed VirtualGL, bumblebee-nvidia, primus and freeglut3-dev, following this.

When I try to run something through Bumblebee with optirun glxspheres64, I get this:

[   41.413478] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver
[   41.413520] [ERROR]Aborting because fallback start is disabled.

The NVIDIA driver is not working:

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

It looks like the nvidia-375 driver is installed, but I can't make it work.

whereis nvidia
nvidia: /usr/lib/nvidia /usr/share/nvidia /usr/src/nvidia-375-375.66/nvidia

And some info about the driver module:

modinfo nvidia_375
filename:       /lib/modules/4.8.0-54-generic/updates/dkms/nvidia_375.ko
alias:          char-major-195-*
version:        375.66
supported:      external
license:        NVIDIA
srcversion:     68751AFD79A210CEFFB8758
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        
vermagic:       4.8.0-54-generic SMP mod_unload modversions 
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_TCEBypassMode:int
parm:           NVreg_UseThreadedInterrupts:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_AssignGpus:charp

I think it might be a driver version problem:

dpkg -l | grep nvidia
ii  bumblebee-nvidia                            3.2.1-10                                      amd64        NVIDIA Optimus support using the proprietary NVIDIA driver
ii  nvidia-375                                  375.66-0ubuntu0.16.04.1                       amd64        NVIDIA binary driver - version 375.66
ii  nvidia-375-dev                              375.66-0ubuntu0.16.04.1                       amd64        NVIDIA binary Xorg driver development files
ii  nvidia-modprobe                             375.51-0ubuntu1                               amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-opencl-icd-375                       375.66-0ubuntu0.16.04.1                       amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                                0.8.2                                         amd64        Tools to enable NVIDIA's Prime

What am I missing?

Rodolfo

13 Answers

44

You may want to install the CUDA toolkit. Use the following command to install it:

sudo apt install nvidia-cuda-toolkit

Once the installation is done, reboot the machine. nvidia-smi should work.
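
After the reboot, a quick way to confirm that both the toolkit and the driver respond (a minimal sketch):

nvcc --version    # toolkit version installed by nvidia-cuda-toolkit
nvidia-smi        # should now list the driver version and the GPU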

Jack Chan
37

If nvidia-smi fails to communicate with the driver even though you have installed the driver several times, check prime-select.

  1. Run prime-select query to get all possible options. You should see at least nvidia | intel.
  2. Run prime-select nvidia.
  3. If it says nvidia is already selected, select a different one, e.g. prime-select intel, then switch back with prime-select nvidia.
  4. Reboot and check nvidia-smi (the whole sequence is sketched below).
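
A consolidated sketch of the steps above (switching profiles needs root):

prime-select query        # list the available profiles, e.g. nvidia | intel
sudo prime-select intel   # only needed if nvidia is already selected
sudo prime-select nvidia
sudo reboot
nvidia-smi                # check again after the reboot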
Phúc Lê
25

I disabled Secure Boot and it worked just fine.

@rod-smith answered another, more specific question explaining how to do it; basically it is a setting in the firmware setup, and he also wrote a good article about it here.
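
To check from a running system whether Secure Boot is currently enforced, mokutil can report it (a small sketch; the mokutil package may need to be installed first):

mokutil --sb-state    # prints "SecureBoot enabled" or "SecureBoot disabled"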

Rodolfo
18

The answer by Markus led me to a better solution. It does have to do with Secure Boot, but it is not necessary to deactivate it.

To fix the problem, just three steps are needed: deactivate the Nvidia driver by choosing X.Org in the Additional Drivers tool and reboot; then activate the Nvidia driver again, reboot, and enroll the key in Secure Boot.

Usually when you activate the Nvidia driver with the Additional Drivers tool, you are asked for a (new) password for Secure Boot. After reboot, the PC jumps into Secure Boot settings and you are asked to enroll a new MOK key, which must be confirmed with that same password. Afterwards, the driver will get access to the Nvidia card and will work.
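
If the blue MOK enrollment screen never appears after the reboot, the request can usually be queued again by hand. A sketch, assuming the key location Ubuntu uses for DKMS-signed modules (/var/lib/shim-signed/mok/MOK.der); the path may differ on other setups:

sudo mokutil --import /var/lib/shim-signed/mok/MOK.der   # asks you to set a one-time password
sudo reboot                                              # the MOK manager starts at boot and asks for that password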

w-sky
14

Since I cannot comment on @Rodolfo's answer above (not enough reputation), I am adding a new answer.

On my machine I had to configure Secure Boot according to my OS. I have an ASUS mainboard running Ubuntu 18.04 and tried to install NVIDIA CUDA 10.1 Update 2 with the packaged NVIDIA driver. I faced the same issue as described above. As it turned out, Secure Boot was set to Windows UEFI mode. Changing it to Other OS fixed it for me.

markus
3

In case you are looking for a solution for Google Cloud Platform, it is best to follow Google's advice and only use a recommended Ubuntu version (at the time of writing, May 2020, that means either 16.04 or 18.04; the new 20.04 is not yet supported), and follow the official instructions for installing CUDA support for a Google Cloud VM here. This will give you the correct version of the driver that works with a GCP VM. Then restart the instance with sudo reboot or from the console.

If you install CUDA for a GCP VM any other way, you may still succeed, but you can end up struggling with issues like "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver" or dependency problems.

PS: I will not copy the instructions here, as they are prone to change at any time; always refer to the original GCP source for the latest working solution.

MF.OX
1

This is the complete solution that worked for me. https://www.murhabazi.com/install-nvidia-driver

Don't forget to disable Secure Boot.

0

A lot of users have mentioned that they are unable to install the NVIDIA CUDA toolkit and that sudo apt install nvidia-cuda-toolkit doesn't work. Be sure to check that you are using a recent GCC compiler: an older GCC such as 4.9 will not be able to compile the NVIDIA CUDA toolkit. Try installing again after switching to a newer GCC, such as 9.3.
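
A sketch of checking and switching the default compiler with update-alternatives (package names assume a release that ships gcc-9; adjust the version to what your distribution provides):

gcc --version                           # see which GCC is currently the default
sudo apt install gcc-9 g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 90 --slave /usr/bin/g++ g++ /usr/bin/g++-9
sudo update-alternatives --config gcc   # pick gcc-9 if it is not selected automatically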

0

In my case, I was missing the kernel headers needed to build the NVIDIA DKMS module. The answer describing how to install the headers is here: https://askubuntu.com/a/75854/174094
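
In short, something like this (a sketch that installs the headers for the running kernel and rebuilds any missing DKMS modules):

sudo apt install linux-headers-$(uname -r)
dkms status             # shows whether the nvidia module is built for the current kernel
sudo dkms autoinstall   # rebuild DKMS modules that are missing for this kernel
sudo reboot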

0

This problem seems to have multiple causes, which means my answer probably cannot cover all cases. I was actually using Debian, but the question is here, and Debian and Ubuntu are quite similar from a kernel perspective; I also want to bring this to the attention of NVIDIA, the Linux kernel developers, GCP and their users, and this feels like a good way to do so.

In my case this problem occurred on Google Cloud Platform (GCP) because an unattended upgrade moved linux-image-cloud-amd64 to linux-image-5.10.0-29-cloud-amd64 (5.10.216-1), which is not compatible with the NVIDIA driver because that kernel checks for non-GPL code using so-called "GPL" symbols. I found this out by trying to reinstall the NVIDIA driver and reading the error logs. See this NVIDIA forum post for something similar.

I fixed this by uninstalling linux-image-cloud-amd64 and linux-image-5.10.0-29-cloud-amd64, leaving just linux-image-5.10.0-28-cloud-amd64, and then running update-grub. Other people might need to install linux-image-5.10.0-28-cloud-amd64 or an equivalent themselves.
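
A sketch of that sequence, using the package names from this answer (adjust the versions to the kernels actually installed on your instance):

dpkg -l | grep linux-image                            # see which kernel images are installed
sudo apt remove linux-image-cloud-amd64 linux-image-5.10.0-29-cloud-amd64
sudo apt install linux-image-5.10.0-28-cloud-amd64    # only if the older image is not installed already
sudo update-grub
sudo reboot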

Having to hold back the kernel seems like a security issue, but this feels like an argument between NVIDIA and the Linux kernel team about which symbols an out-of-tree module is allowed to link against and what code needs to be released.

Att Righ
0

I encountered this problem and tried to reinstall the CUDA driver many times. In the end, I found that my GPU card was not physically connected... LOL... In case someone runs into the same situation, I am sharing my silly case here.

-1

For future readers:

I am on a virtual machine instance (Google Cloud Platform)

and I am following this gist to install CUDA and cuDNN on my VM.

I had to upload the cuDNN part manually. (Just putting it out there.)

Now, getting to the error:

I was having this issue, but a complete restart of the instance did the job. By a complete restart I mean stopping the instance and turning it back on again.
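
If you prefer doing the stop/start from the command line instead of the web console, the gcloud CLI can do it (a sketch; INSTANCE and ZONE are placeholders for your own instance name and zone):

gcloud compute instances stop INSTANCE --zone ZONE
gcloud compute instances start INSTANCE --zone ZONE
gcloud compute ssh INSTANCE --zone ZONE --command "nvidia-smi"   # confirm the driver responds after the restart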

I hope this helps someone.

-1

I was using driver version 470 on Ubuntu 20.04 (the latest driver at the time of writing).

I went to Software & Updates > Additional Drivers, downgraded to nvidia-driver-460, clicked Apply, and then rebooted.

After that I was able to see the correct output from nvidia-smi again.
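
The same downgrade can also be done from the terminal (a sketch; nvidia-driver-460 is the standard Ubuntu 20.04 package name, and apt may remove the 470 packages to resolve the conflict):

sudo apt install nvidia-driver-460
sudo reboot
nvidia-smi    # should report the 460 driver after the reboot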