82

I just installed CUDA on a notebook like this:

sudo apt-get install cuda

As described here.

The compilation works just fine, but when I try to run the program I get the following error:

CUDA error at file.cu:128 code=35(cudaErrorInsufficientDriver) "cudaStreamCreate(&(stream[i]))" 
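
As far as I understand, error 35 (cudaErrorInsufficientDriver) means the CUDA runtime could not find a kernel driver that is at least as new as it requires, or could not find a driver at all. A minimal sketch of the standard commands for comparing the loaded driver with the toolkit:

cat /proc/driver/nvidia/version   # this file is missing if no NVIDIA kernel module is loaded
nvidia-smi                        # reports the driver version, if it can reach the driver
nvcc --version                    # reports the CUDA toolkit version (8.0 here)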

My nvcc version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Graphics card info:

lspci | egrep 'VGA|3D'
00:02.0 VGA compatible controller: Intel Corporation Skylake Integrated Graphics (rev 06)
02:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)

I also installed VirtualGL, bumblebee-nvidia, primus and freeglut3-dev, following this.

When I try to run something through Bumblebee with optirun glxspheres64, I get this:

[   41.413478] [ERROR]Cannot access secondary GPU - error: Could not load GPU driver
[   41.413520] [ERROR]Aborting because fallback start is disabled.

The NVIDIA driver is not working:

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

It looks like the nvidia-375 driver is installed, but I can't make it work.

whereis nvidia
nvidia: /usr/lib/nvidia /usr/share/nvidia /usr/src/nvidia-375-375.66/nvidia

And some info about the driver module:

modinfo nvidia_375
filename:       /lib/modules/4.8.0-54-generic/updates/dkms/nvidia_375.ko
alias:          char-major-195-*
version:        375.66
supported:      external
license:        NVIDIA
srcversion:     68751AFD79A210CEFFB8758
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        
vermagic:       4.8.0-54-generic SMP mod_unload modversions 
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_TCEBypassMode:int
parm:           NVreg_UseThreadedInterrupts:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_AssignGpus:charp

I think it might be a driver version problem:

dpkg -l | grep nvidia
ii  bumblebee-nvidia                            3.2.1-10                                      amd64        NVIDIA Optimus support using the proprietary NVIDIA driver
ii  nvidia-375                                  375.66-0ubuntu0.16.04.1                       amd64        NVIDIA binary driver - version 375.66
ii  nvidia-375-dev                              375.66-0ubuntu0.16.04.1                       amd64        NVIDIA binary Xorg driver development files
ii  nvidia-modprobe                             375.51-0ubuntu1                               amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-opencl-icd-375                       375.66-0ubuntu0.16.04.1                       amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                                0.8.2                                         amd64        Tools to enable NVIDIA's Prime

What am I missing?

Rodolfo

13 Answers

44

You may want to install the CUDA toolkit. Use the following command to install it:

sudo apt install nvidia-cuda-toolkit

Once the installation is done, reboot the machine. nvidia-smi should work.
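
After the reboot, a quick way to confirm that both the toolkit and the driver respond (a minimal sketch):

nvcc --version    # toolkit version installed by nvidia-cuda-toolkit
nvidia-smi        # should now list the driver version and the GPU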

Jack Chan
37

If nvidia-smi fails to communicate with the driver even though you have installed the driver several times, check prime-select.

  1. Run prime-select query to get all possible options. You should see at least nvidia | intel.
  2. Run prime-select nvidia.
  3. If it says nvidia is already selected, select a different one, e.g. prime-select intel, then switch back with prime-select nvidia.
  4. Reboot and check nvidia-smi (the whole sequence is sketched below).
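
A consolidated sketch of the steps above (switching profiles needs root):

prime-select query        # list the available profiles, e.g. nvidia | intel
sudo prime-select intel   # only needed if nvidia is already selected
sudo prime-select nvidia
sudo reboot
nvidia-smi                # check again after the reboot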
Phúc Lê
25

I disabled Secure Boot and it worked just fine.

@rod-smith answered another, more specific question explaining how to do it; basically it is a setting in the firmware setup, and he also wrote a good article about it here.
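
To check from a running system whether Secure Boot is currently enforced, mokutil can report it (a small sketch; the mokutil package may need to be installed first):

mokutil --sb-state    # prints "SecureBoot enabled" or "SecureBoot disabled"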

Rodolfo
18

The answer by Markus led me to a better solution. It does have to do with Secure Boot, but it is not necessary to deactivate it.

To fix the problem, just three steps are needed: deactivate the Nvidia driver by choosing X.Org in the Additional Drivers tool and reboot; then activate the Nvidia driver again, reboot, and enroll the key in Secure Boot.

Usually when you activate the Nvidia driver with the Additional Drivers tool, you are asked for a (new) password for Secure Boot. After reboot, the PC jumps into Secure Boot settings and you are asked to enroll a new MOK key, which must be confirmed with that same password. Afterwards, the driver will get access to the Nvidia card and will work.
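
If the blue MOK enrollment screen never appears after the reboot, the request can usually be queued again by hand. A sketch, assuming the key location Ubuntu uses for DKMS-signed modules (/var/lib/shim-signed/mok/MOK.der); the path may differ on other setups:

sudo mokutil --import /var/lib/shim-signed/mok/MOK.der   # asks you to set a one-time password
sudo reboot                                              # the MOK manager starts at boot and asks for that password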

w-sky
14

Since I cannot comment on @Rodolfo's answer above (not enough reputation), I am adding a new answer.

On my machine I had to configure Secure Boot according to my OS. I have an ASUS mainboard running Ubuntu 18.04 and tried to install NVIDIA CUDA 10.1 Update 2 with the packaged NVIDIA driver. I faced the same issue as described above. As it turned out, Secure Boot was set to Windows UEFI mode. Changing it to Other OS fixed it for me.

markus
3

In case you are looking for a solution for Google Cloud Platform, it is best to follow Google's advice and only use a recommended Ubuntu version (at the time of writing, May 2020, that means either 16.04 or 18.04; the new 20.04 is not yet supported), and follow the official instructions for installing CUDA support for a Google Cloud VM here. This will give you the correct version of the driver that works with a GCP VM. Then restart the instance with sudo reboot or from the console.

If you install CUDA for a GCP VM any other way, you may still succeed, but you can end up struggling with issues like "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver" or dependency problems.

PS: I will not copy the instructions here, as they are prone to change at any time; always refer to the original GCP source for the latest working solution.

MF.OX
1

This is the complete solution that worked for me. https://www.murhabazi.com/install-nvidia-driver

Don't forget to disable Secure Boot.

0

A lot of users have mentioned that they are unable to install the NVIDIA CUDA toolkit and that sudo apt install nvidia-cuda-toolkit doesn't work. Be sure to check that you are using a recent GCC compiler: an older GCC such as 4.9 will not be able to compile the NVIDIA CUDA toolkit. Try installing again after switching to a newer GCC, such as 9.3.
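
A sketch of checking and switching the default compiler with update-alternatives (package names assume a release that ships gcc-9; adjust the version to what your distribution provides):

gcc --version                           # see which GCC is currently the default
sudo apt install gcc-9 g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 90 --slave /usr/bin/g++ g++ /usr/bin/g++-9
sudo update-alternatives --config gcc   # pick gcc-9 if it is not selected automatically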

0

In my case, I was missing the kernel headers needed to build the NVIDIA DKMS module. The answer describing how to install the headers is here: https://askubuntu.com/a/75854/174094
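
In short, something like this (a sketch that installs the headers for the running kernel and rebuilds any missing DKMS modules):

sudo apt install linux-headers-$(uname -r)
dkms status             # shows whether the nvidia module is built for the current kernel
sudo dkms autoinstall   # rebuild DKMS modules that are missing for this kernel
sudo reboot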

0

This problem seems to have multiple causes, which means my answer probably cannot cover all cases. I was actually using Debian, but the question is here, and Debian and Ubuntu are quite similar from a kernel perspective; I also want to bring this to the attention of NVIDIA, the Linux kernel developers, GCP and their users, and this feels like a good way to do so.

In my case this problem occurred on Google Cloud Platform (GCP) because an unattended upgrade moved linux-image-cloud-amd64 to linux-image-5.10.0-29-cloud-amd64 (5.10.216-1), which is not compatible with the NVIDIA driver because that kernel checks for non-GPL code using so-called "GPL" symbols. I found this out by trying to reinstall the NVIDIA driver and reading the error logs. See this NVIDIA forum post for something similar.

I fixed this by uninstalling linux-image-cloud-amd64 and linux-image-5.10.0-29-cloud-amd64, leaving just linux-image-5.10.0-28-cloud-amd64, and then running update-grub. Other people might need to install linux-image-5.10.0-28-cloud-amd64 or an equivalent themselves.
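
A sketch of that sequence, using the package names from this answer (adjust the versions to the kernels actually installed on your instance):

dpkg -l | grep linux-image                            # see which kernel images are installed
sudo apt remove linux-image-cloud-amd64 linux-image-5.10.0-29-cloud-amd64
sudo apt install linux-image-5.10.0-28-cloud-amd64    # only if the older image is not installed already
sudo update-grub
sudo reboot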

Having to hold back the kernel seems like a security issue, but this feels like an argument between NVIDIA and the Linux kernel team about which symbols an out-of-tree module is allowed to link against and what code needs to be released.

Att Righ
0

I encountered this problem and tried to reinstall the CUDA driver many times. In the end, I found that my GPU card was not physically connected... LOL... In case someone runs into the same situation, I am sharing my silly case here.

-1

For future readers:

I am on a virtual machine instance (Google Cloud Platform)

and I am following this gist to install CUDA and cuDNN on my VM.

I had to upload the cuDNN part manually. (Just putting it out there.)

Now, getting to the error:

I was having this issue, but a complete restart of the instance did the job. By a complete restart I mean stopping the instance and turning it back on again.
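
If you prefer doing the stop/start from the command line instead of the web console, the gcloud CLI can do it (a sketch; INSTANCE and ZONE are placeholders for your own instance name and zone):

gcloud compute instances stop INSTANCE --zone ZONE
gcloud compute instances start INSTANCE --zone ZONE
gcloud compute ssh INSTANCE --zone ZONE --command "nvidia-smi"   # confirm the driver responds after the restart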

I hope this helps someone.

-1

I was using driver version 470 on Ubuntu 20.04 (the latest driver at the time of writing).

I went to Software & Updates > Additional Drivers, downgraded to nvidia-driver-460, clicked Apply, and then rebooted.

After that I was able to see the correct output from nvidia-smi again.
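
The same downgrade can also be done from the terminal (a sketch; nvidia-driver-460 is the standard Ubuntu 20.04 package name, and apt may remove the 470 packages to resolve the conflict):

sudo apt install nvidia-driver-460
sudo reboot
nvidia-smi    # should report the 460 driver after the reboot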