
This is a question on a topic that has already been asked in different variations. However, since none of the answers I found applied to my problem, I will first outline the problem and then, in case anyone else finds themselves in the same spot, list the answers I tried. Perhaps they work for you. In any case, I would be grateful for any new information on this issue.

Ubuntu version: 16.04

Kernel: 4.15.0-133-generic

Since I wanted to use CUDA 11, I uninstalled my previous NVIDIA driver with

    sudo apt --purge remove "*nvidia*"

and tried to remove everything left over from the previous CUDA versions via

    sudo apt --purge remove "*cuda*" "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"

and

    sudo apt-get autoremove

I then installed the graphics driver and CUDA from the command line as described on the NVIDIA page, as well as here. For the installation to succeed, this step had to be performed in a virtual console (Ctrl+Alt+F1), and the X server had to be stopped via sudo service lightdm stop (this stops the LightDM display manager and, with it, the X server it runs). After installing both the driver and the CUDA toolkit and rebooting, I successfully ran the deviceQuery sample as well as a CUDA simulation I wrote. However, in the graphical interface I was stuck in a login loop (references to similar posts below).

Since none of the remedies listed below worked, I tried to install CUDA and the NVIDIA driver using the graphics-drivers PPA (sudo add-apt-repository ppa:graphics-drivers/ppa). After installing the appropriate driver via sudo apt-get install nvidia-460 and rebooting, I could access the graphical interface again. nvidia-smi shows a running NVIDIA driver:

    $ nvidia-smi
    Tue Feb 23 14:50:14 2021       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: N/A      |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Quadro P3000        Off  | 00000000:01:00.0  On |                  N/A |
    | N/A   50C    P0    23W /  N/A |    405MiB /  6078MiB |      2%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      1322      G   /usr/lib/xorg/Xorg                260MiB |
    |    0   N/A  N/A      2502      G   compiz                             49MiB |
    |    0   N/A  N/A     32082      G   ...gAAAAAAAAA --shared-files       91MiB |
    +-----------------------------------------------------------------------------+
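The two details that matter in this output sit on the header line: the driver version (460.32.03) and "CUDA Version: N/A". The Python sketch below pulls both out of the banner text and compares the driver against a minimum version. The minimum used here (460.27.03 for the CUDA 11.2.0 toolkit) is an assumption recalled from NVIDIA's CUDA release notes and should be checked against the table there for the exact toolkit release; the function names are hypothetical.

```python
import re

# ASSUMPTION: minimum Linux driver for the CUDA 11.2.0 toolkit, recalled from
# NVIDIA's CUDA 11.2 release notes; check the table for your exact release.
MIN_DRIVER_CUDA_11_2 = "460.27.03"


def version_tuple(version):
    """'460.32.03' -> (460, 32, 3), so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def parse_smi_banner(text):
    """Return (driver_version, cuda_version) from nvidia-smi's header line.

    cuda_version is None when nvidia-smi prints 'CUDA Version: N/A'.
    """
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*(\S+)", text)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version' line found")
    driver, cuda = match.group(1), match.group(2)
    return driver, (None if cuda == "N/A" else cuda)


def diagnose(text, minimum=MIN_DRIVER_CUDA_11_2):
    """Report the driver version, whether it meets `minimum`, and whether a CUDA version is shown."""
    driver, cuda = parse_smi_banner(text)
    driver_ok = version_tuple(driver) >= version_tuple(minimum)
    return {"driver": driver, "driver_ok": driver_ok, "cuda_reported": cuda is not None}


banner = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: N/A      |"
print(diagnose(banner))
# {'driver': '460.32.03', 'driver_ok': True, 'cuda_reported': False}
```

If the driver passes this check but the banner still shows "CUDA Version: N/A", that points away from the version number itself and toward nvidia-smi not being able to use the CUDA user-space library. This is a hint worth following up, not a confirmed diagnosis.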

On the other hand, none of the ways I tried to install CUDA (the runfile without reinstalling the driver, sudo apt install nvidia-cuda-toolkit, or sudo apt install cuda-toolkit-11-2) results in a working CUDA installation. Programs compile with nvcc without problems, but ./deviceQuery returns

    $ ./deviceQuery
    ./deviceQuery Starting...

    CUDA Device Query (Runtime API) version (CUDART static linking)

    cudaGetDeviceCount returned 35
    -> CUDA driver version is insufficient for CUDA runtime version
    Result = FAIL

and other programs terminate once they reach CUDA code. Note that the stated reason for the failure (insufficient driver version) is not correct, since the installed driver is 460.32.03, which is sufficient according to the NVIDIA documentation. In addition, nvidia-smi does not seem to notice that CUDA is installed (it reports CUDA Version: N/A). Currently, with the driver installed from the PPA and CUDA installed from the runfile, I have

    $ lspci -k | grep -EA3 'VGA|3D|Display'
    00:02.0 VGA compatible controller: Intel Corporation Device 591b (rev 04)
        Subsystem: Lenovo Device 224c
        Kernel driver in use: i915
        Kernel modules: i915
    --
    01:00.0 3D controller: NVIDIA Corporation GP104GLM [Quadro P3000 Mobile] (rev a1)
        Subsystem: Lenovo Device 224c
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_460_drm, nvidia_460
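The kernel side looks in order here: the Quadro is bound to the nvidia module. Error 35 (cudaErrorInsufficientDriver), however, is reported by the user-space CUDA runtime, which reaches the driver through the library libcuda.so.1. One plausible explanation, not confirmed by anything in this post, is that this library is missing, not on the loader path, or left over from a different driver release. The Python sketch below uses ctypes to load libcuda.so.1 and call the driver API function cuDriverGetVersion; the library name assumes a standard Linux installation and the helper name is hypothetical.

```python
import ctypes


def libcuda_cuda_version(libname="libcuda.so.1"):
    """Ask the CUDA driver library which CUDA version it supports.

    Returns e.g. 11020 for CUDA 11.2 (encoded as 1000*major + 10*minor),
    or None if the library cannot be loaded or the call does not succeed.
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None  # the dynamic loader could not find or load the library
    version = ctypes.c_int(0)
    # Driver API: CUresult cuDriverGetVersion(int *driverVersion); 0 is CUDA_SUCCESS.
    status = lib.cuDriverGetVersion(ctypes.byref(version))
    return version.value if status == 0 else None


if __name__ == "__main__":
    v = libcuda_cuda_version()
    if v is None:
        print("libcuda.so.1 could not be loaded or queried")
    else:
        print(f"libcuda.so.1 supports CUDA {v // 1000}.{(v % 1000) // 10}")
```

A program built with the CUDA 11.2 toolkit needs this to report at least 11.2 (11020); if the library cannot be loaded at all, the installed driver packages are the place to look rather than the toolkit.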

I would be very grateful for any ideas on how to either make the driver installed via the runfile work together with the X server or make the driver from the PPA work together with CUDA.

Thank you and best,

David

Now for some solutions I tried that failed. With the driver installed from the runfile:

  1. Installing gdm instead of lightdm, as suggested here by WindowsEscapist.
  2. Making sure my user owns the .Xauthority file.

With the driver installed from ppa:graphics-drivers/ppa:

  1. Running sudo optirun ./deviceQuery, as suggested in this link.
  2. Setting the PRIME profile to NVIDIA in NVIDIA X Server Settings (it was already set).
  3. Running sudo prime-select nvidia, as suggested here.

1 Answer


Since the nvidia-460 driver offered for the version of Ubuntu I originally had did not work with the CUDA toolkit from the runfile on NVIDIA's website, I did an HWE update. On the updated 16.04, I could not install nvidia-driver-460 or nvidia-driver-450, so I installed Bionic (18.04) and then nvidia-driver-450. As @ubfan1 pointed out, the rest of the answer is in this link, where the toolkit is installed via the runfile but without the driver.
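Installing the toolkit from the runfile while the driver comes from a distribution package means two installers touch the NVIDIA stack. One sanity check (an addition here, not part of the linked instructions) is that the user-space libcuda and the loaded kernel module come from the same driver release. The Python sketch below compares the version in /proc/driver/nvidia/version with the version in the file name that libcuda.so.1 resolves to. The library path is an assumption (Ubuntu's multiarch directory; other packagings may place it elsewhere) and the helper names are hypothetical.

```python
import os
import re


def kernel_module_version(proc_text):
    """Extract the version from the text of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def libcuda_file_version(path):
    """'/dir/libcuda.so.460.32.03' -> '460.32.03', following symlinks first."""
    name = os.path.basename(os.path.realpath(path))
    match = re.fullmatch(r"libcuda\.so\.(\d+(?:\.\d+)+)", name)
    return match.group(1) if match else None


def versions_match(proc_text, libcuda_path):
    """True only if both versions were found and they are identical."""
    kmod = kernel_module_version(proc_text)
    lib = libcuda_file_version(libcuda_path)
    return kmod is not None and kmod == lib


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version") as f:
            proc = f.read()
    except OSError:
        proc = ""  # the nvidia kernel module is not loaded
    # ASSUMPTION: Ubuntu multiarch location; other driver packagings may differ.
    lib_path = "/usr/lib/x86_64-linux-gnu/libcuda.so.1"
    print("kernel module:", kernel_module_version(proc))
    print("libcuda file: ", libcuda_file_version(lib_path))
    print("match:", versions_match(proc, lib_path))
```

A mismatch, or a libcuda.so.1 that does not resolve to a versioned file at all, would suggest leftovers from an earlier driver installation.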