
I've been trying all day to get this GPU (a V100) working on a new Ubuntu VM. I tried installing the drivers and rebooting, and I also tried purging everything to do with NVIDIA and reinstalling, but nothing seems to work.

Specifically, I ran:

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

Sometimes nvidia-smi seems to work (it wasn't working as I wrote this question, so I couldn't copy what it prints when it does), but when it doesn't work it says this:

(synthesis) miranda9@miranda9:~$ nvidia-smi
Unable to determine the device handle for GPU 0000:00:06.0: Unknown Error

Any help is appreciated.
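
One way to narrow down an intermittent "Unknown Error" like this is to look at what the NVIDIA kernel module logged around the time nvidia-smi failed. The following is only a sketch: the function name `nvidia_kernel_messages` is made up for illustration, and it assumes the driver logs under the usual NVRM/Xid/nvidia prefixes.

```shell
#!/usr/bin/env bash
# Sketch: filter a kernel log (read from stdin) down to NVIDIA-related lines.
# The function name is hypothetical; the patterns assume the driver's usual
# "NVRM", "Xid" and "nvidia" log prefixes.
nvidia_kernel_messages() {
  # "|| true" keeps the exit status clean when nothing matches
  grep -iE 'nvrm|xid|nvidia' || true
}

# Typical use on the VM (reading the kernel log may need root):
#   sudo dmesg | nvidia_kernel_messages | tail -n 20
```

If nothing is printed at all, the module may never have loaded; if Xid lines show up, their numbers are worth including in the question.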

Note that I do not have access to the VM's vmx file, so this thread and its answers are of no use to me: https://forums.developer.nvidia.com/t/nvidia-smi-reports-unable-to-determine-the-device-handle-for-gpu/46835

In addition, I have tried to uninstall everything from NVIDIA and reinstall it with:

sudo apt-get --purge remove "*nvidia*"
sudo /usr/bin/nvidia-uninstall

then

apt update;
apt install build-essential;

sudo add-apt-repository ppa:graphics-drivers
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices
sudo apt-get install nvidia-driver-460
sudo reboot now

but that doesn't seem to work.
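
Before reinstalling, it may be worth confirming that the purge really left nothing behind. A minimal sketch, assuming the driver was installed through apt: the helper below (the name `leftover_nvidia_packages` is invented for illustration) reads `dpkg -l` output and lists NVIDIA packages that are still installed or still have leftover configuration.

```shell
#!/usr/bin/env bash
# Sketch: read `dpkg -l` output on stdin and print the state and name of
# every package whose name contains "nvidia", skipping packages dpkg knows
# about but that are not installed (state "un").
# The function name is hypothetical.
leftover_nvidia_packages() {
  awk '$1 != "un" && $2 ~ /nvidia/ { print $1, $2 }'
}

# Typical use on the VM:
#   dpkg -l | leftover_nvidia_packages
```

Lines starting with `ii` are still installed; lines starting with `rc` were removed but kept their configuration files.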


More info in case it helps:

(synthesis) miranda9@miranda9:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal

also:

(synthesis) miranda9@miranda9:~$ python
Python 3.9.5 (default, Jun  4 2021, 12:28:51) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/home/miranda9/miniconda3/envs/synthesis/lib/python3.9/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 101: invalid device ordinal (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448238472/work/c10/cuda/CUDAFunctions.cpp:115.)
  return torch._C._cuda_getDeviceCount() > 0
False

As requested in a comment:

# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)

Another VM:

$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 SCSI storage controller: XenSource, Inc. Xen Platform Device (rev 01)
00:05.0 System peripheral: XenSource, Inc. Citrix XenServer PCI Device for Windows Update (rev 01)
00:06.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
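
The plain lspci listing shows the V100 is visible to the guest but not which kernel driver, if any, is bound to it. A rough sketch for checking that: the helper below (the name `nvidia_driver_report` is made up) parses `lspci -k` output and prints each NVIDIA device's slot together with its "Kernel driver in use" value, or `none` if no driver is bound.

```shell
#!/usr/bin/env bash
# Sketch: read `lspci -k` output on stdin and print "<slot> <driver>" for
# each NVIDIA device, or "<slot> none" when no kernel driver is bound.
# The function name is hypothetical.
nvidia_driver_report() {
  awk '
    function report() {
      if (slot != "") print slot, (driver != "" ? driver : "none")
    }
    # Device lines start at column 0 with the PCI slot; detail lines are indented
    /^[0-9a-fA-F]/ { report(); slot = ($0 ~ /NVIDIA/) ? $1 : ""; driver = "" }
    /Kernel driver in use:/ { driver = $NF }
    END { report() }
  '
}

# Typical use on the VM:
#   lspci -k | nvidia_driver_report
```

On a healthy install the driver column would normally read `nvidia`; `nouveau` or `none` would suggest the proprietary module is not the one in use.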

Resources I've searched for help:

1 Answer


A virtual machine emulates a graphics card, so the native card in your host system should be transparent to the guest. VMs are for "sharing" resources, as opposed to a real system that has direct access to its hardware. So it does not make sense to install Nvidia drivers on a guest system. You can verify this by checking the current drivers in your VM:

inxi -G

(executed in a terminal) will show you a VM/Oracle driver, not your native card.

High-performance graphics output may be achievable with tweaks and tricks, but VMs are not meant for work like this.

kanehekili