
Background: I've been running Ubuntu for years (started at 16.04, now at 20.04), and have been constantly fighting with the NVIDIA drivers, which I need because I use CUDA. As recently as yesterday my NVIDIA 460 drivers were working fine, and then an apt upgrade broke them again (see "Ubuntu 20.4 update broke my Nvidia 460 driver config").


What I want to achieve:

  • Create a restore point of a kernel and its modules (eg nvidia drivers) I'm happy with
  • Whenever NVIDIA drivers break (or something else breaks badly), restore it

What I already have:

  • GRUB which seems to allow choosing specific kernels to boot from

(screenshots for illustration, not reflecting latest version)


Questions:

  • Restore point containing kernel+modules: is it possible to create one (and if so, how), or am I simply misunderstanding how kernels and modules are managed on Linux (i.e. a kernel wouldn't include the NVIDIA drivers)?

  • Restoring from GRUB: it seems /etc/grub.d/40_custom is the file I want to modify, as it's designed specifically for custom menu entries. Can you confirm this is the intended way of booting custom kernels, or should I be looking at another file?

/etc/grub.d/40_custom:

#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.

2 Answers

4

The Linux kernel package is separate from the NVIDIA driver packages, but the driver's kernel module is tied to a specific kernel version. If you had the NVIDIA drivers installed for, say, 4.10.0-28, then booting that kernel should use the corresponding drivers.
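To see which installed kernels actually have an NVIDIA module available, something like the following sketch may help. It assumes modules live under /lib/modules/<release>/ and that the NVIDIA module files are named nvidia*.ko (possibly compressed); the function name and its optional argument are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# For each kernel directory under a modules root, report whether any
# nvidia*.ko module file exists. The root defaults to /lib/modules; the
# argument exists only so the function can be tried on a scratch directory.
nvidia_modules_by_kernel() {
    modules_root="${1:-/lib/modules}"
    for dir in "$modules_root"/*/; do
        # Skip the unexpanded glob when the root has no subdirectories.
        [ -d "$dir" ] || continue
        release=$(basename "$dir")
        if find "$dir" -name 'nvidia*.ko*' -print 2>/dev/null | grep -q .; then
            echo "$release: nvidia module present"
        else
            echo "$release: no nvidia module"
        fi
    done
}
```

Calling nvidia_modules_by_kernel with no argument scans /lib/modules. If the driver was built through DKMS, running dkms status gives a similar per-kernel view.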

Other modules may be provided by different packages, e.g., linux-modules-5.8.0-45-generic, linux-modules-extra-5.8.0-45-generic.

So you should already have your "restore points", and you can use them via GRUB, as you show. You don't need custom GRUB entries for older kernel versions; those are added automatically when GRUB is updated. You might need custom entries if you want other customization, though.

It is not clear whether that is enough for you, or whether those entries did not work as suitable restore points. Giving a specific example of what did not work for you, if you have one, may help clarify.
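To check which entries GRUB has already generated, a small sketch like this can list the menu titles. It assumes the generated config is at /boot/grub/grub.cfg and that titles are single-quoted, as in Ubuntu's stock output; the function name is hypothetical, and reading the file may require sudo on some setups.

```shell
#!/bin/sh
# Print the titles of the menuentry and submenu lines in a GRUB config file.
# The path defaults to /boot/grub/grub.cfg; pass another file to inspect a copy.
list_grub_entries() {
    cfg="${1:-/boot/grub/grub.cfg}"
    # Split on single quotes: the title is the second field of each matching line.
    awk -F"'" '/^[ \t]*(menuentry|submenu) / { print $2 }' "$cfg"
}
```

The older kernels normally appear as entries nested under an "Advanced options" submenu, and those are the ones to pick when a newer kernel misbehaves.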

1

Personally, I stay on the same NVIDIA driver and kernel chain:

$ uname -r
4.14.216-0414216-generic

$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp                       : Mon Mar 22 11:49:28 2021
Driver Version                  : 384.130

Attached GPUs                   : 1
GPU 00000000:01:00.0
    Product Name                : GeForce GTX 970M
    Product Brand               : GeForce
    Display Mode                : Enabled
    Display Active              : Enabled
    Persistence Mode            : Disabled
    Accounting Mode             : Disabled
    Accounting Mode Buffer Size : 1920
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : N/A
    GPU UUID                    : GPU-30fab9bc-fe6f-ec05-e8e6-c151a1a96121
    Minor Number                : 0
    VBIOS Version               : 84.04.79.00.0A

There is no need to move to a newer kernel chain until you install hardware that is newer than the chain you are on. The disadvantage is that you occasionally need to update the kernel (within the same chain) if your system crashes and you don't know why.

Kernel 4.14 is an LTS (Long Term Support) kernel that is updated for five years with security and bug fixes. Updates are made by the Linux Kernel team and published by the Ubuntu Kernel team.

It's not what I would call a "custom" kernel, but many will call it such. When you upgrade within the same kernel chain, say from 4.14.188 to 4.14.216 like I did last month, GRUB doesn't automatically make it the new default on the main menu:

(image: grub boot.gif)

So after a kernel update, go into "Advanced options" and select the new kernel. Make sure you configure GRUB to always boot with the last used option.

Although I have a theme on my grub menu, the menu options stay the same in "regular" grub.
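The answer's own GRUB configuration is not shown, so the following is only a sketch of one common way to make GRUB remember the last booted entry, using two documented variables in /etc/default/grub. It is an assumption that this matches the setup described above.

```shell
# Fragment for /etc/default/grub (a file of shell-style variable assignments).
GRUB_DEFAULT=saved        # boot the entry recorded as "saved" rather than a fixed index
GRUB_SAVEDEFAULT=true     # record whichever entry gets booted as the new saved entry
# After editing the file, regenerate the menu with: sudo update-grub
```

With these set, picking a kernel once under "Advanced options" should make it the default for later boots until another entry is chosen.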

My basic rule of thumb is "If it ain't broke, don't fix it". I learned this after repeatedly having to repair things following software upgrades.

Something to keep in mind when you do upgrade your drivers, kernel and firmware: do it all at once, test thoroughly, and then freeze the setup for many months or even years.
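One possible way to "freeze" a working combination on Ubuntu is to put the relevant packages on hold with apt-mark. The helper below is a hypothetical sketch that only prints the versioned package names for a kernel release, assuming Ubuntu's stock naming (linux-image-<release> and so on); mainline or self-built kernels may use different names.

```shell
#!/bin/sh
# Print the versioned kernel package names for a kernel release
# (default: the running kernel, as reported by uname -r).
# The name patterns are an assumption based on Ubuntu's stock packaging.
kernel_packages_for() {
    release="${1:-$(uname -r)}"
    for prefix in linux-image linux-headers linux-modules linux-modules-extra; do
        echo "$prefix-$release"
    done
}
```

The list could then be passed to sudo apt-mark hold, together with the driver package (nvidia-driver-460 is assumed here from the question), after dropping any name that is not actually installed. apt-mark showhold lists what is currently held and apt-mark unhold reverses it. Holding the versioned packages keeps them from being upgraded or removed automatically; stopping new kernels from arriving at all would also mean holding the kernel metapackage, whose name depends on how the system was installed.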