
System:

  • Ryzen 5, no integrated graphics
  • B450 Tomahawk Max motherboard
  • ADATA SX8100 512 GB SSD
  • Nvidia GeForce 1660 main GPU
  • Dual boot Ubuntu 20.04 and Windows 10
  • UEFI firmware
  • No overclocking or other tweaks

In the past, the system would occasionally kernel-panic on boot, first complaining that initramfs decoding failed and then that it was unable to mount root. The recovery mode option for the same kernel version would also panic, although with far more messages displayed.

I would usually deal with this by selecting an older kernel, which would boot fine, and then running Boot-Repair. I would then be good for a random number of boots until it all started over again.

I was never able to find the cause and just lived with the occasional inconvenience. Now, however, none of my kernels boot, and all I can do is boot from a live USB. I also updated the GRUB config from inside a chroot, and now my Windows menu entry is gone as well.

The recovery mode messages ask me to specify my root partition with the root= boot option, then say "here are the available partitions", followed by a kernel panic message. It seems that the kernel is not detecting any partitions at all. This is supported by the message that it can't mount the root fs on unknown-block(0,0), which indicates it can't identify which block device to use.

I've checked that the root UUID shown in the boot messages matches the UUID of my actual boot partition. I have not made any partition table modifications recently.
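A minimal sketch of that comparison from the live session, assuming the installed root filesystem is mounted at /mnt and the root partition is /dev/nvme0n1p2 (both are assumptions for illustration, and grub_root_uuids is just a hypothetical helper name):

```shell
#!/bin/sh
# Print the distinct root=UUID=... values found on the "linux" lines
# of a GRUB configuration file (path given as the first argument).
grub_root_uuids() {
    grep -E '^[[:space:]]*linux' "$1" \
        | grep -oE 'root=UUID=[0-9a-fA-F-]+' \
        | sed 's/^root=UUID=//' \
        | sort -u
}

# Hypothetical usage, with the installed system mounted at /mnt:
#   grub_root_uuids /mnt/boot/grub/grub.cfg
#   sudo blkid -s UUID -o value /dev/nvme0n1p2   # assumed partition name
```

If the two commands print the same UUID, GRUB is at least passing the intended root filesystem to the kernel.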

I've tried removing and re-seating the SSD.

How do I troubleshoot this? How do I get the kernel to detect my SSD?

Normal boot error messages:

[Screenshot of the normal boot error messages]

Recovery mode boot messages

[Screenshot of the recovery mode boot messages]

Per comments, I found the SMART status of the SSD.

Output of sudo smartctl -a:

kubuntu@kubuntu:~$ sudo smartctl -a /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       ADATA SX8100NP
Serial Number:                      2J4620042048
Firmware Version:                   VB411D43
PCI Vendor/Subsystem ID:            0x10ec
IEEE OUI Identifier:                0x00e04c
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Thu Dec 3 00:57:15 2020 UTC
Firmware Updates (0x0e):            7 Slots
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Maximum Data Transfer Size:         64 Pages
Warning Comp. Temp. Threshold:      118 Celsius
Critical Comp. Temp. Threshold:     150 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0        0       0
 1 +     4.00W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.0128W       -        -    3  3  3  3     4000    8000
 4 -   0.0080W       -        -    4  4  4  4     8000   30000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          32%
Percentage Used:                    0%
Data Units Read:                    11,670,921 [5.97 TB]
Data Units Written:                 7,734,266 [3.95 TB]
Host Read Commands:                 0
Host Write Commands:                0
Controller Busy Time:               0
Power Cycles:                       451
Power On Hours:                     3,897
Unsafe Shutdowns:                   319
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, max 8 entries)
Num  ErrCount              SQId   CmdId   Status  PELoc   LBA                   NSID        VS
  0  5490475142593210059   45613  0xa607  0x2024  0x35f8  5071432998301508804   1351944024  0xc5
  1  10305710759900180890  9804   0x6c00  0xa1d6  0xcb61  27252774468141376     380130432   0xc3
  2  11549487431983370324  16455  0x58c9  0xd23e  0x8147  6957061290267430970   3258320200  0xf5
  3  7018321358667646096   37313  0x0e1f  0x8670  0x6242  459368911713436868    1166902044  0x10
  4  11390238159922049047  38421  0xd002  0x1890  0x7d29  17438238972143084540  884054193   0x01
  5  156936697365045345    26140  0x5041  0xac10  0x4265  11916595043416224210  405107254   0xd4
  6  6790662844906997140   16528  0x5fc1  0x2ed1  0x77c   5801270468783952621   39946248    0xb0
  7  3460708732253516421   2072   0xa101  0x610c  0xc852  13889879911473169861  2147786536  0x68

Re: motherboard BIOS version. I have updated my BIOS twice: to version 7C02v36 (dated 04/24/2020) soon after building my PC, and to version 7C02v39 (dated 11/30/2020) before asking this question. Neither update had any effect. The next most recent BIOS listed is dated 12/10/2020, but it is a beta version, so I'm uncertain whether trying it is a good idea.

FWIW, my Ubuntu boot partition is 30GB, and has 3GB free.

[Screenshot]

GRUB can see my boot partition, as shown in this screenshot. Immediately after snapping this picture, I typed "normal" to return to the GRUB menu. I booted with debug messages enabled, and it complained that it couldn't mount that exact partition.

[Screenshot of GRUB showing the boot partition]

karel
rothloup

1 Answer

Boot from a USB stick with a live system on it; it doesn't matter much what Ubuntu version or flavour it is. Then try investigating and maybe even mounting that filesystem manually from a shell in that live system. The internal disk might be /dev/sdc now, or /dev/nvme0n1 for an NVMe SSD like the one in the smartctl output in the question.

You can investigate partitions with any of

sudo parted --list

sudo fdisk -l

sudo blkid

Once you have identified which partition holds your root filesystem, you can run sudo fsck -f on it (while it is not mounted) for a filesystem check.

Try to mount it; I'd start with a read-only mount:

sudo mount -r /dev/sdc42 /mnt

(/dev/sdc42 being the device you just identified as your root filesystem)

Then check /mnt/boot for the available kernels and whether each one has a matching initrd* (the initial RAM disk containing kernel modules).
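As a rough sketch of that check, assuming the usual Ubuntu naming of vmlinuz-<version> and initrd.img-<version> (check_initrds is just an illustrative helper name, and /mnt/boot is the mount point used above):

```shell
#!/bin/sh
# For every kernel image in a boot directory, report whether an initrd
# with the same version suffix exists next to it and is non-empty.
check_initrds() {
    boot_dir="${1:-/mnt/boot}"
    for kernel in "$boot_dir"/vmlinuz-*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -e "$kernel" ] || continue
        version="${kernel##*/vmlinuz-}"
        initrd="$boot_dir/initrd.img-$version"
        if [ -s "$initrd" ]; then
            echo "$version: initrd present"
        else
            echo "$version: initrd MISSING or empty"
        fi
    done
}

# Usage against the read-only mount:
#   check_initrds /mnt/boot
```

This only tells you that the file exists and is not empty. Since the original error was about initramfs decoding, it may also be worth listing the image contents with lsinitramfs (part of initramfs-tools), which should fail if the image cannot be unpacked.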


After reading some more comments above, it appears to me that the protective MBR might be a problem. Basically, it exists to make older tools believe they are seeing an old-style PC ("MS-DOS") partition table: a plain protective MBR does this with a single entry covering the whole disk, while a hybrid MBR attempts to mirror some of the GPT partitions. That's alright as long as those older tools never attempt to modify any partitions; if they do, however, the MBR (which is what they will change) and the GPT (which contains the true information) may start to mismatch.

The result can be that some OS (not Linux, I am pretty sure) writes to disk blocks outside the current partitions and filesystems. If you experience problems after Windows gaming, that would be a hint in that general direction.
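One way to look at the MBR from the live system is to read the partition-type byte of its first entry. This is only a sketch: mbr_first_type is a hypothetical helper name, and /dev/nvme0n1 in the usage comment is the device name from the question. In the classic MBR layout the first partition entry starts at byte 446 and its type field sits 4 bytes into the entry; a standard protective MBR has type ee there.

```shell
#!/bin/sh
# Print the partition-type byte (two hex digits) of the first MBR
# entry of a disk or disk image given as the first argument.
mbr_first_type() {
    # Entry 1 starts at offset 446; the type byte is at 446 + 4 = 450.
    dd if="$1" bs=1 skip=450 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Usage (reading the raw disk needs root):
#   mbr_first_type /dev/nvme0n1    # "ee" suggests a protective MBR entry
```

Any other value in the first entry would suggest that something has rewritten the MBR. If gdisk is installed, sudo gdisk -l on the disk also reports how it classifies the MBR in its partition table scan.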

HuHa