
My desktop is running Ubuntu 20.04. I've been noticing a lot of odd behavior over the last few months (at least) and I'm trying to figure out how to troubleshoot it.

The system has 32 GB of RAM, an AMD Ryzen 5900X CPU, an MSI MAG B550 Tomahawk MD, and a couple of SSDs attached.

The symptoms are:

  • Browser tabs crash all the time. Every 15 minutes or so, one of my open tabs (of which I typically have ~20-30) will crash (with "Aw Snap!" in Chrome, and similar messages in Firefox and Chromium).
  • Slack will crash intermittently. About 1-2 times per day it just hangs for about a minute and then dies.
  • My virtual machines get corrupted. I typically have 1-3 running at a time, and about once a month a VM starts complaining about a read-only file system. When I reboot it, it drops into initramfs, and running fsck across the disk typically fixes it.

My gut is that I could have some failing RAM, but I don't know how to troubleshoot that. Are there logs I could look at that would help figure out why Chrome is crashing all the time?

Thanks in advance!

Edited to add, at @heynnema's request:

david@jawad:~$ ls -lah /var/crash/
total 240M
drwxrwsrwt  2 root     whoopsie 4.0K Mar 22 10:43 .
drwxr-xr-x 15 root     root     4.0K Aug 31  2021 ..
-rw-r-----  1 david     whoopsie  35M Mar 22 01:13 _opt_google_chrome_chrome.1000.crash
-rw-r-----  1 david     whoopsie  27M Mar 22 10:43 _usr_bin_python3.8.1000.crash
-rw-r-----  1 david     whoopsie  60M Mar 15 15:51 _usr_lib_insync_PySide2_Qt_libexec_QtWebEngineProcess.1000.crash
-rw-r--r--  1 david     whoopsie    0 Mar 15 15:51 _usr_lib_insync_PySide2_Qt_libexec_QtWebEngineProcess.1000.upload
-rw-------  1 whoopsie whoopsie   37 Mar 15 15:51 _usr_lib_insync_PySide2_Qt_libexec_QtWebEngineProcess.1000.uploaded
-rw-r-----  1 david     whoopsie  98M Mar 22 07:28 _usr_lib_slack_slack.1000.crash
-rw-r-----  1 david     whoopsie  22M Mar 17 09:45 _usr_share_typora_Typora.1000.crash

Will run memtest as well.

Next edit, as requested:

# lshw -C memory
  *-firmware                
       description: BIOS
       vendor: American Megatrends International, LLC.
       physical id: 0
       version: A.60
       date: 05/12/2021
       size: 64KiB
       capacity: 32MiB
       capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppynec int13floppytoshiba int13floppy360 int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb biosbootspecification uefi
  *-memory
       description: System Memory
       physical id: 10
       slot: System board or motherboard
       size: 32GiB
     *-bank:0
          description: 2667 MHz (0.4 ns) [empty]
          product: Unknown
          vendor: Unknown
          physical id: 0
          serial: Unknown
          slot: DIMM 0
          clock: 2667MHz (0.4ns)
     *-bank:1
          description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
          product: F4-3200C16-16GVK
          vendor: Unknown
          physical id: 1
          serial: 00000000
          slot: DIMM 1
          size: 16GiB
          width: 64 bits
          clock: 2667MHz (0.4ns)
     *-bank:2
          description: 2667 MHz (0.4 ns) [empty]
          product: Unknown
          vendor: Unknown
          physical id: 2
          serial: Unknown
          slot: DIMM 0
          clock: 2667MHz (0.4ns)
     *-bank:3
          description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
          product: F4-3200C16-16GVK
          vendor: Unknown
          physical id: 3
          serial: 00000000
          slot: DIMM 1
          size: 16GiB
          width: 64 bits
          clock: 2667MHz (0.4ns)
  *-cache:0
       description: L1 cache
       physical id: 13
       slot: L1 - Cache
       size: 768KiB
       capacity: 768KiB
       clock: 1GHz (1.0ns)
       capabilities: pipeline-burst internal write-back unified
       configuration: level=1
  *-cache:1
       description: L2 cache
       physical id: 14
       slot: L2 - Cache
       size: 6MiB
       capacity: 6MiB
       clock: 1GHz (1.0ns)
       capabilities: pipeline-burst internal write-back unified
       configuration: level=2
  *-cache:2
       description: L3 cache
       physical id: 15
       slot: L3 - Cache
       size: 64MiB
       capacity: 64MiB
       clock: 1GHz (1.0ns)
       capabilities: pipeline-burst internal write-back unified
       configuration: level=3
DCHeel

2 Answers


Lots of crash logs in /var/crash.
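Apport writes each .crash file as plain text: one Key: value field per line, with long values such as the base64 CoreDump continued on lines that start with a space. A minimal sketch, assuming the usual Apport field names Date, ExecutablePath and Signal (summarize_crash is a hypothetical helper name), that prints just those headline fields for every report you can read:

```shell
#!/usr/bin/env bash
# Print the headline fields of one Apport crash report.
# Fields are "Key: value" lines; continuation lines (such as the
# base64 CoreDump) begin with a space, so they never match.
summarize_crash() {
  grep -E '^(Date|ExecutablePath|Signal):' "$1"
}

# Summarize every report in /var/crash that the current user can read.
for report in /var/crash/*.crash; do
  [ -r "$report" ] || continue   # also skips the unexpanded pattern if nothing matches
  echo "== $report"
  summarize_crash "$report"
done
```

If unrelated programs (Chrome, Slack, Python, Typora) are all dying with the same signal, for example 11 (segmentation fault), that is consistent with, though not proof of, a hardware problem rather than a bug in one application. To see every field of a single report, apport-unpack /var/crash/<file>.crash <directory> unpacks it into one file per field.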

Ryzen processors are very fussy about RAM. Go to https://memtest86.com and download/run their free memtest to test your memory. Get at least one complete pass of all the 4/4 tests to confirm good memory. This may take a few hours to complete.
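As for the question about logs: the kernel reports CPU machine-check exceptions, which some memory faults can trigger, as lines containing mce: and Hardware Error, and EDAC drivers report memory errors on ECC systems. A minimal sketch, assuming journalctl is available (find_hw_errors is a hypothetical name), that scans the current boot's kernel messages for those patterns:

```shell
#!/usr/bin/env bash
# Keep only log lines that mention machine-check or memory (EDAC) errors.
# Reads log text on stdin; matching is case-insensitive.
find_hw_errors() {
  grep -iE 'mce:|machine check|hardware error|edac'
}

# journalctl -k shows kernel messages from the current boot only.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -k --no-pager 2>/dev/null | find_hw_errors \
    || echo "No hardware-error lines in this boot's kernel log."
fi
```

Non-ECC desktop memory usually fails without leaving any log entry, so an empty result does not clear the RAM; memtest remains the real test.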

Update #1:

memtest failed. Show me the output of sudo lshw -C memory.

Update #2:

You have two 16G DIMMs of memory. First, power off the computer, and remove, then reinsert each DIMM, and re-run memtest. If it passes, then you've probably fixed the problem.

If it fails, remove one 16G DIMM and re-run memtest with only the other DIMM installed. If that fails, the installed DIMM may be defective.

In either case, then swap them: remove the installed DIMM, insert the other one, and re-run memtest.

It's important to write down which DIMM passed, and which DIMM failed. Report back.

Update #3:

memtest fails on individual DIMMs. Suspect RAM compatibility issue, but first we need to update the BIOS and retest with memtest. Get the BIOS update here.
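The lshw output in the question already shows the installed firmware as version A.60 dated 05/12/2021. A minimal sketch for re-checking it after flashing, reading the world-readable DMI files under /sys/class/dmi/id; the bios_info name and its optional directory argument (there only to make it testable) are hypothetical:

```shell
#!/usr/bin/env bash
# Print the BIOS version and date from the kernel's DMI files.
# These particular files are world-readable, so no sudo is needed.
# The optional argument (default /sys/class/dmi/id) exists for testing.
bios_info() {
  local dir="${1:-/sys/class/dmi/id}"
  printf 'BIOS version: %s\nBIOS date: %s\n' \
    "$(cat "$dir/bios_version" 2>/dev/null)" \
    "$(cat "$dir/bios_date" 2>/dev/null)"
}

bios_info
```

sudo dmidecode -s bios-version reports the same version string.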

Note: Confirm that I have the correct web page for your motherboard (MSI MAG B550 Tomahawk MD).

Note: Have good backups before updating the BIOS.

Update #4:

Your memory DIMMs (F4-3200C16-16GVK) don't appear on the memory compatibility lists shown here.

Update #5:

Review pages 13 and 15 of the User Manual here and confirm that DIMM slot A2 is filled first, and B2 next, with DIMMs of exactly the same spec.
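To check slot placement and part numbers without opening the case, dmidecode reads the same SMBIOS tables as lshw. On many AMD boards the Locator repeats (DIMM 0 / DIMM 1, as in the lshw output above) while the Bank Locator names the channel, so the pair identifies the slot, though the wording may not match the A2/B2 labels in the manual. A minimal sketch; list_dimms is a hypothetical name, and the field names assume dmidecode's usual type 17 (Memory Device) output, with older versions printing Configured Clock Speed instead of Configured Memory Speed:

```shell
#!/usr/bin/env bash
# List populated memory slots from "dmidecode -t memory" output on stdin.
# Each "Memory Device" section describes one slot; empty slots report
# "Size: No Module Installed" and are skipped.
list_dimms() {
  awk -F': ' '
    function emit() {
      if (size != "" && size !~ /No Module/)
        printf "%s / %s: %s %s @ %s\n", bank, loc, size, part, speed
    }
    /^Memory Device$/ { in_dev = 1; size = loc = bank = part = speed = ""; next }
    in_dev && /^[ \t]+Size:/         { size = $2 }
    in_dev && /^[ \t]+Locator:/      { loc = $2 }
    in_dev && /^[ \t]+Bank Locator:/ { bank = $2 }
    in_dev && /^[ \t]+Part Number:/  { part = $2; sub(/[ \t]+$/, "", part) }
    in_dev && /^[ \t]+Configured (Memory|Clock) Speed:/ { speed = $2 }
    in_dev && /^[ \t]*$/ { emit(); in_dev = 0 }
    END { if (in_dev) emit() }
  '
}

# dmidecode needs root to read the raw DMI tables.
if [ "$(id -u)" -eq 0 ] && command -v dmidecode >/dev/null 2>&1; then
  dmidecode -t memory | list_dimms
else
  echo "Run this script with sudo to read the DMI tables." >&2
fi
```

Two lines with identical part numbers on different channels would match the placement the manual asks for.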

Update #6:

See https://www.crucial.com/ for correct spec DIMMs. See https://www.crucial.com/compatible-upgrade-for/msi-%28micro-star%29/mag-b550-tomahawk

Update #7:

Confirm that CPU and RAM are NOT overclocked. If they are, set them back to default clocks, and retest with memtest.

Update #8:

Replaced the RAM. Everything is working fine now.

heynnema

I had the same problem; I tried this and it worked for me:

Ubuntu 22.04 LTS introduced systemd-oomd, a user-space out-of-memory killer that’s designed to “take corrective action before an OOM occurs in the kernel space”. When it detects that memory pressure is getting too high, it intervenes (by killing processes) to ensure the system copes and (most) things stay running. I hope it helps you.
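Before disabling anything, it is worth confirming that processes are actually being killed for lack of memory rather than crashing. A minimal sketch (find_oom_kills is a hypothetical name) that searches the current boot's journal for the kernel OOM killer's "Killed process" lines and for systemd-oomd's "Killed ..." messages; the exact wording varies between kernel and systemd versions. systemd-oomd first shipped in systemd 247 and is not part of Ubuntu 20.04 (systemd 245), so on the asker's system only the kernel line can appear.

```shell
#!/usr/bin/env bash
# Keep only log lines showing a process killed for lack of memory:
# the kernel OOM killer ("Killed process ...") or systemd-oomd ("Killed ...").
# Reads log text on stdin.
find_oom_kills() {
  grep -E 'Killed process|systemd-oomd.*Killed'
}

# Search the journal for the current boot. Without sudo, users outside
# the adm/systemd-journal groups only see their own entries.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -b --no-pager 2>/dev/null | find_oom_kills \
    || echo "No OOM kills found in this boot's journal."
fi
```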

Most systemd services can be managed via the systemctl utility. In this case, we want to disable the systemd-oomd service. This can be done with:

$ systemctl disable --now systemd-oomd

You should see something like (depending on your OS):

$ systemctl disable --now systemd-oomd
Removed /etc/systemd/system/multi-user.target.wants/systemd-oomd.service.
Removed /etc/systemd/system/dbus-org.freedesktop.oom1.service.

You can then verify that the service is disabled, with:

$ systemctl is-enabled systemd-oomd

And you should then see:

$ systemctl is-enabled systemd-oomd
disabled

It is possible, however, that other services might attempt to restart the systemd-oomd service. To prevent this, you can 'mask' the service. For example:

$ systemctl mask systemd-oomd
Created symlink /etc/systemd/system/systemd-oomd.service → /dev/null.

And then systemctl is-enabled should now report:

$ systemctl is-enabled systemd-oomd
masked

See man systemctl for more details; in particular, note the caveats regarding masking of systemd services.
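If masking turns out to be unwanted later, the steps can be reversed with systemctl unmask followed by enable --now. A minimal sketch; restore_oomd, run and the file name restore-oomd.sh are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set, so it is safe to try first:

```shell
#!/usr/bin/env bash
# Re-enable systemd-oomd after it has been disabled and masked.
# By default this is a dry run that only prints the commands;
# run it as DRY_RUN=0 ./restore-oomd.sh to actually execute them.

# Print the command when dry-running, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" != 0 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restore_oomd() {
  run sudo systemctl unmask systemd-oomd          # removes the /dev/null symlink
  run sudo systemctl enable --now systemd-oomd    # re-enable and start the service
  run systemctl is-enabled systemd-oomd           # should now report "enabled"
}

restore_oomd
```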

How do I disable the systemd OOM process killer in Ubuntu 22.04?