
I'm running a headless Ubuntu server as a media server. As I only rarely need access to the data, I'm suspending the system when it's idle. The command systemctl suspend is used to suspend the system.

This worked fairly well on Ubuntu 20.04 LTS, but since upgrading to 22.04 LTS suspend does not work at all. Note: the problem I describe here also occurred before the upgrade, though only rarely.

The problem I'm facing: Once the command is run, the screen turns black (if one is attached) and/or the ssh sessions terminate. The HDDs seem to power down, but the system fans keep rotating. The server then becomes unresponsive and does not react to WOL commands or even the power button. I cannot wake up the system anymore. As a last resort, I have to press the power button for 5 seconds to cut the power and restart the server.

After digging through a lot of documentation I now don't know how to further debug the issue. Is there any debugging procedure I can follow? Any logs I can check for errors? Anyone who experienced similar issues?

Some more details:

  • The system is running Ubuntu 22.04 LTS with the kernel version 5.15.0-76-generic
  • S3 suspension/STR is activated in the BIOS
  • After countless tries, it actually suspended correctly once, so it seems to be possible, but at the moment whether it works is down to luck
  • Changing to an older kernel (5.4.0-153-generic), as suggested in some threads and done as described here, prevented the system from booting, so I had to revert

Edit 1: As requested I posted the output of journalctl --grep='suspend|sleep' --no-pager --since="-1hour" here: https://pastebin.com/izdBMWmv

Edit 2: More output of the journalctl command after removing the autosuspend systemd service: https://pastebin.com/T5fwcfpK

Edit 3: Output of the command for s in "0000:03:00" "0000:05:00"; do lspci -nnk -s "$s"; done: https://pastebin.com/eEUtBfcM

Fgop
  • 131

1 Answer


How to investigate?

You need to see what's going on behind the scenes in your system while it's suspending/sleeping ... You can start by inspecting related system messages using journalctl like so:

journalctl --grep='suspend|sleep' --no-pager --since="-1week"

--since="-1week" shows messages from the past seven days; change it to --since="-1day" for the past day, --since="-1hour" for the past hour, and so on. --grep='suspend|sleep' shows only messages that contain suspend or sleep (the match is case insensitive when the pattern is all lower-case). --no-pager disables the pager and prints the output at once, so you can copy the whole output in a single action.

How to expand your investigation?

The journalctl search can be widened by adding more related search words to the --grep= option, separated by the | (or) regex operator, e.g.:

journalctl --grep='suspend|sleep|acpi' --no-pager --since="-1hour"

The priority of the messages can also be specified. For example, to print messages of priority 4 (warning) and everything more critical, i.e. "emerg" (0), "alert" (1), "crit" (2), "err" (3) and "warning" (4), use:

journalctl --priority=4 --no-pager --since="-1hour"

and so on ...
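Since your system hangs and has to be powered off hard, the messages of interest are usually in the previous boot rather than the current one. Below is a minimal sketch that combines the options above with -k (kernel messages only) and -b -1 (previous boot). The helper name build_grep_pattern is made up for illustration, and -b -1 assumes the journal is stored persistently (i.e. /var/log/journal exists):

```shell
#!/usr/bin/env bash

# Hypothetical helper: join keywords into one --grep pattern using the
# regex alternation operator, e.g. "suspend|sleep|acpi".
build_grep_pattern() {
    local IFS='|'
    printf '%s\n' "$*"
}

pattern=$(build_grep_pattern suspend sleep acpi)

# Kernel messages (-k) from the previous boot (-b -1), i.e. the boot that
# ended with the forced power-off. Guarded so the sketch does nothing on a
# machine without journalctl or without a stored previous boot.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b -1 --grep="$pattern" --no-pager || true
fi
```

If -b -1 reports that no such boot exists, the journal is probably kept in memory only and the messages from the failed suspend attempt are lost at power-off.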

What do your logs reveal?

Issue #1

In your logs, /opt/autosuspend/bin/autosuspend appears to be a Python script whose line from autosuspend import main tries to import a module that Python reports as missing (not installed): ModuleNotFoundError: No module named 'autosuspend' ... The relevant lines are:

Jul 16 23:30:03 homse1 autosuspend[1292]:   File "/opt/autosuspend/bin/autosuspend", line 5, in <module>
Jul 16 23:30:03 homse1 autosuspend[1292]:     from autosuspend import main
Jul 16 23:30:03 homse1 autosuspend[1292]: ModuleNotFoundError: No module named 'autosuspend'

This can happen, for example, after upgrading to an Ubuntu release that ships a newer system Python version, because modules installed for the old version are not carried over ... Please see for example python3.8 and pip after upgrading to 22.04.2

Therefore, as a fix, you might want to first try:

python3 -m pip install -U autosuspend

Or with sudo if you are installing globally (this depends on how your script is run).
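Before and after reinstalling, it may help to check whether the module is importable at all and which interpreter the script actually uses. This is only a sketch: module_available is a made-up helper name, and it tests the python3 found on your PATH, which is not necessarily the interpreter named in the script's first line:

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if the given top-level module can be found
# by the python3 that is first on PATH.
module_available() {
    python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

if command -v python3 >/dev/null 2>&1; then
    if module_available autosuspend; then
        echo "autosuspend is importable"
    else
        echo "autosuspend is NOT importable"
    fi
fi

# Show the interpreter line of the script from your logs, if it is readable.
head -n 1 /opt/autosuspend/bin/autosuspend 2>/dev/null || true
```

If the first line of the script points to a Python inside /opt/autosuspend rather than to the system one, the module has to be installed with that interpreter instead.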

Issue #2

In your logs, two PCI controllers appear not to comply fully with system suspend/sleep ... The relevant lines are:

Jul 17 22:34:48 homse1 kernel: pci 0000:03:00.0: async suspend disabled to avoid multi-function power-on ordering issue
Jul 17 22:34:48 homse1 kernel: pci 0000:03:00.1: async suspend disabled to avoid multi-function power-on ordering issue
Jul 17 22:34:48 homse1 kernel: pci 0000:05:00.0: async suspend disabled to avoid multi-function power-on ordering issue
Jul 17 22:34:48 homse1 kernel: pci 0000:05:00.1: async suspend disabled to avoid multi-function power-on ordering issue
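The device addresses in such lines can also be extracted mechanically, which is handy if the log is long. A minimal sketch, assuming the message format shown above; pci_slots is a made-up helper name:

```shell
#!/usr/bin/env bash

# Hypothetical helper: read kernel log lines on stdin and print the unique
# PCI slots (domain:bus:device, function stripped) that the
# "async suspend disabled" message mentions.
pci_slots() {
    grep 'async suspend disabled' \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | sed 's/\.[0-7]$//' \
        | sort -u
}

# Example with two of the lines from the log above.
pci_slots <<'EOF'
Jul 17 22:34:48 homse1 kernel: pci 0000:03:00.0: async suspend disabled to avoid multi-function power-on ordering issue
Jul 17 22:34:48 homse1 kernel: pci 0000:05:00.1: async suspend disabled to avoid multi-function power-on ordering issue
EOF
```

On a live system the same function can be fed from the journal, e.g. journalctl -k --no-pager | pci_slots.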

Therefore, you need to further investigate what these are and what kernel modules/drivers are in use for them by running the command lspci -nnk -s on them one by one or all at once like so:

for s in "0000:03:00" "0000:05:00"; do lspci -nnk -s "$s"; done

and your output:

03:00.0 SATA controller [0106]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: ahci
    Kernel modules: ahci
03:00.1 IDE interface [0101]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: pata_jmicron
    Kernel modules: pata_jmicron, pata_acpi
05:00.0 SATA controller [0106]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 02)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: ahci
    Kernel modules: ahci
05:00.1 IDE interface [0101]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 02)
    Subsystem: Gigabyte Technology Co., Ltd Motherboard [1458:b000]
    Kernel driver in use: pata_jmicron
    Kernel modules: pata_jmicron, pata_acpi

reveals that those are JMicron Technology Corp. JMB363 SATA/IDE Controllers, which have been reported to have power management issues under some kernels, for example here and here. The issue has also been isolated to the pata_acpi kernel module (listed among the kernel modules for these devices on your system, although the driver in use is pata_jmicron), for example here ... That might therefore be what prevents your system from sleeping. You might want to read the linked resources and others on this matter, then troubleshoot by experimenting with e.g. blacklisting the pata_acpi kernel module to see if that helps.
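Blacklisting is done with a one-line file under /etc/modprobe.d/. The sketch below only prints what would be written, so it is safe to run as is; the file name blacklist-pata_acpi.conf and the helper blacklist_line are my own choices for illustration, not something your system requires:

```shell
#!/usr/bin/env bash

# Hypothetical file name; modprobe reads any *.conf file in this directory.
conf_file="/etc/modprobe.d/blacklist-pata_acpi.conf"

# Hypothetical helper: print the modprobe.d line that stops a module from
# being loaded automatically.
blacklist_line() {
    printf 'blacklist %s\n' "$1"
}

# Dry run: show what would be written instead of changing the system.
echo "Would write to $conf_file:"
blacklist_line pata_acpi

# To apply for real, uncomment the two lines below and then reboot:
# blacklist_line pata_acpi | sudo tee "$conf_file"
# sudo update-initramfs -u
```

To undo the experiment, delete the file, run sudo update-initramfs -u again and reboot.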

Raffa
  • 34,963