We're finding the Ubuntu Desktop 22.04 boot process to be especially fragile, consistently hanging during configuration. We haven't yet put our finger on the modality and thought we'd see if anyone has any idea what's going on as we continue debugging. Perhaps some common knowledge exists that hasn't yet made it to our corner of the woods.
The Ubuntu 22.04 boot process seems particularly sensitive to denial of Zeroconf traffic via iptables DROP rules or the absence of the avahi-daemon package, especially when physical network connections exist. Ubuntu 20.04 LTS exhibits none of this behavior. The system also can't make it through its first
# apt update && apt upgrade
without that process hanging, and the subsequent reboot hanging. So, the cause seems to have been isolated but not the modality of the boot hang.
The boot hangs present variously, most commonly as systemd ordering cycles or failed partition mounts, which may be related. It isn't clear where this traces.
- It could be unit ordering, although nothing here is obvious.
- Perhaps unit ordering is affected by the absence of zeroconf network traffic (i.e., by preventing targets from being reached) but no obvious reason exists why local processes would not communicate via d-bus, but if they did communicate via tcp-ip why they would not implement more secure name resolution than broadcast traffic. Fundamentally, it isn't at all clear why such traffic exists, its source, or destination.
- It also could be unmet package dependencies after the purges: Error messages vaguely suggest snapd although it may simply be another victim. Reverse dependency analysis may suggest gnome (vanilla-gnome-desktop or gnome-shell)(perhaps via libapache2-mod-dnssd), systemtap, telepathy-salut, or something else but how these fit together or what the specific modality of the hang is, isn't clear, to me (a lay sysadmin, at best), at least.
We're debugging our customary install scripts, which purge all sorts of irrelevant or unnecessary dreck such as avahi-daemon, network-manager, ufw, netplan.io, ModemManager, fprintd, ppp, linux-ppp, etc., configure networking via systemd-networkd, and enable iptables to run at boot before the network is up. The debug process began with no purges, then reboots after package-wise purges.
The iptables rules drop obviously bad packets including all broadcast and zeroconf traffic. The kernel is configured for packet forwarding, so these DROP rules are in the mangle table's PREROUTING chain so they are scrubbed before reaching the filter table's INPUT and FORWARD chains. Presumably, this also scrubs traffic to the loopback interface. INPUT, OUTPUT, and FORWARD rules ACCEPT appropriate traffic before a default DROP rule in each case. Debugging here has involved commenting out the zeroconf DROP rules, which restores boot in some cases but for unknown reasons.
Curiously, none of this configuration has any effect on the boot process when the machine has its Ethernet cables unplugged.
Debugging continues but any constructive thoughts about what may be at work and how to achieve a stable installation would be especially appreciated. I'm out of business until this is resolved.