Summary of Issue

After enabling the LUKS / dm-crypt full-disk-encryption option available through the Ubuntu installer, disk I/O performance is absolutely abysmal. Writing to the disk stalls / freezes the system. Data read from the disk appears to be corrupted.

If I don't use LUKS / dm-crypt then I don't have any problems at all. Everything is perfectly stable and performant. I understand that encryption has a performance hit. I expect lower performance, not minutes-long system freezes and data corruption.

I've never had so many issues with a completely clean Ubuntu installation before. I'm fine with being wrong about something. I just want my stuff to work!

Both systems listed below are affected by this issue. All experimentation happened on the Ryzen system. The i5 has been doing basically nothing for 2 years so I just never noticed the problem until now.

System #1 (running mostly idle for about 2 years)

  • Ubuntu Server 20.04.3 LTS
  • Intel i5-3570
  • 8GB RAM, non-ECC
  • Kingston 120GB A400 SATA SSD
  • No errors reported by Memtest86+ or Prime95

System #2 (new system, where problem was first discovered)

  • Ubuntu Server 22.04 LTS
  • AMD Ryzen 5 5600
  • 32GB RAM, ECC
  • Kingston 120GB A400 SATA SSD
  • No errors reported by Memtest86+ or Prime95

Steps to Reproduce

  • Install Ubuntu 20.04 LTS or 22.04 LTS
  • During installation, when setting up the disk, choose the following options:
      • Use an entire disk
      • Set up this disk as an LVM group
      • Encrypt the LVM group with LUKS
      • Expand the partition containing / so it takes up all available free space in the LVM group
  • After installation and boot, use SSH / samba / USB / whatever to transfer a large file to the OS disk

Expectation

  • Write big files (greater than ~6GB) to the disk without the system freezing
  • Read files back from the disk and have them not be corrupted

Reality

All of these issues were found with the Ryzen system. I tested heavy I/O load on the i5 system once and was able to reproduce the issue. I'm not brave enough to push it further, lest I corrupt the OS disk and have to rebuild it.

  • Writing large files freezes the system to the point where only console echo works. Commands don't run; even ls won't return anything. SSH transfers stall, time out, and fail.
  • iotop says at least one kcryptd worker thread hits 99% IO load and then hangs there for several minutes (feels like 2-3 minutes)
  • Large files read back from the disk appear to be corrupted. I moved a VM image over and it wasn't able to run for more than a few seconds without crashing out due to internal file system damage. After a few reboots apt started complaining about broken packages. The network connection stopped coming up. Eventually the system threw a kernel panic and I gave up.
  • Oddly enough, reboot doesn't actually reboot. The system will hang with a black screen after shutting down the OS. Lights and fans stay on. The chassis reset button doesn't work in this state. I have to pull the power cable out of the wall to get things going again.

Please note that none of these issues occur when the OS is installed without LUKS / dm-crypt underneath. This includes the odd issue with the hung reboots.

Also note that I tried running Windows 10 + BitLocker on the Ryzen system and it had zero issues.
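
One way to confirm the corruption directly, rather than inferring it from a crashing VM, is to compare checksums of the source file and the copy that landed on the encrypted disk. This is a minimal sketch, assuming sha256sum from coreutils is available; the files_match helper and the example paths are made up for illustration:

```shell
#!/bin/sh
# files_match SRC COPY
# Succeeds (exit 0) only when both files have the same SHA-256 digest.
files_match() {
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    # An empty digest means sha256sum failed, so treat that as a mismatch.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Runs only when two paths are given, for example:
#   sh verify.sh /mnt/usb/vm.img /srv/vm.img   (hypothetical paths)
if [ "$#" -eq 2 ]; then
    if files_match "$1" "$2"; then
        echo "OK: checksums match"
    else
        echo "MISMATCH: the copy differs from the source"
    fi
fi
```

A read that happens right after the write can be served from the page cache instead of the disk, so for a meaningful result reboot first or drop the caches with sync; echo 3 | sudo tee /proc/sys/vm/drop_caches before comparing.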

Additional Info

I did all of this on the new Ryzen system with Ubuntu 22.04 LTS.

  • I tried running cryptsetup --allow-discards --perf-no_read_workqueue --perf-no_write_workqueue --persistent refresh on the encrypted volume, thinking that the problem was caused by some weirdness with the cheap SSD. It helped write performance, but reads still appear corrupted.
  • I tried a full clean reinstall without any extra applications. Just the base OS and iotop. No updates. The problem persists.
  • I swapped the Kingston SSD for a known-good 7200RPM spinning hard disk. SATA 2.5" 320GB non-SMR. Full clean reinstall. The problem persists.
  • I swapped the Kingston SSD for a known-good NVMe drive, Samsung SSD 970 EVO Plus. Full clean reinstall. The problem persists.
  • I replaced the SATA cables, even though everything works fine without encryption. The problem persists.
  • All drives involved in this mess have passed badblocks and SMART tests.

At this point I'm seriously considering moving back to Windows 10 + BitLocker because I don't know what else to do.

1 Answer

Turns out there were two issues working together to make my life difficult.

First Issue: Ubuntu / Fedora / Linux in General

Ubuntu ships with dm-crypt worker I/O queues enabled. Apparently these queues aren't written very well. The kernel waits until they are full or near-full before trying to dump them to the disk, and with multiple queues all fighting for disk access, the disk dies under the load and the system locks up.

But "Reeeeeee!", you say, "That's not what's happening, the queues are perfect and nothing could be wr-" Don't care. I'm not a kernel developer. All I know is what I see in iotop, and the fact that the system locks up hard when I'm writing lots of stuff to the disk. This doesn't happen when the system is running without encryption. The dm-crypt queues are broken. End of story.

If you disagree with me then you can go read what Cloudflare had to say about it. https://blog.cloudflare.com/speeding-up-linux-disk-encryption/

Anyway, disabling the queues "fixed" the problem. You can see the command I used to do this in the original post above.
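
The command quoted in the question leaves out the name of the mapped device, which cryptsetup refresh needs. Here is a minimal sketch of the full invocation. The mapping name dm_crypt-0 is a placeholder, not something taken from the post, and the run helper is only there so the script prints what it would do unless APPLY=1 is set. As far as I know the two workqueue flags need a fairly recent kernel and cryptsetup, and --persistent only works with LUKS2 headers.

```shell
#!/bin/sh
# MAPPING is a placeholder for the opened LUKS device.
# Find the real name with: sudo dmsetup ls --target crypt
MAPPING="${MAPPING:-dm_crypt-0}"

# Print commands by default; actually run them only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Same flags as in the question. --persistent stores them in the LUKS2
# header so they are applied again on the next unlock.
run sudo cryptsetup --allow-discards \
    --perf-no_read_workqueue --perf-no_write_workqueue \
    --persistent refresh "$MAPPING"

# The active flags should be listed in the status output afterwards.
run sudo cryptsetup status "$MAPPING"
```

Run it once as-is to check the printed commands, then again with APPLY=1 MAPPING=your-mapping-name to apply them.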

Second Issue: VirtualBox

This ticket: https://www.virtualbox.org/ticket/10031?cversion=0&cnum_hist=14

... has been open for over 10 years now. From this info my guess is that VBox is not very tolerant of I/O latency and eventually gives up if access to the host's storage takes too long. The VBox emulator / hypervisor / whatever it is turns back around to the guest VM and says "sorry, I can't read or write the disk".

How does a VM deal with a virtualized I/O layer that acts like a defective hard disk? It doesn't. It immediately explodes into atoms like a super hero on the wrong side of a Thanos snap.

I "fixed" this by dumping VirtualBox and switching to KVM. I now use virt-manager over SSH with X forwarding to do my stuff. KVM appears to be much more tolerant of slow host I/O, making it a perfect match for LUKS.

Switching to KVM

VirtualBox .vdi files are easily converted to .qcow2 format. There's an endless number of tutorials about how to do this.
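
For reference, the conversion usually comes down to a single qemu-img call (on Ubuntu the tool is in the qemu-utils package). This is a sketch only: guest.vdi and guest.qcow2 are hypothetical file names, and the run helper just prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical file names; point these at your own images.
SRC="${SRC:-guest.vdi}"
DST="${DST:-guest.qcow2}"

# Print commands by default; actually run them only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# -f is the input format, -O the output format, -p shows progress.
run qemu-img convert -p -f vdi -O qcow2 "$SRC" "$DST"

# Show format and virtual size of the result as a quick sanity check.
run qemu-img info "$DST"
```

The conversion writes a new file and leaves the .vdi alone, so it is worth keeping the original until the guest has booted cleanly under KVM.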

The virt-manager UI works great over SSH with X forwarding enabled.
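
A sketch of what that looks like from the client side, assuming the server allows X11 forwarding (X11Forwarding yes in sshd_config, xauth installed) and the client has an X server running. The user@kvm-host address is a placeholder, and the run helper only prints the command unless APPLY=1 is set.

```shell
#!/bin/sh
# Placeholder login for the KVM host.
REMOTE="${REMOTE:-user@kvm-host}"

# Print commands by default; actually run them only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Start virt-manager on the server and display its window locally.
run ssh -X "$REMOTE" virt-manager

# Alternative, not what the post describes: run virt-manager locally and
# let it connect to the remote libvirt daemon over SSH.
#   virt-manager -c "qemu+ssh://$REMOTE/system"
```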

USB passthrough to the guest VM works fine too. You might have to edit udev rules if your permissions aren't set up right. Again, there is tons of info about this that is easily findable through Google.

If you're looking to make the same jump from VirtualBox but you want your bridged network adapters on the guest VMs to be connected directly to the network (just like I did) then you'll need to change your netplan settings accordingly. Here's an example from my own config file:

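network:
    version: 2
    renderer: networkd
    ethernets:
        # Physical NIC: gets no address of its own, it only feeds the bridge
        eno1:
            match:
                name: eno1
            dhcp4: no
            optional: yes
    bridges:
        # Bridge shared by the host and the guest VMs
        br0:
            # Same MAC as the physical NIC, so DHCP still recognises the host
            macaddress: 74:46:a0:b4:39:b9
            dhcp4: yes
            interfaces:
                - eno1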
Set the bridge MAC to your physical NIC's MAC if you don't want to mess up the DHCP static mappings.
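
To look up the MAC and apply the config, something along these lines should work. It is a sketch under a few assumptions: the config lives in a file under /etc/netplan/, the interface names match the example above, and the run helper only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Interface names from the example config; adjust to your system.
NIC="${NIC:-eno1}"
BRIDGE="${BRIDGE:-br0}"

# Print commands by default; actually run them only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Read the physical NIC's MAC so it can be copied into macaddress:.
run cat "/sys/class/net/$NIC/address"

# Check that the YAML parses before touching the live network.
run sudo netplan generate

# Apply the configuration.
run sudo netplan apply

# Confirm the bridge came up and received an address.
run ip -br addr show "$BRIDGE"
```

If you are connected over SSH through the same NIC, the session may drop while the bridge comes up, so having console access as a fallback is a good idea.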