
I've been chasing this issue for about 6 weeks, ever since I upgraded to Xenial. I initially thought it was random, but found that using USB serial adapters provoked it. It didn't matter whether the adapter was plugged into on-board USB2 or a PCI-E USB3 add-on card. It would result in the following messages on the text console and/or serial console (which I had enabled on an on-board serial port):

NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
NMI watchdog: Watchdog detected hard LOCKUP on cpu 5
NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
NMI watchdog: Watchdog detected hard LOCKUP on cpu 8
NMI watchdog: Watchdog detected hard LOCKUP on cpu 11

etc.

The machine has 16 cores, and all of them would lock up in quick succession, requiring a reset.
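For anyone comparing their own logs against this symptom, here is a minimal sketch (not part of my original troubleshooting) that pulls the affected CPU numbers out of captured console output. It assumes the messages have exactly the format shown above; the function name `locked_up_cpus` and the sample text are made up for illustration.

```python
import re
from typing import Iterable, List

# Matches the exact message format quoted above; other kernel versions
# may word this differently (assumption).
LOCKUP_RE = re.compile(r"NMI watchdog: Watchdog detected hard LOCKUP on cpu (\d+)")


def locked_up_cpus(lines: Iterable[str]) -> List[int]:
    """Return the sorted, de-duplicated CPU numbers named in hard LOCKUP messages."""
    cpus = set()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            cpus.add(int(match.group(1)))
    return sorted(cpus)


# Hypothetical sample: a few lines as they might appear in a saved serial console log.
sample = """\
NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
some unrelated console line
NMI watchdog: Watchdog detected hard LOCKUP on cpu 11
NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
"""

print(locked_up_cpus(sample.splitlines()))  # prints [0, 2, 11]
```

If many different CPUs show up within a short window, as they did for me, that matches the pattern described here rather than a single stuck core.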

I was running the latest kernel (linux-image-4.4.0-72-generic). I tried 4.8, but was affected by the MTU bug (https://bugs.launchpad.net/ubuntu/+source/linux-hwe-edge/+bug/1679823). I tried 4.10, but that has some sort of KVM bug (I'm also running a few VMs on the host).

I tried replacing the memory (even though it is ECC memory), the motherboard, and the NICs, all to no avail. I couldn't find anyone else reporting a multiple-CPU lockup not tied to a specific userland process, so I figured I had bad hardware.

Terry Hardie

1 Answer


I followed the instructions to build my own kernel for 4.8 (https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel) and built linux-image-4.8.0-53-generic. This has fixed my lockups and the MTU issue, and I have seen no KVM panics. Since I wasted 6 weeks troubleshooting this, hopefully someone else finds it useful.

Terry Hardie