
I replaced the 1 TB system SSD with a larger 2 TB one and cloned the contents using the Clonezilla utility. In the OS the drive still appeared as 1 TB at first, but I was able to extend it to 2 TB. All the data seemed fine.
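
For reference, the extension step after cloning to a bigger disk is usually done with growpart and resize2fs. This is a sketch rather than the exact commands used here; the device names match the lsblk output further down:

# grow partition 2 of the NVMe disk to fill the newly available space
sudo growpart /dev/nvme0n1 2    # growpart ships in the cloud-guest-utils package
# then grow the ext4 filesystem to fill the partition (works while mounted)
sudo resize2fs /dev/nvme0n1p2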

After some time the filesystem became read-only. Rebooting and running fsck did help, but only for a few days; it has kept happening ever since. Could the new SSD be faulty? I tried upgrading Ubuntu from 18.04 to 20.04, but to no avail. The filesystem is ext4.
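
A few commands that may help narrow down why the filesystem keeps flipping read-only (assuming /dev/nvme0n1p2 is the root partition, as the lsblk output below shows):

# find the kernel error that triggered the read-only remount
dmesg | grep -i 'EXT4-fs error'
# superblock state and error counters, including first/last error time
sudo tune2fs -l /dev/nvme0n1p2 | grep -iE 'state|error'
# force a full fsck of the root filesystem on the next boot (honored by systemd-fsck)
sudo touch /forcefsck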

EDIT: Smartctl report:

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      S6P1NS0T501522T
Firmware Version:                   4B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2 000 398 934 016 [2,00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2 000 398 934 016 [2,00 TB]
Namespace 1 Utilization:            726 404 530 176 [726 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5521405351
Local Time is:                      Fri Sep  2 14:42:51 2022 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.59W       -        -    0  0  0  0        0       0
 1 +     7.59W       -        -    1  1  1  1        0     200
 2 +     7.59W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1 046 751 [535 GB]
Data Units Written:                 11 045 053 [5,65 TB]
Host Read Commands:                 21 511 754
Host Write Commands:                122 266 698
Controller Busy Time:               632
Power Cycles:                       20
Power On Hours:                     258
Unsafe Shutdowns:                   14
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               41 Celsius
Temperature Sensor 2:               46 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
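
Since the drive advertises Self_Test in its optional admin commands, a device self-test might reveal media problems that the passive health log misses. A sketch, assuming smartmontools 7.1 or newer (which added NVMe self-test support) and /dev/nvme0 as the device:

# start an extended (long) self-test on the NVMe drive
sudo smartctl -t long /dev/nvme0
# read the self-test results once it finishes
sudo smartctl -l selftest /dev/nvme0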

syslog tail:

Sep  1 07:35:45 containerd[1489]: time="2022-09-01T07:35:45.938574835+02:00" level=info msg="cleaning up dead shim"
Sep  1 07:35:45 dockerd[1609]: time="2022-09-01T07:35:45.938532925+02:00" level=info msg="ignoring event" container=c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep  1 07:35:45 containerd[1489]: time="2022-09-01T07:35:45.954480844+02:00" level=warning msg="cleanup warnings time=\"2022-09-01T07:35:45+02:00\" level=info msg=\"starting signal loop\" namespace=moby pid=3411558 runtime=io.containerd.runc.v2\n"
Sep  1 07:35:45 kernel: [598279.313677] veth0e65189: renamed from eth0
Sep  1 07:35:46 kernel: [598279.339095] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 systemd-udevd[3408622]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep  1 07:35:46 NetworkManager[1276]: <info>  [1662010546.0671] manager: (veth0e65189): new Veth device (/org/freedesktop/NetworkManager/Devices/82537)
Sep  1 07:35:46 avahi-daemon[517479]: Interface veth77e9014.IPv6 no longer relevant for mDNS.
Sep  1 07:35:46 avahi-daemon[517479]: Leaving mDNS multicast group on interface veth77e9014.IPv6 with address fe80::b82c:fff:fe77:d9b4.
Sep  1 07:35:46 kernel: [598279.397005] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 kernel: [598279.400491] device veth77e9014 left promiscuous mode
Sep  1 07:35:46 kernel: [598279.400494] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 avahi-daemon[517479]: Withdrawing address record for fe80::b82c:fff:fe77:d9b4 on veth77e9014.
Sep  1 07:35:46 systemd-udevd[3408622]: veth0e65189: Failed to get link config: No such device
Sep  1 07:35:46 gnome-shell[1796]: Removing a network device that was not added
Sep  1 07:35:46 NetworkManager[1276]: <info>  [1662010546.1106] device (veth77e9014): released from master device br-9972a812410e
Sep  1 07:35:46 gnome-shell[1796]: Removing a network device that was not added
Sep  1 07:35:46 systemd[67738]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[67738]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[67738]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 avahi-daemon[517479]: Joining mDNS multicast group on interface veth1b1eea5.IPv6 with address fe80::d05a:41ff:fe71:6d0f.
Sep  1 07:35:46 avahi-daemon[517479]: New relevant interface veth1b1eea5.IPv6 for mDNS.
Sep  1 07:35:46 avahi-daemon[517479]: Registering new address record for fe80::d05a:41ff:fe71:6d0f on veth1b1eea5.*.
Sep  1 07:35:46 kernel: [598279.963079] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598279.963138] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598279.974695] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.063961] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.114831] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.182623] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.241481] EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1551: inode #40395175: comm updatedb.mlocat: checksumming directory block 0

The EXT4-fs error in the last line, a directory block checksum failure on nvme0n1p2 hit by the updatedb.mlocate process, seems notable.
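
To see how often this signature recurs, the logs can be scanned directly, e.g.:

# count occurrences in the current and previous syslog
sudo grep -c 'EXT4-fs error' /var/log/syslog /var/log/syslog.1
# or pull the same kernel messages from the journal
journalctl -k | grep -i 'EXT4-fs error'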

lsblk -f output:

NAME        FSTYPE LABEL           UUID                                 FSAVAIL FSUSE% MOUNTPOINT
nvme0n1
├─nvme0n1p1 vfat                   F062-0FA7                             505,8M     1% /boot/efi
└─nvme0n1p2 ext4                   83f2e983-979f-4303-a7f9-837b7a8d65f0    1,1T    35% /
nvme1n1     ext4   filesystem_home 7af8bdbe-5605-4957-af95-69a790a8f67a 1009,1G    40% /home
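
Worth noting: Ubuntu's default fstab mounts the root filesystem with errors=remount-ro, so the first ext4 error the kernel detects flips / to read-only by design. Whether that has happened can be checked at runtime:

# show the current mount options of /; "ro" shows up after such a remount
findmnt -no OPTIONS /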

1 Answer


If someone stumbles upon a similar issue, here is what I learned. After the drive problems started, and to prepare for replacing the drive, I moved the Docker volumes of one high-traffic application (Sentry) to the other drive. The application itself (docker-compose) was already on that other, still-working, drive.
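
Roughly what such a move looks like; the volume name (sentry-data) and target path are illustrative, not the actual ones used:

docker-compose stop
# copy the volume's data onto the other drive, preserving attributes
sudo rsync -aHAX /var/lib/docker/volumes/sentry-data/_data/ /home/docker-volumes/sentry-data/
# recreate the named volume as a bind to the new location
docker volume rm sentry-data
docker volume create --driver local --opt type=none --opt o=bind \
    --opt device=/home/docker-volumes/sentry-data sentry-data
docker-compose up -d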

No problems since. I suspect this is not a coincidence: having the application and its Docker volumes on different physical drives created the conditions for the problem. No other steps were taken (no updates, etc.), since this was only meant as preparation for replacing the drive completely at the next failure.

EDIT: It happened again. There were two physical 2 TB SSD drives, one as the system drive and one as /home. It first happened on the system drive, which I replaced with a WD SSD of similar specs. Then these lock-ups started occurring on the second Samsung drive, the one mounted at /home. So I replaced both. The firmware was the newest version and everything was up to date. It looks like either a bad batch or some common firmware/Ubuntu issue.
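
The firmware side can be double-checked with nvme-cli and fwupd; note that Samsung typically distributes consumer NVMe firmware through its own bootable ISO rather than LVFS, so the fwupd query may simply show nothing:

# firmware revision slots as reported by the drive
sudo nvme fw-log /dev/nvme0
# check LVFS for firmware updates, if the vendor publishes there
sudo fwupdmgr refresh && sudo fwupdmgr get-updates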
