
I replaced the 1 TB system SSD with a larger 2 TB one and cloned the contents using the Clonezilla utility. In the OS the drive still appeared as 1 TB at first, but I was able to extend it to 2 TB. All the data seemed fine.
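
For reference, the extension step after cloning to a bigger disk is usually done with growpart and resize2fs. This is a sketch rather than the exact commands used here; the device names match the lsblk output further down:

# grow partition 2 of the NVMe disk to fill the newly available space
sudo growpart /dev/nvme0n1 2    # growpart ships in the cloud-guest-utils package
# then grow the ext4 filesystem to fill the partition (works while mounted)
sudo resize2fs /dev/nvme0n1p2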

After some time the filesystem became read-only. Rebooting and running fsck did help, but only for a few days; it has kept happening ever since. Could the new SSD be faulty? I tried upgrading Ubuntu from 18.04 to 20.04, but to no avail. The filesystem is ext4.
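
A few commands that may help narrow down why the filesystem keeps flipping read-only (assuming /dev/nvme0n1p2 is the root partition, as the lsblk output below shows):

# find the kernel error that triggered the read-only remount
dmesg | grep -i 'EXT4-fs error'
# superblock state and error counters, including first/last error time
sudo tune2fs -l /dev/nvme0n1p2 | grep -iE 'state|error'
# force a full fsck of the root filesystem on the next boot (honored by systemd-fsck)
sudo touch /forcefsck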

EDIT: Smartctl report:

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      S6P1NS0T501522T
Firmware Version:                   4B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2 000 398 934 016 [2,00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2 000 398 934 016 [2,00 TB]
Namespace 1 Utilization:            726 404 530 176 [726 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5521405351
Local Time is:                      Fri Sep  2 14:42:51 2022 CEST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.59W       -        -    0  0  0  0        0       0
 1 +     7.59W       -        -    1  1  1  1        0     200
 2 +     7.59W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1 046 751 [535 GB]
Data Units Written:                 11 045 053 [5,65 TB]
Host Read Commands:                 21 511 754
Host Write Commands:                122 266 698
Controller Busy Time:               632
Power Cycles:                       20
Power On Hours:                     258
Unsafe Shutdowns:                   14
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               41 Celsius
Temperature Sensor 2:               46 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
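
Since the drive advertises Self_Test in its optional admin commands, a device self-test might reveal media problems that the passive health log misses. A sketch, assuming smartmontools 7.1 or newer (which added NVMe self-test support) and /dev/nvme0 as the device:

# start an extended (long) self-test on the NVMe drive
sudo smartctl -t long /dev/nvme0
# read the self-test results once it finishes
sudo smartctl -l selftest /dev/nvme0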

syslog tail:

Sep  1 07:35:45 containerd[1489]: time="2022-09-01T07:35:45.938574835+02:00" level=info msg="cleaning up dead shim"
Sep  1 07:35:45 dockerd[1609]: time="2022-09-01T07:35:45.938532925+02:00" level=info msg="ignoring event" container=c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep  1 07:35:45 containerd[1489]: time="2022-09-01T07:35:45.954480844+02:00" level=warning msg="cleanup warnings time=\"2022-09-01T07:35:45+02:00\" level=info msg=\"starting signal loop\" namespace=moby pid=3411558 runtime=io.containerd.runc.v2\n"
Sep  1 07:35:45 kernel: [598279.313677] veth0e65189: renamed from eth0
Sep  1 07:35:46 kernel: [598279.339095] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 systemd-udevd[3408622]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep  1 07:35:46 NetworkManager[1276]: <info>  [1662010546.0671] manager: (veth0e65189): new Veth device (/org/freedesktop/NetworkManager/Devices/82537)
Sep  1 07:35:46 avahi-daemon[517479]: Interface veth77e9014.IPv6 no longer relevant for mDNS.
Sep  1 07:35:46 avahi-daemon[517479]: Leaving mDNS multicast group on interface veth77e9014.IPv6 with address fe80::b82c:fff:fe77:d9b4.
Sep  1 07:35:46 kernel: [598279.397005] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 kernel: [598279.400491] device veth77e9014 left promiscuous mode
Sep  1 07:35:46 kernel: [598279.400494] br-9972a812410e: port 5(veth77e9014) entered disabled state
Sep  1 07:35:46 avahi-daemon[517479]: Withdrawing address record for fe80::b82c:fff:fe77:d9b4 on veth77e9014.
Sep  1 07:35:46 systemd-udevd[3408622]: veth0e65189: Failed to get link config: No such device
Sep  1 07:35:46 gnome-shell[1796]: Removing a network device that was not added
Sep  1 07:35:46 NetworkManager[1276]: <info>  [1662010546.1106] device (veth77e9014): released from master device br-9972a812410e
Sep  1 07:35:46 gnome-shell[1796]: Removing a network device that was not added
Sep  1 07:35:46 systemd[67738]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[67738]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[67738]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[361680]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[1712]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: run-docker-netns-9bfc9b4bb9d2.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[960393]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: var-lib-docker-containers-c61acc2424d8b29cde658e65b0a12b21b7dd87a9c406532ee2fc75a68d565ab0-mounts-shm.mount: Succeeded.
Sep  1 07:35:46 systemd[1]: var-lib-docker-overlay2-a8794c75b463c71c93b29c9643accfb4e12fffe422f7060279f3d976db072b25-merged.mount: Succeeded.
Sep  1 07:35:46 avahi-daemon[517479]: Joining mDNS multicast group on interface veth1b1eea5.IPv6 with address fe80::d05a:41ff:fe71:6d0f.
Sep  1 07:35:46 avahi-daemon[517479]: New relevant interface veth1b1eea5.IPv6 for mDNS.
Sep  1 07:35:46 avahi-daemon[517479]: Registering new address record for fe80::d05a:41ff:fe71:6d0f on veth1b1eea5.*.
Sep  1 07:35:46 kernel: [598279.963079] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598279.963138] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598279.974695] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.063961] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.114831] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.182623] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
Sep  1 07:35:46 kernel: [598280.241481] EXT4-fs error (device nvme0n1p2): __ext4_find_entry:1551: inode #40395175: comm updatedb.mlocat: checksumming directory block 0

The EXT4-fs error in the last line, a directory block checksum failure on nvme0n1p2 hit by the updatedb.mlocate process, seems notable.
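
To see how often this signature recurs, the logs can be scanned directly, e.g.:

# count occurrences in the current and previous syslog
sudo grep -c 'EXT4-fs error' /var/log/syslog /var/log/syslog.1
# or pull the same kernel messages from the journal
journalctl -k | grep -i 'EXT4-fs error'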

lsblk -f output:

NAME        FSTYPE LABEL           UUID                                 FSAVAIL FSUSE% MOUNTPOINT
nvme0n1
├─nvme0n1p1 vfat                   F062-0FA7                             505,8M     1% /boot/efi
└─nvme0n1p2 ext4                   83f2e983-979f-4303-a7f9-837b7a8d65f0    1,1T    35% /
nvme1n1     ext4   filesystem_home 7af8bdbe-5605-4957-af95-69a790a8f67a 1009,1G    40% /home
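
Worth noting: Ubuntu's default fstab mounts the root filesystem with errors=remount-ro, so the first ext4 error the kernel detects flips / to read-only by design. Whether that has happened can be checked at runtime:

# show the current mount options of /; "ro" shows up after such a remount
findmnt -no OPTIONS /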

1 Answer


If someone stumbles upon a similar issue, here is what I learned. After the drive problems started, and to prepare for replacing the drive, I moved the Docker volumes of one high-traffic application (Sentry) to the other drive. The application itself (docker-compose) was already on that other, still-working, drive.
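
Roughly what such a move looks like; the volume name (sentry-data) and target path are illustrative, not the actual ones used:

docker-compose stop
# copy the volume's data onto the other drive, preserving attributes
sudo rsync -aHAX /var/lib/docker/volumes/sentry-data/_data/ /home/docker-volumes/sentry-data/
# recreate the named volume as a bind to the new location
docker volume rm sentry-data
docker volume create --driver local --opt type=none --opt o=bind \
    --opt device=/home/docker-volumes/sentry-data sentry-data
docker-compose up -d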

No problems since. I suspect this is not a coincidence: having the application and its Docker volumes on different physical drives created the conditions for the problem. No other steps were taken (no updates, etc.), since this was only meant as preparation for replacing the drive completely at the next failure.

EDIT: It happened again. There were two physical 2 TB SSD drives, one as the system drive and one as /home. It first happened on the system drive, which I replaced with a WD SSD of similar specs. Then these lock-ups started occurring on the second Samsung drive, the one mounted at /home. So I replaced both. The firmware was the newest version and everything was up to date. It looks like either a bad batch or some common firmware/Ubuntu issue.
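
The firmware side can be double-checked with nvme-cli and fwupd; note that Samsung typically distributes consumer NVMe firmware through its own bootable ISO rather than LVFS, so the fwupd query may simply show nothing:

# firmware revision slots as reported by the drive
sudo nvme fw-log /dev/nvme0
# check LVFS for firmware updates, if the vendor publishes there
sudo fwupdmgr refresh && sudo fwupdmgr get-updates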
