
I'm running Ubuntu 22.04 on my main laptop, with a 4 TB TEAMGROUP MP34 NVMe as my main drive. The file system is ext4.

Yesterday (Nov 16), while I was downloading some large files (about 300 files, 600 GB total), my laptop suddenly started acting strangely. Everything became very slow and the system crashed. I was able to repair it with a bootable USB and fsck. However, the laptop was still very slow and the NVMe SSD was getting very hot, about 75 degrees Celsius (usually it's less than 35 degrees). The disk was only about 35% full. I ran a benchmark on the disk, and the speeds were inconsistent and very slow. After several minutes of work, the disk went into read-only mode.

Initially, I thought there was some hardware problem. I opened the laptop and cleaned the contacts with isopropyl alcohol. I swapped the NVMe for another one and the laptop worked normally. When I reinstalled my original NVMe, the laptop was very slow again. At some point I decided to run sudo fstrim -av; it took about 5-6 minutes (it trimmed about 2.9 TiB), and after that the laptop started working like new. I have been using it without any problems for more than 5 days now. I did some stress tests and benchmarks, and everything works normally.

The output of the manual sudo fstrim -av I ran on Nov 16:

/boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
/: 2.9 TiB (3138692276224 bytes) trimmed on /dev/nvme0n1p2

It looks like fstrim.service was working fine:

cat /var/log/syslog | grep -a fstrim

Nov 13 01:43:37 dev fstrim[98095]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Nov 13 01:43:37 dev fstrim[98095]: /: 2.9 TiB (3140636598272 bytes) trimmed on /dev/nvme0n1p2
Nov 13 01:43:37 dev systemd[1]: fstrim.service: Deactivated successfully.

The last TRIM looks more normal:

cat /var/log/syslog | grep -a fstrim
Nov 20 01:26:54 dev fstrim[109477]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Nov 20 01:26:54 dev fstrim[109477]: /: 31.5 GiB (33783455744 bytes) trimmed on /dev/nvme0n1p2
Nov 20 01:26:54 dev systemd[1]: fstrim.service: Deactivated successfully.

The NVMe is pretty new and in good condition:

sudo smartctl -a /dev/nvme0

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.2.0-36-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       TEAM TM8FP4004T
Serial Number:                      xxxxxxxxxxxxxxxxxxxxx
Firmware Version:                   VB421D65
PCI Vendor/Subsystem ID:            0x10ec
IEEE OUI Identifier:                0x00e04c
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          4,096,805,658,624 [4.09 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Fri Nov 17 12:57:17 2023 EET
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     100 Celsius
Critical Comp. Temp. Threshold:     110 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0   230000   50000
 1 +     4.00W       -        -    1  1  1  1     4000   50000
 2 +     3.00W       -        -    2  2  2  2     4000  250000
 3 -     0.50W       -        -    3  3  3  3     4000    8000
 4 -   0.0090W       -        -    4  4  4  4     8000   30000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    100%
Available Spare Threshold:          32%
Percentage Used:                    0%
Data Units Read:                    4,447,105 [2.27 TB]
Data Units Written:                 8,885,998 [4.54 TB]
Host Read Commands:                 48,182,921
Host Write Commands:                112,476,615
Controller Busy Time:               0
Power Cycles:                       34
Power On Hours:                     2,423
Unsafe Shutdowns:                   11
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 8 of 8 entries)
No Errors Logged

Output of journalctl | grep "fstrim.*/:":

Jul 03 00:21:43 dev fstrim[27756]: /: 3.6 TiB (4009258434560 bytes) trimmed on /dev/nvme0n1p2
Jul 10 00:54:49 dev fstrim[1244594]: /: 3.6 TiB (4001406066688 bytes) trimmed on /dev/nvme0n1p2
Jul 17 00:32:58 dev fstrim[4040993]: /: 54.6 GiB (58677125120 bytes) trimmed on /dev/nvme0n1p2
Jul 24 00:29:14 dev fstrim[1600660]: /: 138.8 GiB (149000179712 bytes) trimmed on /dev/nvme0n1p2
Jul 31 00:35:13 dev fstrim[620323]: /: 135.8 GiB (145785393152 bytes) trimmed on /dev/nvme0n1p2
Aug 07 00:13:04 dev fstrim[35853]: /: 2.9 TiB (3226885373952 bytes) trimmed on /dev/nvme0n1p2
Aug 14 00:29:27 dev fstrim[125210]: /: 2.9 TiB (3230223196160 bytes) trimmed on /dev/nvme0n1p2
Aug 21 01:32:45 dev fstrim[332311]: /: 56.8 GiB (61013270528 bytes) trimmed on /dev/nvme0n1p2
Aug 28 00:11:05 dev fstrim[586592]: /: 90.3 GiB (96974286848 bytes) trimmed on /dev/nvme0n1p2
Sep 04 01:28:47 dev fstrim[16608]: /: 3 TiB (3257704198144 bytes) trimmed on /dev/nvme0n1p2
Sep 11 00:22:26 dev fstrim[21637]: /: 2.9 TiB (3238865485824 bytes) trimmed on /dev/nvme0n1p2
Sep 18 01:14:48 dev fstrim[126317]: /: 2.9 TiB (3240947859456 bytes) trimmed on /dev/nvme0n1p2
Sep 25 00:22:54 dev fstrim[410142]: /: 36.2 GiB (38895230976 bytes) trimmed on /dev/nvme0n1p2
Oct 02 00:31:31 dev fstrim[90432]: /: 3 TiB (3249296408576 bytes) trimmed on /dev/nvme0n1p2
Oct 09 00:48:51 dev fstrim[319128]: /: 54.2 GiB (58184278016 bytes) trimmed on /dev/nvme0n1p2
Oct 16 01:11:15 dev fstrim[29502]: /: 2.8 TiB (3103039946752 bytes) trimmed on /dev/nvme0n1p2
Oct 23 00:31:40 dev fstrim[85578]: /: 2.9 TiB (3152333541376 bytes) trimmed on /dev/nvme0n1p2
Oct 30 01:16:53 dev fstrim[212523]: /: 2.9 TiB (3140076969984 bytes) trimmed on /dev/nvme0n1p2
Nov 06 01:11:08 dev fstrim[38462]: /: 2.9 TiB (3138336178176 bytes) trimmed on /dev/nvme0n1p2
Nov 13 01:43:37 dev fstrim[98095]: /: 2.9 TiB (3140636598272 bytes) trimmed on /dev/nvme0n1p2
Nov 20 01:26:54 dev fstrim[109477]: /: 31.5 GiB (33783455744 bytes) trimmed on /dev/nvme0n1p2

Although it's an old question, this one is related to the numbers above: Large amount of data trimmed after running fstrim. I don't restart my laptop very often, and it's normal for me to have a few weeks of uptime.

I have been using SSDs for years, and this is the first time I have experienced a problem like this. It's also the first time I have had to run fstrim manually, so I am a bit puzzled. What could have caused this behavior? Is it normal? Is there a way to know if my NVMe SSD needs TRIM?

sotirov

1 Answer


"How to know if my NVMe SSD needs TRIM"

Since I can't explain the phenomenon you experienced, I also can't say for sure what the cause is, or exactly which criteria you should monitor.

Instead, here is a collection of indicators you can monitor, so you can decide for yourself whether to preemptively take action (run an extra manual trim with sudo fstrim -av) based on them.

So here are my suggestions:

  1. Monitor the output of fstrim.service. If it trims an excessive amount (say, over 1 TB), consider taking action.
  2. Monitor how much data you have downloaded since the last trim. If it exceeds a threshold relative to the total disk size (25-50%), consider taking action (see the sketch after this list).
  3. Monitor the SSD write speed. If it drops below half the rated value (or under 250 MB/s - not relevant in your case, though), take action.

There may be more viable indicators to this list.
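For indicator 2, the system doesn't record downloads directly, but the NVMe "Data Units Written" counter from the SMART log can serve as a rough proxy for data written since the last trim. Below is a minimal sketch of that idea; the state-file path, the 4 TB disk size, the 25% threshold, and the awk field position are all assumptions, not tested values:

#!/bin/bash

# Rough sketch for indicator 2 (hypothetical paths and thresholds).
# The NVMe SMART counter "Data Units Written" counts units of 512,000 bytes;
# we compare it against a baseline saved the last time we trimmed.

state=/var/lib/trim-baseline     # hypothetical state file
disksize=$((4 * 1000 ** 4))      # total disk size in bytes (4 TB)
threshold=$((disksize / 4))      # act after ~25% of the disk size is written

# Current "Data Units Written" value, with thousands separators stripped
units=$(sudo smartctl -a /dev/nvme0 | awk '/^Data Units Written/ { gsub(",", "", $4); print $4 }')

# First run: just record the baseline
if [[ ! -f "$state" ]]; then
    echo "$units" | sudo tee "$state" > /dev/null
    exit 0
fi

written=$(( (units - $(cat "$state")) * 512000 ))   # bytes written since baseline

if (( written > threshold )); then
    echo "Over threshold - consider running: sudo fstrim -av"
    echo "$units" | sudo tee "$state" > /dev/null   # reset baseline after trimming
else
    echo "Everything OK ($(( written / 1000 ** 3 )) GB written since last baseline)"
fi

Note that host writes include more than just downloads, so this over-counts somewhat; for this purpose, that errs on the safe side.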

Testing fstrim.service performance

I tested this on my own machine, and I can now confirm that fstrim.service behaves for me exactly as described by @sotirov and @FedKad in the comments and in this Q&A.

This is my output of journalctl -t fstrim (lines are shortened):

Oct 23 00:04:55 xb fstrim[662497]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Oct 23 00:04:55 xb fstrim[662497]: /: 442 GiB (474638336000 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c753d>
-- Boot 34c888b0968f458084fa1cf674269326 --
Oct 30 00:04:53 xb fstrim[1303597]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Oct 30 00:04:53 xb fstrim[1303597]: /: 442.1 GiB (474652139520 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c7>
-- Boot 04117f235c354c1fb3c4f082bae4f563 --
Nov 06 00:16:25 xb fstrim[612946]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Nov 06 00:16:25 xb fstrim[612946]: /: 442 GiB (474547269632 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c753d>
Nov 13 00:19:03 xb fstrim[3960792]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Nov 13 00:19:03 xb fstrim[3960792]: /: 253.8 GiB (272512958464 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c7>
Nov 20 00:02:50 xb fstrim[2878811]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Nov 20 00:02:50 xb fstrim[2878811]: /: 258.4 GiB (277492928512 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c7>

It's evident here that:

  1. fstrim.service trims the entire disk after the first boot.
  2. fstrim.service then trims a rather large amount (253.8 GiB and 258.4 GiB) on subsequent runs.

Then, replicating @sotirov's post, I tried running fstrim manually, which resulted in another large amount:

/: 274.1 GiB (294319964160 bytes) trimmed

And then, when I ran fstrim manually a second time, the number was vastly different:

/: 84.3 MiB (88375296 bytes) trimmed

This confirms the behavior of fstrim. Maybe this behavior is buggy, or maybe I just don't understand the huge difference. (A likely explanation is that ext4 remembers which block groups have already been trimmed since the last mount, so a second run only trims blocks freed in the meantime - which would also explain the full-disk trims right after a reboot.)

What I can tell is that the number of blocks trimmed drops drastically after running fstrim manually. Also, I didn't notice any performance difference whatsoever, so in my case it didn't seem to really matter.

Technical details:

Example of how to measure the data trimmed by fstrim.service (as per bullet 1):

#!/bin/bash

# Set threshold for SSD trim
threshold=500

# Get the latest trim value
ssdvalue=$(journalctl -t fstrim | tail -n 1 | awk '{ print $7 }')

# If value is smaller than threshold, then OK - else do something
# The logic should probably be reworked here when dealing with terabytes of
# data - probably by using the numfmt command or something similar
if [[ "${ssdvalue%.*}" -lt "$threshold" ]]; then
    echo "Everything OK"
else
    echo "Do something (run fstrim)"
fi
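As a sketch of that rework (assuming the journal line format shown above, where the byte count appears in parentheses), comparing raw byte counts avoids the MiB/GiB/TiB unit problem entirely; numfmt is only needed to express the threshold readably:

#!/bin/bash

# Hypothetical rework: compare raw byte counts so all unit suffixes are handled uniformly
threshold=$(numfmt --from=iec 500G)   # 500 GiB threshold, expressed in bytes

# Extract the byte count in parentheses from the latest fstrim journal entry
bytes=$(journalctl -t fstrim | tail -n 1 | grep -oP '\(\K[0-9]+(?= bytes\))')

if (( bytes < threshold )); then
    echo "Everything OK"
else
    echo "Do something (run fstrim)"
fi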

Example of how to measure the SSD write speed (as per bullet 3; run this script as root):

#!/bin/bash

# Set path to SSD disk (to write benchmark file)
ssdpath=/path/to/ssd/

# Set threshold for write speed (in MB/s)
threshold=1000

# Remove benchmark file if it exists
[[ -f "$ssdpath/ssdwrite" ]] && rm "$ssdpath/ssdwrite"

# Run dd command to test write speed
dd if=/dev/zero of="$ssdpath/ssdwrite" conv=fdatasync bs=1G count=5 status=progress 2> /dev/shm/ssdspeed

# Isolate the MB/s value
ssdvalue=$(tail -n 1 /dev/shm/ssdspeed | awk '{ print $10 }')

# If value is larger than threshold, then OK - else do something
# (assumes dd reports the speed in MB/s; fractional values are truncated)
if [[ "${ssdvalue%.*}" -gt "$threshold" ]]; then
    echo "Everything OK"
else
    echo "Do something (run fstrim)"
fi
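If you want either check to run unattended, a cron entry can schedule it. A minimal example, assuming a hypothetical script installed at /usr/local/bin/ssd-trim-check.sh:

# /etc/cron.d/ssd-trim-check (hypothetical): run the chosen check script
# every Monday at 03:00, as root
0 3 * * 1 root /usr/local/bin/ssd-trim-check.sh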

Artur Meinild