
For the past few days I have been getting a disk health warning saying that the number of error log entries increased from X to Y (see the output below). The count increases after every reboot of the notebook and after every wake from hibernation. Since this error first appeared, the disk is also scanned automatically during boot.

This message was generated by the smartd daemon running on:

host name: Latitude-5590
DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/nvme0, number of Error Log entries increased from 71 to 74

Device info: PM981 NVMe Samsung 512GB, S/N:S3ZHNY0K908914, FW:EXA73D1Q, 512 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Mon May 16 12:09:35 2022 CEST
Another message will be sent in 24 hours if the problem persists.

Unfortunately the syslog does not contain any details:

sudo grep smartd /var/log/syslog
May 20 20:58:23 Latitude-5590 smartd[686]: Device: /dev/nvme0, number of Error Log entries increased from 74 to 75
May 20 20:58:23 Latitude-5590 smartd[686]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
May 20 20:58:23 Latitude-5590 smartd[686]: Warning via /usr/share/smartmontools/smartd-runner to root produced unexpected output (183 bytes) to STDOUT/STDERR: 
May 20 20:58:23 Latitude-5590 smartd[686]: /etc/smartmontools/run.d/10mail:
May 20 20:58:23 Latitude-5590 smartd[686]: Your system does not have /usr/bin/mail.  Install the mailx or mailutils package
May 20 20:58:23 Latitude-5590 smartd[686]: run-parts: /etc/smartmontools/run.d/10mail exited with return code 1
May 20 20:58:23 Latitude-5590 smartd[686]: Warning via /usr/share/smartmontools/smartd-runner to root: failed (32-bit/8-bit exit status: 256/1)
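To see how the counter moves from one boot to the next, the smartd lines above can be reduced to one summary line per increase. This is only a minimal sketch: the function name `smartd_error_deltas` is made up for illustration, and it assumes the traditional syslog line format shown above (three timestamp fields, then host and tag).

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): summarise smartd
# "Error Log entries increased" lines read from stdin or from files.
# Assumes the classic syslog layout: "Mon DD HH:MM:SS host tag: message".
smartd_error_deltas() {
    awk '/number of Error Log entries increased from/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "Device:") dev  = $(i + 1)   # e.g. /dev/nvme0,
            if ($i == "from")    prev = $(i + 1)   # old counter value
            if ($i == "to")      curr = $(i + 1)   # new counter value
        }
        sub(/,$/, "", dev)                          # drop trailing comma
        printf "%s %s %s %s %d -> %d (+%d)\n", $1, $2, $3, dev, prev, curr, curr - prev
    }' "$@"
}

# Example (reading the log usually needs root or membership in group adm):
#   sudo cat /var/log/syslog | smartd_error_deltas
```

Only the lines that report an increase are kept, so the unrelated complaints about the missing /usr/bin/mail are filtered out.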

Further investigation with the nvme-cli tool did not help either:

manu@Latitude-5590:~$ sudo nvme error-log -e 1 /dev/nvme0
Error Log Entries for device:nvme0 entries:1
.................
 Entry[ 0]   
.................
error_count  : 75
sqid         : 0
cmdid        : 0xa012
status_field : 0x4004(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)
parm_err_loc : 0xffff
lba          : 0
nsid         : 0
vs           : 0
cs           : 0
.................
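The status_field value can be taken apart by hand to see what the drive recorded. The sketch below assumes that nvme-cli prints the raw 16-bit field from the error log entry, with the phase tag in bit 0 and the status in bits 15:1 (status code in bits 8:1, status code type in bits 11:9, More in bit 14, Do Not Retry in bit 15). The function name `decode_nvme_status` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a raw 16-bit NVMe status_field into its parts.
# Assumption: bit 0 is the phase tag and bits 15:1 hold the status field.
decode_nvme_status() {
    raw=$(( $1 ))                       # accepts hex input such as 0x4004
    printf 'SC=0x%02x SCT=%d CRD=%d MORE=%d DNR=%d\n' \
        $(( (raw >> 1)  & 0xff )) \
        $(( (raw >> 9)  & 0x7  )) \
        $(( (raw >> 12) & 0x3  )) \
        $(( (raw >> 14) & 0x1  )) \
        $(( (raw >> 15) & 0x1  ))
}

decode_nvme_status 0x4004
# prints: SC=0x02 SCT=0 CRD=0 MORE=1 DNR=0
```

Under that assumption, status code type 0 with status code 0x02 is the generic "Invalid Field in Command" status, which matches the INVALID_FIELD text that nvme-cli prints. sqid 0 is the admin queue, so the rejected command was an admin command rather than a read or write.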

In /var/log/boot.log.1 I found that a file system was checked during boot, but I could not find the results of the check. I don't even know which program performed the check.

manu@Latitude-5590:~$ sudo grep disk /var/log/boot.log.1
Starting File System Check…/dev/disk/by-uuid/D812-3DF4...
[  OK  ] Finished File System Check on /dev/disk/by-uuid/D812-3DF4.

Is it possible to get more details about the error? Since this disk contains valuable data, I would like to know exactly what is wrong with it. I am using Ubuntu 20.04.

2 Answers

2

Here is an explanation for this error message:

The driver is just attempting an optional command that the device doesn't support. The driver has no way to know if the device supports it without trying, so that's what it's doing. The drive can log the error if it wants to, but this is just unnecessary for this command, IMO, but we can't do anything about that. I'd just ignore the errors.

See https://bugzilla.kernel.org/show_bug.cgi?id=217445

1

I'm having the same issue. I found some information about this here and here. From reading the first reference I believe this is a bug that manifests on some platforms that send invalid commands to the SSD, which increases the error log count on each boot of the machine. The second reference is a bug report to smartmontools related to this issue.

f1sherman