I have a pool which contains a disc with only a few errors. I wanted to clear those errors out and see if they came back before I purchased a new disc:

me@server:/$ sudo zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 7h1m with 0 errors on Sat May 24 20:44:13 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            ata-ST3000DM001-1CH166_Z1F1HTEW             ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F1HDXJ             ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F1J33Y             FAULTED      9     1     0  too many errors
            ata-ST3000DM001-1CH166_Z1F1HM7F             ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F1HE23             ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F175HQ             ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F1K3TV             ONLINE       0     0     0
            ata-ST3000DM001-1CH166_Z1F1D1XR             ONLINE       0     0     0
        logs
          ata-SSD2SC120GS4DH08B-T_PNY10130000139160634  ONLINE       0     0     0
        cache
          ata-SSD2SC120GS4DH08B-T_PNY10130000139160672  ONLINE       0     0     0

errors: No known data errors
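On a pool with many discs it can help to pull the faulted device out of zpool status automatically instead of reading the table by eye. The sketch below is a hypothetical helper (faulted_vdevs is a made-up name, not part of ZFS). It reads zpool status output on standard input and prints every config row whose STATE column is FAULTED. It assumes the whitespace-separated NAME/STATE/READ/WRITE/CKSUM layout shown above.

```shell
#!/usr/bin/env bash
# faulted_vdevs: hypothetical helper, not part of ZFS.
# Reads `zpool status` output on stdin and prints the NAME of every
# config row whose STATE column is FAULTED.
faulted_vdevs() {
    # Config rows have at least NAME STATE READ WRITE CKSUM (5 fields);
    # requiring that skips header lines such as " state: FAULTED".
    awk 'NF >= 5 && $2 == "FAULTED" { print $1 }'
}
```

For example, after defining the function, sudo zpool status tank | faulted_vdevs should print ata-ST3000DM001-1CH166_Z1F1J33Y for the output above.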

I offlined the disc in question using its GUID, which I obtained from the zdb command:

zpool offline tank 12956315685006632708
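Finding the right GUID in zdb's output can be done by hand, but a small filter makes it less error-prone. This is a minimal sketch assuming that sudo zdb -C tank prints the cached pool configuration on ZFS on Linux (plain zdb with no arguments dumps the configuration of all pools), and that each vdev in that dump lists its guid: line before its path: line. guid_for_path is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# guid_for_path: hypothetical helper, not part of ZFS.
# Reads the pool configuration printed by zdb on stdin and prints the GUID
# of the first vdev whose "path:" line contains the given string
# (for example the drive serial number).
# Assumption: each vdev in zdb's dump lists "guid:" before "path:".
guid_for_path() {
    awk -v want="$1" '
        # Remember the most recent vdev GUID (forced to a string so that
        # the 20-digit value is printed exactly as zdb wrote it).
        $1 == "guid:" { guid = $2 "" }
        # The first path containing the wanted string belongs to that vdev.
        $1 == "path:" && index($0, want) > 0 { print guid; exit }
    '
}
```

With that defined, sudo zdb -C tank | guid_for_path Z1F1J33Y would be expected to print the GUID used in the commands here.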

I cleared the errors:

zpool clear tank 12956315685006632708

Then I tried to online the disc, but was greeted with the following:

zpool online tank 12956315685006632708
cannot online 12956315685006632708: no such device in pool

I have tried substituting the GUID with ata-ST3000DM001-1CH166_Z1F1J33Y and with /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F1J33Y but I get the same results.

I also tried the zpool replace command with each of the identifiers above, with no luck.

I have read about a cfgadm command that I may need to run to "unconfigure/reconfigure" the disc, but it is a Solaris utility and does not appear to be available on Ubuntu.

How do I get this disc back online?

1 Answer

With help from this question and its answers I was able to get this working (the disc is currently resilvering). To any ZFS developers reading this: replacing a disc should be made much easier and more foolproof. After the initial setup, and on a long enough timeline, the one operation every ZFS user will eventually have to perform is replacing a disc. But I digress.

In my case I didn't want to replace my drive; I only wanted to clear its faults and bring it back online fault-free so I could determine whether the disc actually needed to be replaced. My reasoning was that 1) the system had been running for nearly a year without a reboot, so bit rot was a possibility, and 2) the number of errors was fairly small.

The trick, it seemed, was to delete the partitions from the disc in question. First I needed to determine which device to operate on. You could work this out by process of elimination using zdb, but I used lshw to get all the information I needed at once. Basically, I wanted to correlate the device that zpool status reported as failed with a /dev/sdX device (if you are not familiar with less, use the up/down arrow keys to scroll and the q key to quit).

root@server:/# lshw|less
...
              *-disk:2
                   description: ATA Disk
                   product: ST3000DM001-1CH1
                   vendor: Seagate
                   physical id: 0.2.0
                   bus info: scsi@0:0.2.0
                   logical name: /dev/sdd
                   version: CC24
                   serial: Z1F1J33Y
                   size: 2794GiB (3TB)
                   capacity: 2794GiB (3TB)
                   capabilities: 15000rpm gpt-1.00 partitioned partitioned:gpt
                   configuration: ansiversion=6 guid=52d25a12-120a-1c40-92a1-0be436c2d642 sectorsize=4096
                 *-volume:0
                      description: OS X ZFS partition or Solaris /usr partition
                      vendor: Solaris
                      physical id: 1
                      bus info: scsi@0:0.2.0,1
                      logical name: /dev/sdd1
                      serial: f25724a4-dd55-764c-af34-9479521854b9
                      capacity: 2794GiB
                      configuration: name=zfs
                 *-volume:1
                      description: reserved partition
                      vendor: Solaris
                      physical id: 9
                      bus info: scsi@0:0.2.0,9
                      logical name: /dev/sdd9
                      serial: 89eeeedb-e3a0-4940-8a50-d3d7506ad603
                      capacity: 8191KiB
...
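As an aside, the same mapping can be read straight from the udev symlinks in /dev/disk/by-id; the names zpool status shows for this pool are the names of those links. A minimal sketch, assuming GNU readlink (coreutils, as on Ubuntu). by_id_to_dev is a hypothetical helper name, and BY_ID_DIR is a hypothetical override added only so the function can be tried outside /dev.

```shell
#!/usr/bin/env bash
# by_id_to_dev: hypothetical helper.
# Resolves a /dev/disk/by-id name (as printed by `zpool status`) to the
# kernel device node it currently points at, e.g. /dev/sdd.
# BY_ID_DIR is a hypothetical override, used only for testing.
by_id_to_dev() {
    local dir="${BY_ID_DIR:-/dev/disk/by-id}"
    local link="$dir/$1"
    if [ ! -L "$link" ]; then
        echo "no such by-id link: $link" >&2
        return 1
    fi
    # Follow the udev symlink to the real device node.
    readlink -f "$link"
}
```

On the system above, by_id_to_dev ata-ST3000DM001-1CH166_Z1F1J33Y would be expected to print /dev/sdd. A plain ls -l /dev/disk/by-id/ shows the same links for every disc at once.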

The lshw output above shows that /dev/sdd (serial Z1F1J33Y) is the disc I wanted. I then deleted the partitions from that disc:

root@server:/# fdisk /dev/sdd

Then just follow the on-screen prompts to delete the partitions. Here is a good guide. After that, the following command finally worked:

root@server:/# zpool replace tank 12956315685006632708 /dev/disk/by-id/ata-ST3000DM001-1CH166_Z1F1J33Y

The GUID (that long number) was obtained using the zdb command. In the end I might have been able to use zpool online instead of zpool replace, but I didn't try that.
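Since the point of all this was to see whether the errors come back, the remaining steps are to let the resilver finish, scrub the pool (sudo zpool scrub tank), and then check that disc's READ/WRITE/CKSUM counters again. The two functions below are hypothetical helpers sketching that check. They assume the "resilver in progress" wording that zpool status prints on its scan: line during a resilver, and the column layout shown in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of ZFS.

# wait_for_resilver POOL [SECONDS]
# Polls `zpool status POOL` every SECONDS (default 60) until its output no
# longer mentions "resilver in progress", then prints the final status.
wait_for_resilver() {
    local pool="$1" interval="${2:-60}"
    while zpool status "$pool" | grep -q "resilver in progress"; do
        sleep "$interval"
    done
    zpool status "$pool"
}

# vdev_errors NAME
# Reads `zpool status` output on stdin, prints "NAME READ WRITE CKSUM" for
# the matching vdev row, and returns non-zero if the vdev is missing or any
# counter is not "0". Abbreviated counts such as "1.2K" count as non-zero.
vdev_errors() {
    awk -v want="$1" '
        $1 == want && NF >= 5 {
            print $1, $3, $4, $5
            found = 1
            if ($3 != "0" || $4 != "0" || $5 != "0") bad = 1
        }
        END { exit !(found && !bad) }
    '
}
```

For example, sudo zpool status tank | vdev_errors ata-ST3000DM001-1CH166_Z1F1J33Y succeeds only if that disc's row shows 0 0 0 after the scrub.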