Replace a disk in a ZFS pool

Replace a faulty disk on a Mirrored or RAIDZ virtual device

Overview

Sooner or later you will need to replace one of the disks in your ZFS pool, perhaps because a disk has become unavailable or is reporting read or write errors. The method is the same whether your pool uses a mirrored or a RAIDZ virtual device configuration.

This article will show you how to use the command line to replace a faulty disk and resilver your ZFS pool.

The following assumptions are made about your setup:

  • Your computer or server has ZFS on Linux installed.
  • Your ZFS pool has a Mirrored or RAIDZ virtual device configuration.

Back up your data!

Please make sure you have a full backup of your data before replacing any disks.

This article is a useful guide, but we cannot take any responsibility if anything goes wrong.

Investigate disk error

Identify faulty disk

We need to find out which disk is faulty. Let's check the status of our pool.

sudo zpool status
  pool: DUMPSTER
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        DUMPSTER    DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     UNAVAIL      0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0

errors: No known data errors

This tells us that the pool named DUMPSTER is degraded because the disk named sde is unavailable.
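On a larger pool it can help to pull the affected devices out of this output automatically. The following is a minimal sketch, assuming a POSIX shell and awk; the function name `list_problem_devices` is hypothetical, and the set of states it treats as problems (UNAVAIL, FAULTED, REMOVED, OFFLINE) is our own choice rather than anything ZFS defines as a list. It reads the status text on standard input, so you can try it against saved output.

```shell
# Hypothetical helper: print devices whose state in `zpool status` output
# (read from stdin) suggests they need attention.
list_problem_devices() {
    awk '
        # Only inspect lines between the "config:" and "errors:" headers
        /^ *config:/ { in_config = 1; next }
        /^ *errors:/ { in_config = 0 }
        # Column 1 is the device name, column 2 its state
        in_config && NF >= 2 && $2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE)$/ { print $1 }
    '
}

# Usage:
#   sudo zpool status DUMPSTER | list_problem_devices
```

Against the output above, this prints `sde`; the pool and vdev lines are skipped because their state is DEGRADED.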

Obtain disk serial number

Shortly, we will need to physically remove the faulty disk and replace it. But which disk do you remove? The name sde is not printed on the hardware, so we need some more information about it.

lsblk -I 8 -d -o NAME,SIZE,SERIAL
NAME  SIZE SERIAL
sda  14.9G 16GB40021676
sdb   1.8T S2H7J9GB954718
sdc   1.8T S2HGJ9KB709895
sdd   1.8T S2H7J9GB902334
sde   1.8T S2HGJ9KB808656
sdf   1.8T S2HGJ9KB654653
sdg   1.8T S2HGJ9KB727154

This tells us that disk sde has the serial number S2HGJ9KB808656. The same serial number is printed on the label of the physical disk.
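To look up a single device you can filter that listing by name. On most systems `lsblk -d -n -o SERIAL /dev/sde` prints just the serial; the sketch below instead parses the full listing shown above, which is convenient if you saved a copy of it before shutting down. The function name `serial_for_device` is hypothetical, and it assumes SERIAL is the third column, as in the command above.

```shell
# Hypothetical helper: print the serial number for one device.
# Expects `lsblk -I 8 -d -o NAME,SIZE,SERIAL` output on stdin (SERIAL is column 3).
serial_for_device() {
    awk -v dev="$1" '$1 == dev { print $3 }'
}

# Usage:
#   lsblk -I 8 -d -o NAME,SIZE,SERIAL | serial_for_device sde
```

For the listing above, this prints `S2HGJ9KB808656`.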

Swap out the disk

This method assumes you will connect the new disk to the same physical slot that the faulty one used.

Do the following:

  • Shut down the server.
  • Inspect each disk drive until you locate the faulty one by its serial number.
  • Unplug and remove the faulty disk.
  • Put the new disk in its place and connect it.
  • Turn the server back on.
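Device names such as sde are assigned by the kernel at boot and are not guaranteed to stay the same, so it is worth confirming that the new disk was detected and that the name now reports a different serial. Here is a minimal sketch, assuming you noted the old serial earlier; the function name `check_disk_swapped` and its messages are hypothetical.

```shell
# Hypothetical helper: compare the serial now reported for a device with the old one.
# Expects `lsblk -I 8 -d -o NAME,SIZE,SERIAL` output on stdin.
# Arguments: device name, serial number of the disk you removed.
check_disk_swapped() {
    current=$(awk -v dev="$1" '$1 == dev { print $3 }')
    if [ -z "$current" ]; then
        echo "missing: no serial reported for $1"
        return 1
    elif [ "$current" = "$2" ]; then
        echo "unchanged: $1 still reports $2"
        return 1
    fi
    echo "swapped: $1 now reports $current"
}

# Usage:
#   lsblk -I 8 -d -o NAME,SIZE,SERIAL | check_disk_swapped sde S2HGJ9KB808656
```

A non-zero exit status means the new disk is either not visible under that name or the old disk is still in place.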

Now check the status of the pool.

sudo zpool status
  pool: DUMPSTER
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        DUMPSTER                 DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            sdb                  ONLINE       0     0     0
            sdc                  ONLINE       0     0     0
            sdd                  ONLINE       0     0     0
            7113731234554292699  UNAVAIL      0     0     0  was /dev/sde1
            sdf                  ONLINE       0     0     0
            sdg                  ONLINE       0     0     0

errors: No known data errors

As you can see, the pool is still in a degraded state. Importantly, ZFS now lists the missing disk by a numeric identifier and tells us that it was previously /dev/sde1. We will need this information shortly.
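If you want to record the identifier and the old path, they can be extracted from the status output. This sketch assumes the `was <path>` note appears at the end of the line, as in the output above; the function name is hypothetical. `zpool replace` generally also accepts this numeric GUID in place of a device name, which can help if the sdX names have shifted after the reboot.

```shell
# Hypothetical helper: print "<guid> <previous path>" for each unavailable device.
# Reads `zpool status` output on stdin.
list_missing_with_old_path() {
    # NF >= 3 guards against referencing a negative field on short lines
    awk 'NF >= 3 && $2 == "UNAVAIL" && $(NF-1) == "was" { print $1, $NF }'
}

# Usage:
#   sudo zpool status DUMPSTER | list_missing_with_old_path
```

For the output above, this prints `7113731234554292699 /dev/sde1`.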

Replace the disk

We have physically swapped our disk, but we need to tell our ZFS pool that we have replaced the faulty disk with a new one. Make sure you replace DUMPSTER and sde with the name of your pool and disk.

sudo zpool replace DUMPSTER sde
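If you script this step, a small guard against missing arguments can help. The sketch below is hypothetical (the function `zfs_replace_disk` and the `DRY_RUN` variable are ours) and by default only prints the command it would run. It also allows the two-argument form, `zpool replace POOL OLD NEW`, for when the new disk appears under a different name; with only one device given, ZFS treats the new disk as having the same name as the old one, which matches the same-slot approach used here.

```shell
# Hypothetical wrapper around `zpool replace`.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to run the command.
zfs_replace_disk() {
    pool=${1:-} old=${2:-} new=${3:-}
    if [ -z "$pool" ] || [ -z "$old" ]; then
        echo "usage: zfs_replace_disk POOL OLD_DEVICE [NEW_DEVICE]" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # ${new:+ $new} appends " NEW_DEVICE" only when one was given
        echo "zpool replace $pool $old${new:+ $new}"
    else
        sudo zpool replace "$pool" "$old" ${new:+"$new"}
    fi
}

# Usage:
#   zfs_replace_disk DUMPSTER sde              # prints: zpool replace DUMPSTER sde
#   DRY_RUN=0 zfs_replace_disk DUMPSTER sde    # actually runs it
```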

ZFS begins resilvering, rebuilding the missing data onto the new disk from the remaining disks, as soon as the replace command is issued. Once the resilver completes, the old disk is detached from the pool and the pool returns to the ONLINE state. You can follow the progress of the resilver by checking the status of the pool.

sudo zpool status DUMPSTER
  pool: DUMPSTER
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Apr  5 13:16:12 2020
        12.8G scanned out of 1.45T at 131M/s, 3h11m to go
        2.02G resilvered, 0.86% done
config:

        NAME                       STATE     READ WRITE CKSUM
        DUMPSTER                   DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            sdb                    ONLINE       0     0     0
            sdc                    ONLINE       0     0     0
            sdd                    ONLINE       0     0     0
            replacing-3            DEGRADED     0     0     0
              7113731234554292699  UNAVAIL      0     0     0  was /dev/sde1/old
              sde                  ONLINE       0     0     0  (resilvering)
            sdf                    ONLINE       0     0     0
            sdg                    ONLINE       0     0     0

errors: No known data errors
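Rather than re-running `zpool status` by hand, you can poll it. The following is a minimal sketch; the function names are hypothetical, it assumes the status output contains the phrase `resilver in progress` while a resilver is running (as above), and the 60-second interval is arbitrary. Newer OpenZFS releases also provide `zpool wait -t resilver DUMPSTER`, which blocks until the resilver finishes, if your version supports it.

```shell
# Hypothetical helper: succeeds (exit status 0) while the `zpool status`
# text on stdin reports an active resilver.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Hypothetical helper: poll a pool once a minute, printing the progress line,
# until no resilver is reported.
wait_for_resilver() {
    while sudo zpool status "$1" | resilver_in_progress; do
        sudo zpool status "$1" | grep '% done' || true
        sleep 60
    done
    echo "No resilver in progress on $1"
}

# Usage:
#   wait_for_resilver DUMPSTER
```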

Check that it has worked

Once the resilver is complete, the pool should return to the ONLINE state.

sudo zpool status
  pool: DUMPSTER
 state: ONLINE
  scan: resilvered 235G in 2h10m with 0 errors on Sun Apr  5 15:26:31 2020
config:

        NAME        STATE     READ WRITE CKSUM
        DUMPSTER    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0

errors: No known data errors

Your ZFS pool should now be back up and running.
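For a scripted health check, you can read the `state:` line, or use `zpool status -x DUMPSTER`, which prints only a short message such as `pool 'DUMPSTER' is healthy` when there is nothing to report. A minimal sketch of the first approach follows; the function name `pool_state` is hypothetical.

```shell
# Hypothetical helper: print the pool state (e.g. ONLINE or DEGRADED)
# from `zpool status` output on stdin.
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Usage: warn unless the pool is ONLINE
#   [ "$(sudo zpool status DUMPSTER | pool_state)" = "ONLINE" ] || echo "DUMPSTER needs attention"
```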