Overview
Sooner or later you will need to replace one of the disks in your ZFS pool. Perhaps a disk has become unavailable, or you are seeing read/write errors. The method is the same whether your pool uses a mirrored or a RAIDZ virtual device configuration.
This article will show you how to use the command line to replace a faulty disk and resilver your ZFS pool.
The following assumptions are made regarding your setup:
- Your computer or server has ZFS on Linux installed.
- Your ZFS pool has a mirrored or RAIDZ virtual device configuration.
Back up your data!
Please make sure you have a full backup of your data before replacing any disks.
This article is a useful guide, but we cannot take any responsibility if anything goes wrong.
Investigate disk error
Identify faulty disk
We need to find out which disk is faulty. Let's check the status of our pool.
sudo zpool status
pool: DUMPSTER
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME          STATE     READ WRITE CKSUM
DUMPSTER      DEGRADED     0     0     0
  raidz1-0    DEGRADED     0     0     0
    sdb       ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sdd       ONLINE       0     0     0
    sde       UNAVAIL      0     0     0
    sdf       ONLINE       0     0     0
    sdg       ONLINE       0     0     0
errors: No known data errors
This tells us that the pool named DUMPSTER is degraded because the disk named sde is unavailable.
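On a pool with many disks it can help to filter the status output down to the rows that need attention. The following is a minimal sketch, not part of the original procedure: the helper name list_unhealthy_devices is hypothetical, and it assumes the usual five-column config rows (NAME, STATE, READ, WRITE, CKSUM) shown above. It prints every config row whose state is not ONLINE, which includes the pool and vdev rows as well as the disk itself.

```shell
# list_unhealthy_devices (hypothetical helper): read `zpool status` text on
# stdin and print "<name> <state>" for each config row that is not ONLINE.
# A config row is assumed to have at least five columns, with the state in
# column 2; this skips the two-column "state: DEGRADED" header line.
list_unhealthy_devices() {
  awk '
    NF >= 5 && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
      print $1, $2
    }
  '
}

# Usage (assumes a pool named DUMPSTER, as in this article):
#   sudo zpool status DUMPSTER | list_unhealthy_devices
```

The last line printed is normally the leaf device, here sde, because the config table lists the pool first, then the vdev, then the disks.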
Obtain disk serial number
In a moment, we will need to physically remove the faulty disk drive and replace it. But which disk do you remove? We need some more information about sde.
lsblk -I 8 -d -o NAME,SIZE,SERIAL
NAME SIZE SERIAL
sda 14.9G 16GB40021676
sdb 1.8T S2H7J9GB954718
sdc 1.8T S2HGJ9KB709895
sdd 1.8T S2H7J9GB902334
sde 1.8T S2HGJ9KB808656
sdf 1.8T S2HGJ9KB654653
sdg 1.8T S2HGJ9KB727154
This tells us that disk sde has the serial number S2HGJ9KB808656. The same serial number should be printed on the label of the physical disk.
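If you only want the serial number of one disk, the lsblk listing can be filtered by device name. This is a small sketch under the assumption that the listing uses exactly the NAME,SIZE,SERIAL columns from the command above; the helper name serial_of is hypothetical.

```shell
# serial_of DEVICE (hypothetical helper): read the output of
# `lsblk -d -o NAME,SIZE,SERIAL` on stdin and print the SERIAL column
# (third field) for the row whose NAME matches DEVICE exactly.
serial_of() {
  awk -v dev="$1" '$1 == dev { print $3 }'
}

# Usage:
#   lsblk -I 8 -d -o NAME,SIZE,SERIAL | serial_of sde
```

Writing the serial number down before you shut the server down avoids having to rely on memory while you are inspecting the drives.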
Swap out the disk
This method assumes you are going to use the same physical slot for the new disk that the faulty one was connected to.
Start with the following steps:
- Shut down the server.
- Inspect each disk drive until you locate the faulty one by its serial number.
- Unplug and remove the faulty disk.
- Put the new disk in its place and connect it.
- Turn the server back on.
Now check the status of the pool.
sudo zpool status
pool: DUMPSTER
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME                       STATE     READ WRITE CKSUM
DUMPSTER                   DEGRADED     0     0     0
  raidz1-0                 DEGRADED     0     0     0
    sdb                    ONLINE       0     0     0
    sdc                    ONLINE       0     0     0
    sdd                    ONLINE       0     0     0
    7113731234554292699    UNAVAIL      0     0     0  was /dev/sde1
    sdf                    ONLINE       0     0     0
    sdg                    ONLINE       0     0     0
errors: No known data errors
As you can see, the pool is still in a degraded state. The missing device is now listed by its numeric GUID, and the note "was /dev/sde1" records the path the removed disk used to occupy. We will need this information shortly.
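The identifier and the former path can be pulled out of the status output so that they are easy to copy. The sketch below assumes the "was <path>" annotation format shown above; the helper name missing_devices is hypothetical.

```shell
# missing_devices (hypothetical helper): read `zpool status` text on stdin
# and print "<id> <former-path>" for each row that starts with a purely
# numeric identifier and ends with "was <path>".
missing_devices() {
  awk 'NF >= 3 && $1 ~ /^[0-9]+$/ && $(NF-1) == "was" { print $1, $NF }'
}

# Usage (assumes a pool named DUMPSTER, as in this article):
#   sudo zpool status DUMPSTER | missing_devices
```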
Replace the disk
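We have physically swapped the disk, but ZFS still needs to be told that the new disk replaces the faulty one. Because the new disk sits in the same slot and has come up under the same name, naming the old device alone is enough: ZFS then uses the disk now found at that path as the replacement. Substitute the names of your own pool and disk for DUMPSTER and sde.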
sudo zpool replace DUMPSTER sde
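If the new disk comes up under a different name than the old one, zpool replace also accepts the old device followed by the new device. The sketch below only prints the command so that you can review it before running it yourself; the helper name build_replace_cmd and the example device /dev/sdh are hypothetical, and using the numeric identifier from the status output as the old device is an assumption you should check against the zpool-replace manual page for your ZFS version.

```shell
# build_replace_cmd POOL OLD_DEVICE NEW_DEVICE (hypothetical helper):
# print, but do not execute, a three-argument `zpool replace` command.
build_replace_cmd() {
  if [ "$#" -ne 3 ]; then
    echo "usage: build_replace_cmd POOL OLD_DEVICE NEW_DEVICE" >&2
    return 1
  fi
  printf 'zpool replace %s %s %s\n' "$1" "$2" "$3"
}

# Usage (the new device path /dev/sdh is a made-up example):
#   build_replace_cmd DUMPSTER 7113731234554292699 /dev/sdh
# Review the printed line, then run it with sudo if it is correct.
```

Printing the command first is a deliberate choice: a replace on the wrong device is hard to undo, so a dry run costs little.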
ZFS begins resilvering, that is, rebuilding the missing data onto the new disk, as soon as the replace is issued. Once the resilvering completes, the entry for the faulty disk is dropped from the pool configuration and the pool is restored to the ONLINE state. You can follow the progress of the resilvering by checking the status of the pool.
sudo zpool status DUMPSTER
pool: DUMPSTER
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Apr 5 13:16:12 2020
12.8G scanned out of 1.45T at 131M/s, 3h11m to go
2.02G resilvered, 0.86% done
config:
NAME                         STATE     READ WRITE CKSUM
DUMPSTER                     DEGRADED     0     0     0
  raidz1-0                   DEGRADED     0     0     0
    sdb                      ONLINE       0     0     0
    sdc                      ONLINE       0     0     0
    sdd                      ONLINE       0     0     0
    replacing-3              DEGRADED     0     0     0
      7113731234554292699    UNAVAIL      0     0     0  was /dev/sde1/old
      sde                    ONLINE       0     0     0  (resilvering)
    sdf                      ONLINE       0     0     0
    sdg                      ONLINE       0     0     0
errors: No known data errors
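A resilver of this size takes hours, so you may prefer to poll the progress rather than rerun the command by hand. The sketch below extracts the percentage from the "resilvered, N% done" line; the helper name resilver_percent is hypothetical, and the line format is assumed to match the output above, which may differ between ZFS versions.

```shell
# resilver_percent (hypothetical helper): read `zpool status` text on stdin
# and print the first percentage found on the "... resilvered, N% done" line,
# without the trailing percent sign. Prints nothing if no such line exists.
resilver_percent() {
  awk '
    !found && /resilvered/ && /% done/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /%$/) {
          pct = $i
          sub(/%$/, "", pct)
          print pct
          found = 1
          break
        }
      }
    }
  '
}

# Usage: report progress once a minute while a resilver is running
# (assumes a pool named DUMPSTER, as in this article):
#   while sudo zpool status DUMPSTER | grep -q 'resilver in progress'; do
#     echo "resilver: $(sudo zpool status DUMPSTER | resilver_percent)% done"
#     sleep 60
#   done
```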
Check that it has worked
Once the resilver is complete, the pool should return to the ONLINE state.
sudo zpool status
pool: DUMPSTER
state: ONLINE
scan: resilvered 235G in 2h10m with 0 errors on Sun Apr 5 15:26:31 2020
config:
NAME          STATE     READ WRITE CKSUM
DUMPSTER      ONLINE       0     0     0
  raidz1-0    ONLINE       0     0     0
    sdb       ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sdd       ONLINE       0     0     0
    sde       ONLINE       0     0     0
    sdf       ONLINE       0     0     0
    sdg       ONLINE       0     0     0
errors: No known data errors
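The same check can be scripted, for example for use in a monitoring job. This is a minimal sketch that reads the "state:" line from the status output; the helper name pool_state is hypothetical, and it prints one line per pool if the output covers several pools.

```shell
# pool_state (hypothetical helper): read `zpool status` text on stdin and
# print the value of each "state:" line, e.g. ONLINE or DEGRADED.
pool_state() {
  awk '$1 == "state:" { print $2 }'
}

# Usage (assumes a pool named DUMPSTER, as in this article):
#   if [ "$(sudo zpool status DUMPSTER | pool_state)" = "ONLINE" ]; then
#     echo "DUMPSTER is healthy"
#   fi
```

For a quick manual check, zpool status -x reports only pools that have problems.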
Your ZFS pool should now be back up and running.