ZFS: drive replacement performance
Freddie Cash
fjwcash at gmail.com
Tue Jul 7 20:54:02 UTC 2009
On Tue, Jul 7, 2009 at 12:56 PM, Mahlon E. Smith <mahlon at martini.nu> wrote:
> I've got a 9 sata drive raidz1 array, started at version 6, upgraded to
> version 13. I had an apparent drive failure, and then at some point, a
> kernel panic (unrelated to ZFS.) The reboot caused the device numbers
> to shuffle, so I did an 'export/import' to re-read the metadata and get
> the array back up.
>
This is why we've started using glabel(8) to label our drives, and then adding
the labels to the pool:
# zpool create store raidz1 label/disk01 label/disk02 label/disk03
That way, it doesn't matter where the kernel detects the drives or what the
physical device node is called: GEOM picks up the label, and ZFS uses the
label.
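The labeling step itself is just a glabel label run on each disk before
building the pool; something like this (disk01/ad4 and friends are only
example names here, use whatever matches your hardware):
# glabel label -v disk01 /dev/ad4
# glabel label -v disk02 /dev/ad6
# glabel label -v disk03 /dev/ad8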
> Once I swapped drives, I issued a 'zpool replace'.
>
See comment at the end: what's the replace command that you used?
>
> That was 4 days ago now. The progress in a 'zpool status' looks like
> this, as of right now:
>
> scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
>
> ... which is a little concerning, since a) it appears to have not moved
> since I started it, and b) I'm in a DEGRADED state until it finishes...
> if it finishes.
>
There's something wrong here. It definitely should be incrementing. Even
when we did the foolish thing of creating a 24-drive raidz2 vdev and had to
replace a drive, the progress bar did change. Never got above 39% as it
kept restarting, but it did increment.
>
> So, I reach out to the list!
>
> - Is the resilver progress notification in a known weird state under
> FreeBSD?
>
> - Anything I can do to kick this in the pants? Tuning params?
>
I'd redo the replace command, and check the output of "zpool status" to make
sure it's showing the proper device node rather than the random string of
numbers it's showing now.
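As a sketch, assuming da8 is the disk you swapped in (which is what your
status output below suggests), I'd expect something like this; if ZFS
complains that a replacement is already in progress, detaching the
half-attached new disk first should cancel it:
# zpool detach store da8
# zpool replace store 2025342973333799752 da8
(The long number is the GUID zpool status shows for the missing disk; zpool
replace accepts that in place of the old device name.)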
> - This was my first drive failure under ZFS -- anything I should have
> done differently? Such as NOT doing the export/import? (Not sure
> what else I could have done there.)
>
If you knew which drive it was, I'd have shut down the server and replaced
it, so that the drives came back up numbered correctly.
This happened to us once when I was playing around with simulating dead
drives (pulling drives) and rebooting. That's when I moved over to using
glabels.
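For what it's worth, after a reboot "glabel status" shows which physical
device each label ended up on, so the da(4) renumbering stops mattering:
# glabel status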
> % zpool status store
> pool: store
> state: DEGRADED
> status: One or more devices is currently being resilvered. The pool will
> continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
> scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
> config:
>
>         NAME                     STATE     READ WRITE CKSUM
>         store                    DEGRADED     0     0     0
>           raidz1                 DEGRADED     0     0     0
>             da0                  ONLINE       0     0     0  274K resilvered
>             da1                  ONLINE       0     0     0  282K resilvered
>             replacing            DEGRADED     0     0     0
>               2025342973333799752  UNAVAIL    3 4.11K     0  was /dev/da2
>               da8                ONLINE       0     0     0  418K resilvered
>             da2                  ONLINE       0     0     0  280K resilvered
>             da3                  ONLINE       0     0     0  269K resilvered
>             da4                  ONLINE       0     0     0  266K resilvered
>             da5                  ONLINE       0     0     0  270K resilvered
>             da6                  ONLINE       0     0     0  270K resilvered
>             da7                  ONLINE       0     0     0  267K resilvered
>
> errors: No known data errors
>
>
> -----------------------------------------------------------------------
>
>
> % zpool iostat -v
>                                capacity     operations    bandwidth
> pool                         used  avail   read  write   read  write
> -------------------------  -----  -----  -----  -----  -----  -----
> store                      1.37T  2.72T     49    106   138K   543K
>   raidz1                   1.37T  2.72T     49    106   138K   543K
>     da0                        -      -     15     62  1017K  79.9K
>     da1                        -      -     15     62  1020K  80.3K
>     replacing                  -      -      0    103      0  88.3K
>       2025342973333799752      -      -      0      0  1.45K    261
>       da8                      -      -      0     79  1.45K  98.2K
>     da2                        -      -     14     62   948K  80.3K
>     da3                        -      -     13     62   894K  80.0K
>     da4                        -      -     14     63   942K  80.3K
>     da5                        -      -     15     62   992K  80.4K
>     da6                        -      -     15     62  1000K  80.1K
>     da7                        -      -     15     62  1022K  80.1K
> -------------------------  -----  -----  -----  -----  -----  -----
>
That definitely doesn't look right. It should be showing the device name
there in the "replacing" section.
What's the exact "zpool replace" command that you used?
--
Freddie Cash
fjwcash at gmail.com