ZFS: Can't repair raidz2 (Cannot replace a replacing device)
stevenschlansker at gmail.com
Thu Dec 24 00:36:07 UTC 2009
On Dec 22, 2009, at 5:41 PM, Rich wrote:
> http://kerneltrap.org/mailarchive/freebsd-fs/2009/9/30/6457763 may be
> useful to you - it's what we did when we got stuck in a resilver loop.
> I recall being in the same state you're in right now at one point, and
> getting out of it from there.
> I think if you apply that patch, you'll be able to cancel the
> resilver, and then resilver again with the device you'd like to
> resilver with.
Thanks for the suggestion, but the problem isn't that it's stuck in a
resilver loop (which is what that patch seems intended to avoid) - it's
that I can't detach a drive.
Now I got clever and fudged a label onto the new drive (by copying the first
50MB of one of the dying drives onto it - roughly the dd sketched after the
status output below), ran a scrub, and have this layout -
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
 scrub: scrub completed after 20h58m with 0 errors on Wed Dec 23 11:36:43 2009

        NAME                       STATE     READ WRITE CKSUM
        universe                   DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            ad16                   ONLINE       0     0     0
            replacing              DEGRADED     0     0 40.7M
              ad26                 ONLINE       0     0     0  506G repaired
              6170688083648327969  UNAVAIL      0 88.7M     0  was /dev/ad12
            ad8                    ONLINE       0     0     0
            concat/back2           ONLINE       0     0     0
            ad10                   ONLINE       0     0     0
            concat/ad4ex           ONLINE       0     0     0
            ad24                   ONLINE       0     0     0
            concat/ad6ex           ONLINE      48     0     0  28.5K repaired
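(For the record, the "label fudge" above was nothing fancier than dd-ing the
front of a disk onto the new one; a rough sketch only, not the exact
invocation - the source and target names here are assumptions, with one of
the dying disks as the source and the new disk ad26 as the target:

    # copy the first 50MB (which covers ZFS's two front vdev labels and
    # the boot block area) from the old disk onto the start of the new one
    sudo dd if=/dev/ad12 of=/dev/ad26 bs=1m count=50

Not something to rely on in general: the copied labels still carry the old
disk's GUIDs, and the two labels ZFS keeps at the end of the device aren't
copied at all.)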
Why has the replacing vdev not gone away? I still can't detach -
[steven at universe:~]% sudo zpool detach universe 6170688083648327969
cannot detach 6170688083648327969: no valid replicas
even though there now actually is a valid replica (ad26).
Additionally, running zpool clear hangs permanently and in fact freezes all IO
to the pool. Since I've mounted /usr from the pool, this is effectively
death to the system. All other zfs commands seem to work okay
(zpool scrub, zfs mount, etc.); just clear is insta-death. I can't
help but suspect that this is caused by the now-nonsensical vdev configuration
(replacing with one good drive and one nonexistent one)...
Any further thoughts? Thanks,
> - Rich
> On Tue, Dec 22, 2009 at 6:15 PM, Miroslav Lachman <000.fbsd at quip.cz> wrote:
>> Steven Schlansker wrote:
>>> As a corollary, you may notice some funky concat business going on.
>>> This is because I have drives which are very slightly different in size (< 1MB),
>>> and whenever one of them goes down and I bring the pool up, it helpfully
>>> expands the pool by a whole megabyte then won't let the drive back in.
>>> This is extremely frustrating... is there any way to fix that? I'm
>>> eventually going to keep expanding each of my drives one megabyte at a time
>>> using gconcat and space on another drive! Very frustrating...
>> You can avoid it by partitioning the drives to a well-known 'minimal' size
>> (the size of the smallest disk) and using the partition instead of the raw disk.
>> For example, ad12s1 instead of ad12 (if you create slices with fdisk)
>> or ad12p1 (if you create partitions with gpart).
>> You can also use labels instead of device names.
>> Miroslav Lachman
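(For anyone who finds this thread later, a rough sketch of that approach -
the GPT scheme, partition size, and label below are made-up examples, not
what's actually in this pool:

    # give ZFS a fixed-size partition instead of the raw disk
    sudo gpart create -s gpt ad12
    sudo gpart add -t freebsd-zfs -s 930G -l disk12 ad12
    # then use the label (gpt/disk12) rather than the raw device name
    # in zpool create/replace, so every member is exactly the same size

With every member pinned to the same partition size, the pool has no room
to "helpfully" grow by a megabyte when a disk drops out.)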
> If you are over 80 years old and accompanied by your parents, we will
> cash your check.