ZFS: Can't repair raidz2 (Cannot replace a replacing device)
Steven Schlansker
stevenschlansker at gmail.com
Sat Jan 9 23:35:33 UTC 2010
On Dec 27, 2009, at 8:59 PM, Wes Morgan wrote:
> On Sun, 27 Dec 2009, Steven Schlansker wrote:
>
>>
>> On Dec 24, 2009, at 5:17 AM, Wes Morgan wrote:
>>
>>> On Wed, 23 Dec 2009, Steven Schlansker wrote:
>>>>
>>>> Why has the replacing vdev not gone away? I still can't detach -
[steven@universe:~]% sudo zpool detach universe 6170688083648327969
>>>> cannot detach 6170688083648327969: no valid replicas
>>>> even though now there actually is a valid replica (ad26)
>>>
>>> Try detaching ad26. If it lets you do that it will abort the replacement and then you just do another replacement with the real device. If it won't let you do that, you may be stuck having to do some metadata tricks.
>>>
>>
>> errors: No known data errors
>> [steven@universe:~]% sudo zpool detach universe ad26
>> cannot detach ad26: no valid replicas
>> [steven@universe:~]% sudo zpool offline -t universe ad26
>> cannot offline ad26: no valid replicas
>>
>
> I just tried to re-create this scenario with some sparse files, and I was able to detach it completely (below). There is one difference, however: your array is returning checksum errors for the ad26 device. Perhaps this is making the system think that there is no sibling device in the replacement node that has all the data, so it denies the detach, even though logically the data would be recovered by a scrub later. Interesting. If you can determine where the detach is failing, that will help paint the complete picture.
>
Interestingly enough, I found a solution! Somewhat roundabout, but what I did was replace a different device and let it resilver completely. Then the array looked like this:
        NAME                       STATE     READ WRITE CKSUM
        universe                   DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            ad16                   ONLINE       0     0     0
            replacing              DEGRADED     0     0     0
              ad26                 ONLINE       0     0     0
              6170688083648327969  UNAVAIL      0 1.13M     0  was /dev/ad12
            ad8                    ONLINE       0     0     0
            da0                    ONLINE       0     0     0
            ad10                   ONLINE       0     0     0
            concat/ad4ex           ONLINE       0     0     0
            ad24                   ONLINE       0     0     0
            concat/ad6ex           ONLINE       0     0     0
Just for kicks, I then tried to detach -
[steven@universe:~]% sudo zpool detach universe 6170688083648327969
[steven@universe:~]% sudo zpool status
  pool: universe
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        universe          ONLINE       0     0     0
          raidz2          ONLINE       0     0     0
            ad16          ONLINE       0     0     0
            ad26          ONLINE       0     0     0
            ad8           ONLINE       0     0     0
            da0           ONLINE       0     0     0
            ad10          ONLINE       0     0     0
            concat/ad4ex  ONLINE       0     0     0
            ad24          ONLINE       0     0     0
            concat/ad6ex  ONLINE       0     0     0
Ta-da! I have no idea why this helped, or how it fixed things, but if anyone hits this problem
in the future: try replacing a different device, letting it resilver completely, and then detaching the original problematic device.
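For future readers, the workaround above boils down to the following sequence. This is a sketch, not a verified recipe: the pool name "universe" and the stuck GUID 6170688083648327969 come from this thread, but the thread does not say which other device was replaced, so ad8 and the spare disk ad28 below are placeholders you must substitute from your own `zpool status` output.

```shell
# 1. Confirm the stuck state: detaching the dead half of the "replacing"
#    vdev (shown only as a numeric GUID) is refused.
zpool detach universe 6170688083648327969   # fails: "no valid replicas"

# 2. Replace some OTHER disk in the same raidz2 (ad8 -> ad28 here is a
#    placeholder pair) and wait for the resilver to finish.
zpool replace universe ad8 ad28
zpool status universe                       # watch until resilver completes

# 3. With the pool fully resilvered, retry the detach of the numeric GUID.
#    In this thread it succeeded at this point and the "replacing" vdev
#    collapsed back into a healthy raidz2.
zpool detach universe 6170688083648327969

# 4. Scrub afterwards to verify the pool is consistent.
zpool scrub universe
```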
More information about the freebsd-fs mailing list