ZFS: Can't repair raidz2 (Cannot replace a replacing device)

Steven Schlansker stevenschlansker at gmail.com
Sat Jan 9 23:35:33 UTC 2010


On Dec 27, 2009, at 8:59 PM, Wes Morgan wrote:

> On Sun, 27 Dec 2009, Steven Schlansker wrote:
> 
>> 
>> On Dec 24, 2009, at 5:17 AM, Wes Morgan wrote:
>> 
>>> On Wed, 23 Dec 2009, Steven Schlansker wrote:
>>>> 
>>>> Why has the replacing vdev not gone away?  I still can't detach -
>>>> [steven at universe:~]% sudo zpool detach universe 6170688083648327969
>>>> cannot detach 6170688083648327969: no valid replicas
>>>> even though now there actually is a valid replica (ad26)
>>> 
>>> Try detaching ad26. If it lets you do that it will abort the replacement and then you just do another replacement with the real device. If it won't let you do that, you may be stuck having to do some metadata tricks.
>>> 
>> 
>> errors: No known data errors
>> [steven at universe:~]% sudo zpool detach universe ad26
>> cannot detach ad26: no valid replicas
>> [steven at universe:~]% sudo zpool offline -t universe ad26
>> cannot offline ad26: no valid replicas
>> 
> 
> I just tried to re-create this scenario with some sparse files and I was able to detach it completely (below). There is one difference, however: your array is returning checksum errors for the ad26 device. Perhaps that is making the system think there is no sibling device in the replacement node that has all the data, so it denies the detach, even though logically the data would be recovered by a later scrub. Interesting. If you can determine where the detach is failing, that will help paint the complete picture.
> 
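
(For anyone who wants to poke at this the same way: a throwaway raidz2 pool can be built from sparse files roughly as below. The pool name, file paths, and sizes are illustrative, not from Wes's actual test.)

# create four 1 GB sparse backing files
truncate -s 1g /tmp/vdev0 /tmp/vdev1 /tmp/vdev2 /tmp/vdev3

# build a scratch raidz2 pool on them (file vdevs need absolute paths)
zpool create scratch raidz2 /tmp/vdev0 /tmp/vdev1 /tmp/vdev2 /tmp/vdev3

# start a replacement, then abort it by detaching the new device
truncate -s 1g /tmp/vdev4
zpool replace scratch /tmp/vdev3 /tmp/vdev4
zpool detach scratch /tmp/vdev4

# tear it down when done
zpool destroy scratch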

Interestingly enough, I found a solution!  It's somewhat roundabout: I replaced a different device and let it resilver completely.  Afterwards the array looked like this:

        NAME                       STATE     READ WRITE CKSUM
        universe                   DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            ad16                   ONLINE       0     0     0
            replacing              DEGRADED     0     0     0
              ad26                 ONLINE       0     0     0
              6170688083648327969  UNAVAIL      0 1.13M     0  was /dev/ad12
            ad8                    ONLINE       0     0     0
            da0                    ONLINE       0     0     0
            ad10                   ONLINE       0     0     0
            concat/ad4ex           ONLINE       0     0     0
            ad24                   ONLINE       0     0     0
            concat/ad6ex           ONLINE       0     0     0
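
(Aside: the long number is the vdev GUID that ZFS substitutes once the original device path has vanished; it is what zpool status prints and what detach accepts. If you ever need to dig a GUID out yourself, zdb can dump the cached pool configuration, guid fields included. Sketch only, pool name as above:)

# each child vdev entry in the output carries a "guid" field,
# usable anywhere zpool expects a device name
sudo zdb -C universe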

Just for kicks, I then tried to detach -

[steven at universe:~]% sudo zpool detach universe 6170688083648327969
[steven at universe:~]% sudo zpool status
  pool: universe
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        universe          ONLINE       0     0     0
          raidz2          ONLINE       0     0     0
            ad16          ONLINE       0     0     0
            ad26          ONLINE       0     0     0
            ad8           ONLINE       0     0     0
            da0           ONLINE       0     0     0
            ad10          ONLINE       0     0     0
            concat/ad4ex  ONLINE       0     0     0
            ad24          ONLINE       0     0     0
            concat/ad6ex  ONLINE       0     0     0

Ta-da!  I have no idea why this helped or how it fixed things, but if anyone hits this problem in the future, try replacing a different device, letting it resilver, and then detaching the original problematic device; the sequence is sketched below.
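
In command form the workaround amounts to something like this. The device names are from my pool and the replacement target (da1) is purely an example; substitute your own, and treat this as a sketch rather than an official procedure:

# 1. replace some *other* healthy member of the raidz2 and wait for
#    the resilver to finish (note: this temporarily reduces redundancy
#    on an already-degraded pool)
sudo zpool replace universe ad8 da1
sudo zpool status universe        # repeat until the resilver completes

# 2. the stuck "replacing" vdev can now be detached by the GUID
#    shown in zpool status
sudo zpool detach universe 6170688083648327969

# 3. scrub to verify the pool is consistent
sudo zpool scrub universe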


