zfs replace disk has failed

Wesley Morgan morganw at chemikals.org
Sun Feb 8 09:26:10 PST 2009


On Sun, 8 Feb 2009, Dan Cojocar wrote:

> On Sun, Feb 8, 2009 at 12:04 AM, Wesley Morgan <morganw at chemikals.org> wrote:
>> On Tue, 3 Feb 2009, Dan Cojocar wrote:
>>
>>> Hello all,
>>> In a mirror (ad1, ad2) configuration one of my disks (ad1) failed.
>>> After replacing the failed disk with a new one using:
>>>  zpool replace tank ad1
>>> I noticed that the replace was taking too long and that the system
>>> was not responding. After a restart the new disk was not recognized
>>> in the BIOS any more :( ; I tested it in another box and it was not
>>> recognized there either.
>>> I installed a new one in the same location (ad1, I think). Then
>>> zpool status reported something like this (this is from memory,
>>> because I made many changes back then; I don't remember exactly
>>> whether the online disk was ad1 or ad2):
>>>
>>> zpool status
>>>  pool: tank
>>> state: DEGRADED
>>> scrub: none requested
>>> config:
>>>
>>>       NAME                        STATE     READ WRITE CKSUM
>>>       tank                        DEGRADED     0     0     0
>>>         mirror                    DEGRADED     0     0     0
>>>           replacing               UNAVAIL      0   387     0  insufficient replicas
>>>             10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>>             9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>>           ad2                     ONLINE       0     0     0
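
(Normally the way out of a stuck replacement is to detach the dead
half of the "replacing" vdev by the numeric guid that status prints,
something along the lines of:

  zpool detach tank 9318348042598806923

but as the rest of the thread shows, with the replacing vdev itself
UNAVAIL there are insufficient replicas and the detach fails.)
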
>>> At this stage I was thinking that if I attached the new disk (ad1)
>>> to the mirror, I would get sufficient replicas to detach
>>> 9318348042598806923 (this was the disk that failed the second
>>> time), so I did an attach. After the resilvering process completed
>>> successfully, I had:
>>> zpool status
>>>  pool: tank
>>> state: DEGRADED
>>> scrub: none requested
>>> config:
>>>
>>>       NAME                        STATE     READ WRITE CKSUM
>>>       tank                        DEGRADED     0     0     0
>>>         mirror                    DEGRADED     0     0     0
>>>           replacing               UNAVAIL      0   387     0  insufficient replicas
>>>             10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>>             9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>>           ad2                     ONLINE       0     0     0
>>>           ad1                     ONLINE       0     0     0
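
(For the record, the attach described here would have been something
like:

  zpool attach tank ad2 ad1

which resilvers the new ad1 from the surviving ad2 and leaves the
stuck "replacing" vdev in place, exactly as the output above shows.)
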
>>> And I'm not able to detach 9318348042598806923 :(. More bad news:
>>> if I try to access anything under /tank, the operation hangs;
>>> e.g. an ls /tank freezes, and zpool status in another console
>>> (which was working before the ls) now freezes too.
>>> What should I do next?
>>> Thanks,
>>> Dan
>>
>> ZFS seems to fall over on itself if a disk replacement is interrupted and
>> the replacement drive goes missing.
>>
>> By attaching the disk, you now have a 3-way mirror. The two
>> possibilities for you would be to roll the array back to a previous
>> txg, which I'm not at all sure would work, or to create a fake device
>> the same size as the array devices and put a label on it that
>> emulates the missing device; you can then cancel the replacement.
>> Once the replacement is cancelled, you should be able to remove the
>> nonexistent device. Note that the labels are all checksummed with
>> sha256, so it's not a simple hex edit (unless you can calculate
>> checksums by hand, too!).
>>
>> If you send me the first 512k of either ad1 or ad2 (off-list, of
>> course), I can alter the labels to carry the missing guids, and you
>> can use md devices and sparse files to fool zpool.
>>
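
The sparse-file half of that trick is the easy part on FreeBSD; the
label forgery is the hard part. A rough sketch, where the size and
path are placeholders that must match your real array members:

  # create a sparse backing file the same size as the pool devices
  truncate -s 500G /var/tmp/fakedisk
  # attach it as a vnode-backed memory disk; mdconfig prints the
  # unit it allocated, e.g. md0
  mdconfig -a -t vnode -f /var/tmp/fakedisk
  # write the doctored labels onto md0; zpool should then see it as
  # the missing device and let you cancel the replacement with a
  # detach

Everything between creating md0 and the final detach hinges on
getting the forged label, and its sha256 checksum, exactly right.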
>
> Hello Wesley,
> This was a production server, so I had to restore the mirror from backup.
> Can you explain a bit how someone can alter the labels of a disk in a pool?
> Dan
>

As far as I know there is no tool available to interactively edit a label,
although since the source code that defines the labels and the data within
them is available, it should be possible to write one. Devices in the same
pool should all have nearly identical labels, differing only in the actual
guid of the device itself. In my situation, I simply altered the guid with
a hex editor, borrowed the ZFS sha256 code to write the correct checksum
into the label, and using gvirstor (md probably would have worked as
well) was able to cancel the failed replacement.
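
If you want to see what you would be editing, zdb can dump the labels
(each device carries four copies: two at the front and two at the back
of the disk), including the per-device guid that would have to be
forged:

  zdb -l /dev/ad2

The pool-wide fields in that output should match across all members of
the pool; only the guid of the device itself (and the checksums)
should differ.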

