zfs replace disk has failed

Dan Cojocar dan.cojocar at gmail.com
Sun Feb 8 00:44:55 PST 2009


On Sun, Feb 8, 2009 at 12:04 AM, Wesley Morgan <morganw at chemikals.org> wrote:
> On Tue, 3 Feb 2009, Dan Cojocar wrote:
>
>> Hello all,
>> In a mirror (ad1, ad2) configuration one of my disks (ad1) failed.
>> After replacing the failed disk with a new one using:
>>  zpool replace tank ad1
>> I noticed that the replace was taking too long and that the system
>> was not responding. After a restart, the new disk was no longer
>> recognized in the BIOS :(. I tested it in another box as well, and
>> it was not recognized there either.
>> I installed a new disk in the same location (ad1, I think). Then
>> zpool status reported something like this (this is from memory,
>> because I made many changes back then; I don't remember exactly
>> whether the online disk was ad1 or ad2):
>>
>> zpool status
>>  pool: tank
>> state: DEGRADED
>> scrub: none requested
>> config:
>>
>>       NAME                        STATE     READ WRITE CKSUM
>>       tank                        DEGRADED     0     0     0
>>         mirror                    DEGRADED     0     0     0
>>           replacing               UNAVAIL      0   387     0
>> insufficient replicas
>>             10193841952954445329  REMOVED      0     0     0  was
>> /dev/ad1/old
>>             9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>           ad2                     ONLINE       0     0     0
>> At this stage I was thinking that if I attached the new disk (ad1)
>> to the mirror, I would have sufficient replicas to detach
>> 9318348042598806923 (the disk that failed the second time), so I
>> did an attach. After the resilvering process completed
>> successfully, I had:
>> zpool status
>>  pool: tank
>> state: DEGRADED
>> scrub: none requested
>> config:
>>
>>       NAME                        STATE     READ WRITE CKSUM
>>       tank                        DEGRADED     0     0     0
>>         mirror                    DEGRADED     0     0     0
>>           replacing               UNAVAIL      0   387     0
>> insufficient replicas
>>             10193841952954445329  REMOVED      0     0     0  was
>> /dev/ad1/old
>>             9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>           ad2                     ONLINE       0     0     0
>>           ad1                     ONLINE       0     0     0
>> And I'm not able to detach 9318348042598806923 :(. More bad news:
>> if I try to access anything under /tank, the operation hangs, e.g.
>> ls /tank freezes, and if I then run zpool status in another console
>> (it worked before the ls), it now freezes too.
>> What should I do next?
>> Thanks,
>> Dan
>
> ZFS seems to fall over on itself if a disk replacement is interrupted and
> the replacement drive goes missing.
>
> By attaching the disk, you now have a 3-way mirror. The two
> possibilities for you would be to roll the array back to a previous
> txg, which I'm not at all sure would work, or to create a fake device
> the same size as the array devices, put a label on it that emulates
> the missing device, and then cancel the replacement. Once the
> replacement is cancelled, you should be able to remove the
> nonexistent device. Note that the labels are all checksummed with
> SHA-256, so it's not a simple hex edit (unless you can calculate
> checksums by hand as well!).
>
> If you send me the first 512k of either ad1 or ad2 (off-list of course), I
> can alter the labels to be the missing guids, and you can use md devices and
> sparse files to fool zpool.
>
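
For the archive, the sparse-file idea above can be sketched roughly as follows. The size, file name, and GUID are assumptions taken from this thread, and the FreeBSD-specific steps are left commented out since they require root and a real pool:

```shell
# Create a sparse file the same size as the pool's real disks
# (1G here is a placeholder; match the actual disk size).
truncate -s 1G fake_ad1.img

# Attach it as an md(4) vnode device (FreeBSD, needs root):
#   mdconfig -a -t vnode -f fake_ad1.img
# After forging the missing GUID's label onto the md device, the stuck
# replacement can be cancelled and the ghost vdev detached:
#   zpool detach tank 9318348042598806923

ls -ls fake_ad1.img   # sparse: allocated blocks stay near zero
```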

Hello Wesley,
This was a production server so I had to restore the mirror from the backup.
Can you explain a bit how someone can alter the labels of a disk in a pool?
Thanks,
Dan


More information about the freebsd-fs mailing list