ZFS panic under extreme circumstances (2/3 disks corrupted)

Mon May 25 00:50:37 UTC 2009

Ivan Voras wrote:
> Thomas Backman wrote:
>   
>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>
>>     
>>> 5) Check if the md5 of file: everything OK, zpool status shows a
>>> degraded pool.
>>> 6) Repeat step #4, but with disk 3.
>>> 7) zpool scrub test
>>> 8) Panic!
>>>
>>>       
> Did you account for the time factor? Between your steps 5 and 6,
> wouldn't ZFS automatically begin data repair?
>   

ZFS probably repairs only the errors it encounters during step 5: if he
reads a corrupted sector, that sector might be fixed, but ZFS does not
start a scrub to look for other corruption.

His test probably clobbered pool metadata or something similar: something
not touched by the md5(1) in step 5.  That error might not have been seen
until step 7, by which point step 6 had rendered the pool unrepairable.

The original test might need to actually read the disk blocks before
overwriting them, to make sure they hold file data and not something
else; otherwise it probably isn't a valid test of automatic self-repair.
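
A rough sketch of that "read before overwrite" check, in Python. This is
not the original test script, only an illustration. It assumes the test
file holds random data, that the pool stores it uncompressed (so file
chunks appear verbatim on the backing disk), and that the disk is a plain
image file or device node we can open read/write. The names CHUNK,
block_is_file_data and corrupt_only_file_data are made up for this
example.

```python
import os

CHUNK = 4096  # hypothetical probe size; not taken from the original test


def block_is_file_data(disk_block: bytes, file_bytes: bytes) -> bool:
    """Return True if the disk block appears verbatim in the test file.

    Assumes the test file is random data and the pool is uncompressed,
    so a match is very unlikely to be metadata.
    """
    return len(disk_block) > 0 and disk_block in file_bytes


def corrupt_only_file_data(disk_path, file_path, offsets, chunk=CHUNK):
    """Overwrite the chunk at each offset only if it holds file data.

    Returns the list of offsets that were actually overwritten; chunks
    that do not match the file (possibly metadata) are left alone.
    """
    with open(file_path, "rb") as f:
        file_bytes = f.read()

    overwritten = []
    with open(disk_path, "r+b") as disk:
        for off in offsets:
            disk.seek(off)
            block = disk.read(chunk)
            # Skip short reads and anything that is not known file data.
            if len(block) == chunk and block_is_file_data(block, file_bytes):
                disk.seek(off)
                disk.write(os.urandom(chunk))
                overwritten.append(off)
    return overwritten
```

Run against an unmounted copy of the disk image with a list of candidate
offsets, this would leave anything that does not look like the file's own
data untouched, so a later panic could not be blamed on clobbered
metadata.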