slight zfs problem after playing with WDIDLE3 and WDTLER

Steven Schlansker stevenschlansker at gmail.com
Tue Jan 26 04:13:18 UTC 2010


On Jan 25, 2010, at 8:07 PM, Tommi Lätti wrote:

> 2010/1/26 Steven Schlansker <stevenschlansker at gmail.com>:
>> 
>> On Jan 25, 2010, at 10:43 AM, Tommi Lätti wrote:
>>> After checking the logs carefully, it seems that the ada1 device
>>> permanently lost some sectors. Before twiddling with the parameters
>>> it was 1953525168 sectors (953869MB); now it reports 1953523055
>>> (953868MB). So, would removing it and maybe an export/import get me
>>> back to a degraded state, so that I could then just replace the
>>> suddenly-lost-some-sectors drive?
>> 
>> That will probably work.  I had a similar problem a while
>> back where my drives suddenly appeared too small, causing the UNAVAIL
>> corrupted-data problem.  I managed to fix it by using gconcat to stitch
>> an extra MB of space from the boot drive onto the too-small disk.  Not a
>> very good solution, but the best I found until FreeBSD gets shrink
>> support (which sadly seems like it may be a long while).
>> 
>> Failing that, you could use OpenSolaris to import it (as it does have minimal
>> support for opening mismatched-size vdevs), copy the data off, destroy, and restore.
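
For reference, the gconcat step above looked roughly like this -- the
device names and partition index are illustrative, not the exact ones
I used, and it assumes the boot disk has a gpart scheme with a little
free space left:

  # carve a ~1 MB chunk out of free space on the boot disk
  gpart add -t freebsd-zfs -s 1M ada0
  # glue the shrunken disk and that chunk into one slightly larger
  # provider ("label" makes the concat persistent across reboots)
  gconcat label bigger ada1 ada0p4

The pool then gets imported against /dev/concat/bigger, which is once
again large enough to satisfy what the vdev label expects.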
> 
> After thinking about it overnight, I'm curious why that single vdev
> took the whole pool down with it. Shouldn't ZFS just disregard the
> disk and go to a degraded state? I've had normal catastrophic disk
> failures on this setup before, and the usual replace-drive-and-resilver
> has worked just fine.

I poked through the code - the problem is that ZFS identifies the drive
as valid (its metadata and checksums are all correct) and then tries to
assemble the array.  At some point it checks the size, realizes the drive
is now smaller than the label says it should be, and rejects the entire
array.  It isn't smart enough (yet?) to realize that rejecting just the
one drive would leave the pool merely degraded...
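
If you want to see exactly what it trips over, you can compare what the
disk reports now against the asize ZFS recorded in the vdev label when
the pool was created (ada1 here is just the device from your example):

  # current size as the kernel sees it -- note the sector count
  diskinfo -v ada1
  # dump the ZFS labels from the disk and look for the asize field
  zdb -l /dev/ada1

Roughly speaking, the asize in the label will be more than the shrunken
device can now supply, and that is the comparison the import fails on.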


