ZFS - Unable to offline drive in raidz1 based pool

Mon Sep 28 17:29:20 UTC 2009

I've again run into my zfs raidz1 array reporting itself as healthy
when it is not.  As before, this followed some spin_retry_count errors
on a drive (and a power cycle, since shutdown -h would not complete).
The pattern goes as follows:

1) A hard drive in the zfs array (for whatever reason) repeatedly
times out, in this case generating spin_retry_count errors in the
SMART status.
2) The box is semi-frozen because it cannot deal with activity on the
zfs array, so shutdown -h now will not complete gracefully.
3) The box is power cycled.
4) Everything spins up fine on the box, and the array is accessible again.
5) zpool status shows the array as online with no degraded status.
6) zpool scrub, followed by another zpool status, shows the drives to
be desynced and resilvers a couple of them.
7) Presumably, everything is fine.

monolith# zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad12    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
        spares
          ad22      AVAIL

errors: No known data errors
monolith# zpool scrub storage
monolith# zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Mon Sep 28 11:17:05 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.17M resilvered
            ad6     ONLINE       0     0     0  1.50K resilvered
            ad12    ONLINE       0     0     0  2K resilvered
            ad4     ONLINE       0     0     0  2K resilvered
        spares
          ad22      AVAIL

errors: No known data errors
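
For anyone who wants to catch this without eyeballing the output, here
is a minimal Python sketch that pulls the per-device "resilvered"
annotations out of saved zpool status text.  The function name
resilvered_devices and the regular expression are my own assumptions,
based only on the output format shown above; this is not something zfs
provides, and other zfs versions may format these lines differently.

```python
import re

# Assumed line shape, taken from the zpool status output above, e.g.:
#   "    ad14    ONLINE       0     0     0  1.17M resilvered"
# The trailing "<amount> resilvered" part is optional.
_DEVICE_LINE = re.compile(
    r"^\s+(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+\d+\s+\d+\s+\d+"
    r"(?:\s+(?P<amount>\S+)\s+resilvered)?\s*$"
)


def resilvered_devices(status_text):
    """Return {device: amount} for devices annotated as resilvered.

    Hypothetical helper: it only understands the layout shown in this
    post, and returns an empty dict when nothing was resilvered.
    """
    found = {}
    for line in status_text.splitlines():
        match = _DEVICE_LINE.match(line)
        if match and match.group("amount"):
            found[match.group("name")] = match.group("amount")
    return found


# Config section copied from the second zpool status above.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad14    ONLINE       0     0     0  1.17M resilvered
            ad6     ONLINE       0     0     0  1.50K resilvered
            ad12    ONLINE       0     0     0  2K resilvered
            ad4     ONLINE       0     0     0  2K resilvered
        spares
          ad22      AVAIL
"""

print(resilvered_devices(SAMPLE))
```

Run against the post-scrub output above, this reports all four data
drives.  Run against the first zpool status it reports nothing, which is
exactly the problem: nothing in the pre-scrub output hints that a
resilver is needed.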

So, my question still stands: how does zfs, upon scrubbing, instantly
know that the drives need to be resilvered (it completes in a few
seconds), when it previously declared the array to be fine with no
known data errors?

Cheers,
-kurt