zpool scrub stops making progress after a period of time?

Thu Jul 15 13:20:42 UTC 2010

Hey world,
I've got an amd64 system running 8.1 from SVN r209893, with a standard
GENERIC config [except that DDB and DTRACE are turned on].

I've got a 10-disk RAID-Z2, created on 8.0-RELEASE, in which two disks
faulted, one after another. I did a zpool replace on each of them,
and it resilvered happily for about 12 hours, reaching 25% done.

I'll let the following output of zpool status speak for itself:

# zpool status -v
  pool: bukkit
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 23h15m, 25.16% done, 69h11m to go
config:

	NAME                        STATE     READ WRITE CKSUM
	bukkit                      DEGRADED     0     0    28
	  raidz2                    DEGRADED     0     0    56
	    replacing               DEGRADED     0     0     0
	      da1                   FAULTED      0  244K     0  corrupted data
	      da11                  ONLINE       0     0     0  274G resilvered
	    da9                     ONLINE       0     0     0  333M resilvered
	    da1                     ONLINE       0     0     0  348M resilvered
	    da8                     ONLINE       0     0     0  333M resilvered
	    da0                     ONLINE       0     0     0  348M resilvered
	    replacing               DEGRADED     0     0     0
	      12471449581279369829  FAULTED      0  234K     0  was /dev/da7
	      da2                   ONLINE       0     0     0  274G resilvered
	    da6                     ONLINE       0     0     0  348M resilvered
	    da10                    ONLINE       0     0     0  333M resilvered
	    da5                     ONLINE       0     0     0  349M resilvered
	    da7                     ONLINE       0     0     0  333M resilvered

So, roughly another 12 hours have passed, and the resilver hasn't
progressed: it is still at about 25%. zpool iostat 1 reports activity
whenever I do any IO on any of the filesystems in the pool, but
otherwise shows a straight line of zero activity.
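
For anyone who wants to check for the same stall, here is a rough sketch
of how the "% done" figure could be logged over time. It assumes the
"scrub: resilver in progress ..., NN.NN% done, ..." line format shown in
the status output above; other ZFS versions print this line differently.
The function names resilver_pct and watch_resilver are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: pull the "NN.NN% done" figure out of zpool status output.
# Assumes a line of the form
#   scrub: resilver in progress for 23h15m, 25.16% done, 69h11m to go
# Prints nothing if no such line is present on stdin.
resilver_pct() {
    sed -n 's/^ *scrub: .* \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical helper: print a UTC timestamp and the current percentage
# for the pool named in $1 once a minute, until interrupted.
watch_resilver() {
    while :; do
        printf '%s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%S')" \
            "$(zpool status "$1" | resilver_pct)"
        sleep 60
    done
}
```

Running "watch_resilver bukkit" and leaving it for an hour or so should
show whether the percentage is moving at all while zpool iostat is idle.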

dmesg has nothing interesting - the last messages in it are from when
I inserted the replacement disks and it noted the secondary GPT tables
were wrong.

I could restart it, but it's not clear to me that this would help anything.

Thanks,
- Rich