raidz: error during resilver: what next?

Fri Sep 10 20:35:18 UTC 2010

Hi all,

I'm running 8.1-RELEASE on amd64. I'm upgrading a 4-disk raidz to faster drives. 
While resilvering drive #3, I hit errors on drive #4.

The old drives are ada{1,2,3,4}. New drives are label/hitachi{1,2,3,4}.

While resilvering drive 3, I got timeouts on ada4 that caused `zpool status` to 
hang forever. Even `shutdown -r` did not reboot. Hard reset was required. dmesg 
was full of ada4 timeouts, the most recent of which was several hours old.

After a power cycle, ada4 appeared fine and reported no recent SMART errors.

My current zpool status:

  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

    NAME                  STATE     READ WRITE CKSUM
    tank                  DEGRADED     0     0     0
      raidz1              DEGRADED     0     0     0
        label/hitachi1    ONLINE       0     0     0
        label/hitachi2    ONLINE       0     0     0
        replacing         DEGRADED     0     0     1
          ada3            OFFLINE      0   283     0
          label/hitachi3  ONLINE       0     0     0
        ada4              ONLINE       0     0    10
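
To make the non-zero counters in the output above easier to spot, here is a rough sh/awk filter. It is only a sketch: `zpool_errors` is a name made up for this post, and it assumes the column layout shown above (NAME STATE READ WRITE CKSUM), which other ZFS versions may not follow exactly.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): reads `zpool status` output on
# stdin and prints each device whose READ, WRITE or CKSUM counter is
# non-zero, as "name read write cksum".
zpool_errors() {
    awk '
        $1 == "NAME"  { in_cfg = 1; next }   # header row starts the config table
        /^errors:/    { in_cfg = 0 }         # trailing summary line ends it
        in_cfg && NF >= 5 && ($3 + $4 + $5) > 0 { print $1, $3, $4, $5 }
    '
}

# Intended use on a live system (not run here):
#   zpool status tank | zpool_errors
```

Fed the config table above, this should list the `replacing` vdev, ada3 and ada4, and skip the healthy devices.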

The resilver did not complete, and it did not resume automatically after the 
power cycle.

Where do I go from here? I see two choices:

1. Continue the resilver and see what happens. (How do I restart the resilver?)

2. Cancel the replacement. Pull hitachi3 and put the (still good) ada3 back in. 
What commands do I use? My guess:
  zpool offline tank label/hitachi3
  <replace disk>
  zpool online tank ada3

Thanks in advance for your help.

[In other news, while ada4 was timing out, booting from my other ZFS pool, 
zroot, failed with all kinds of bizarre missing-file errors. It seems that one 
degraded/corrupted ZFS pool has affected the other pool too. I now regret going 
to the trouble of setting up ZFS boot.]