ZFS "zpool replace" problems

Gerrit Kühn gerrit at pmp.uni-hannover.de
Tue Jan 26 17:10:15 UTC 2010


On Tue, 26 Jan 2010 08:59:27 -0800 Chuck Swiger <cswiger at mac.com> wrote
about Re: ZFS "zpool replace" problems:

CS> As a general matter of maintaining RAID systems, however, the approach
CS> to upgrading drive firmware on members of a RAID array should be to
CS> take down the entire container and offline the drives, update one
CS> drive, test it (via SMART self-test and read-only checksum comparison
CS> or similar), and then proceed to update all of the drives (preferably
CS> doing the SMART self-test for each, if time allows) before returning
CS> them to the RAID container and onlining them.

Well, I had several spare drives sitting on the shelf. So I updated the
firmware on these spares and now want to swap out the drives that still
run the old firmware for the new ones, one by one. Taking the system
offline for longer than a few minutes is not really an option; I'd rather
roll in a new machine to take over the job in that case.

CS> Pulling individual drives from a RAID set while live and updating the
CS> firmware one at a time is not an approach I would take-- running with
CS> mixed firmware versions doesn't thrill me, and I know of multiple
CS> cases where someone made a mistake reconnecting a drive with the wrong
CS> SCSI id or something like that, taking out a second drive while the
CS> RAID was not redundant, resulting in massive data corruption or even
CS> total loss of the RAID contents.

This scenario was exactly the reason why I plugged the new drive into an
extra slot and asked ZFS to replace one of the old drives with it. Well, I
did not know what a fiasco the controller for this extra slot would turn
out to be - otherwise I would have used the hot-spare slot for this in the
first place.


cu
  Gerrit

