Problems replacing failing drive in ZFS pool

Dan Langille dan at langille.org
Thu Jul 22 00:02:24 UTC 2010


On 7/19/2010 10:50 PM, Adam Vande More wrote:
> On Mon, Jul 19, 2010 at 9:07 PM, Dan Langille<dan at langille.org>  wrote:
>
>>> I think it's because you pull the old drive, boot with the new drive,
>>> the controller re-numbers all the devices (ie da3 is now da2, da2 is
>>> now da1, da1 is now da0, da0 is now da6, etc), and ZFS thinks that all
>>> the drives have changed, thus corrupting the pool.  I've had this
>>> happen on our storage servers a couple of times before I started using
>>> glabel(8) on all our drives (dead drive on RAID controller, remove
>>> drive, reboot for whatever reason, all device nodes are renumbered,
>>> everything goes kablooey).
>>>
>>
>>
>> Can you explain a bit about how you use glabel(8) in conjunction with ZFS?
>>   If I can retrofit this into an existing ZFS array to make things easier in the
>> future...
>>
>
> If you've used whole disks in ZFS, you can't retrofit it, if by retrofit
> you mean an almost painless way of resolving this.  GEOM setup generally
> should happen BEFORE the file system is put on the disk.
>
> You would create your partition(s) slightly smaller than the disk, label
> them, then use the resulting devices as your zfs devices when creating the
> pool.  If you have an existing full-disk install, that means restoring the
> data after you've done those steps.  It works just as well with MBR-style
> partitioning; there's nothing saying you have to use GPT.  GPT is just
> better, though, in terms of ease of use IMO, among other things.

FYI, this is exactly what I'm going to do.  I have obtained additional
HDDs to serve as temporary storage.  I will also use them for practicing
the commands before destroying the original array.  I'll post my plan to
the list for review.
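
Roughly, the migration I have in mind looks like this (the pool names,
snapshot name, and raidz layout are only placeholders; the real details
will be in the plan I post):

  # copy everything to the temporary pool on the new drives
  zfs snapshot -r storage@migrate
  zfs send -R storage@migrate | zfs receive -F temp

  # destroy the old pool, partition and label the original drives as
  # above, then recreate the pool on the labelled partitions
  zpool destroy storage
  zpool create storage raidz gpt/disk0 gpt/disk1 gpt/disk2

  # copy the data back and retire the temporary pool
  zfs send -R temp@migrate | zfs receive -F storage
  zpool destroy temp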

-- 
Dan Langille - http://langille.org/

