RFC: Suggesting ZFS "best practices" in FreeBSD

Michael DeMan freebsd at deman.com
Wed Jan 23 03:51:25 UTC 2013


Inline below...
On Jan 22, 2013, at 6:40 PM, Jason Keltz <jas at cse.yorku.ca> wrote:
<SNIP>
>>> #1.  Map the physical drive slots to how they show up in FBSD, so that if a disk is removed and the machine is rebooted, the disks after the removed one do not end up with an 'off by one' error.  i.e. if you have ada0-ada14 and remove ada8, then reboot - normally FBSD skips the missing ada8 and the next drive (that used to be ada9) is now called ada8, and so on...
>> 
>> How do you do that?  If I'm in that situation, I think I could find the bad drive, or at least the good ones, with diskinfo and the drive serial number.  One suggestion I saw somewhere was to use disk serial numbers for label values.
> I think that was using /boot/device.hints.  Unfortunately it only works for some systems, and not for all...  and someone shared an experience with me where a kernel update caused the card probe order to change, the devices to change, and then it all broke...  It worked for one card, not for the other...  I gave up because I wanted consistency across different systems...

I am not sure, but possibly I hit that same issue about PCI probing with our ZFS test machine - basically I vaguely recall asking to have the SATA controllers' slots swapped without completely knowing why it needed to be done, other than that it did need to be done.  It could have been from an upgrade from FBSD 7.x -> 8.x -> 9.x, or it could just have been because it's a test box and there were other things going on with it for a while, and the cards got put back in out of order after doing some other work.

This is actually kind of an interesting problem overall - logical vs. physical, and how to keep things mapped in a way that makes sense.  The Linux community has run into this and substantially (from a basic end-user perspective) changed the way they deal with hardware MAC addresses and Ethernet cards between RHEL5 and RHEL6.  Ultimately neither of their techniques works very well.  For the FreeBSD community we should probably pick one strategy or another, standardize on it warts and all, and have it documented?
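
For what it's worth, a rough sketch of the serial-number labelling idea mentioned above, done with GPT labels (the device name and serial below are made up for illustration):

    # print the drive's serial number ("Disk ident.") to use as a label
    diskinfo -v da8 | grep ident
    # stamp that serial onto the ZFS partition as its GPT label
    gpart modify -i 1 -l WD-WCAW12345678 da8
    # if the pool was built against the gpt/ names, 'zpool status' then shows
    # gpt/WD-WCAW12345678 instead of a bare daN number that can shift around

Whether it is serial numbers, bay numbers, or /boot/device.hints probably matters less than picking one scheme, documenting it, and sticking to it.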

> 
> In my own opinion, the whole process of partitioning drives, labelling them, all kinds of tricks for dealing with 4k drives, manually configuring /boot/device.hints, etc. is something that we have to do, but honestly, I really believe there *has* to be a better way....  

I agree.  At this point the only solution I can think of for using ZFS on FreeBSD on production systems is to write scripts that do all of this - all the goofy gpart + gnop + everything else.  How is anybody supposed to replace a disk in an emergency situation when they have to run a bunch of cryptic command-line stuff on the disk before they can even confidently put it in as a replacement for the original?  And almost by definition, having to do a bunch of manual command-line work means you cannot be reliably confident.
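
For example, here is a rough sketch of the per-disk ritual such a script would have to wrap (disk name, label, and pool name are all hypothetical):

    # partition the replacement disk with a single 4k-aligned ZFS partition,
    # labelled after the physical bay it sits in
    gpart create -s gpt da8
    gpart add -t freebsd-zfs -a 4k -l bay52 da8
    # swap the labelled partition in for the failed device
    zpool replace tank <failed-device> gpt/bay52
    # (the 'gnop create -S 4096' trick is the pool-creation-time half of the
    # same dance, used to force ashift=12 on drives that lie about 4k sectors)

None of it is hard individually; the problem is having to remember and type it correctly under pressure.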

> Years back when I was using a 3ware/AMCC RAID card (actually, I AM still using a few), none of this was an issue... every disk just appeared in order... I didn't have to configure anything specially... ordering never changed when I removed a drive, I didn't need to partition or do anything with the disks - just give it the raw disks, and it knew what to do...  If anything, I took my labeller and labelled the disk bays with a numeric label so when I got an error, I knew which disk to pull, but order never changed, and I always pulled the right drive...  Now, I look at my pricey "new" system, see disks ordered by default in what seems like an almost "random" order... I dd'ed each drive to figure out the exact ordering, and labelled the disks, but it just gets really annoying...


A lot of these things - like making sure that a little extra space is spared on the drive when an array is first built, so that a new drive with slightly smaller capacity can still serve as a replacement - the RAID vendors have hidden away from the end user.  In many cases they have only done that in the last 10 years or so?  And a few weeks ago I stumbled across a Sun ZFS user who had received Sun-certified disks with the same issue - a few sectors too small...
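
With ZFS on FreeBSD, the rough hand-rolled equivalent - sketched here with illustrative names and sizes only - is to make the partition slightly smaller than the raw disk when the pool is first built, so a marginally smaller replacement still fits:

    # check the raw capacity of the disk
    diskinfo -v da8 | grep mediasize
    # create the ZFS partition a few GB short of a nominal 3TB drive
    gpart add -t freebsd-zfs -a 4k -s 2790G -l bay52 da8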


Overall you are describing exactly the kind of behavior I want, and that I think everybody needs, from a FreeBSD+ZFS system.

- Alarm sent out - drive #52 failed - wake up and deal with it.
- Go to server (or call data center) - groggily look at labels on front of disk caddies - physically pull drive #52.
- Insert new similarly sized drive from inventory as the new #52.
- Verify resilver is in progress.
- Confidently go back to bed knowing all is okay.

The above scenario is just unworkable right now for most people (even tech-savvy people) because of the lack of documentation - hence I am glad to see some kind of 'best practices' document put together.
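
In terms of commands, the scenario above is roughly the following (pool name and labels are hypothetical, and it assumes the bays were labelled up front):

    # confirm which device is faulted
    zpool status -x
    # take the dead disk out of the pool
    zpool offline tank gpt/bay52
    # ...physically swap the drive in bay 52 and redo the gpart/label steps...
    # resilver onto the new disk sitting at the same label
    zpool replace tank gpt/bay52
    # verify the resilver is running before going back to bed
    zpool status tank

The five commands are not the hard part; knowing with confidence which physical drive gpt/bay52 actually is, without good labels and documentation, is.
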
<SNIP>

- Mike




