ZFS w/failing drives - any equivalent of Solaris FMA?

Jeremy Chadwick koitsu at FreeBSD.org
Fri Sep 12 18:22:37 UTC 2008


On Fri, Sep 12, 2008 at 10:12:09AM -0700, Freddie Cash wrote:
> On September 12, 2008 09:32 am Jeremy Chadwick wrote:
> > For home use, sure.  Since most home/consumer systems do not include
> > hot-swappable drive bays, rebooting is required.  Although more and
> > more consumer motherboards are offering AHCI -- which is the only
> > reliable way you'll get that capability with SATA.
> >
> > In my case with servers in a co-lo, it's not acceptable.  Our systems
> > contain SATA backplanes that support hot-swapping, and it works how it
> > should (yank the disk, replace with a new one) on Linux -- there is no
> > need to do a bunch of hoopla like on FreeBSD.  On FreeBSD, with that
> > hoopla, you also take the risk of inducing a kernel panic.  That risk does
> > not sit well with me, but thankfully I've only been in that situation
> > (replacing a bad disk + using hot-swapping) once -- and it did work.
> 
> Hrm, is this with software RAID or hardware RAID?

I do not use either, but have tried software RAID (Intel MatrixRAID) in
the past (and major, MAJOR bugs are why I do not any longer).  Speaking
(mostly) strictly of FreeBSD, let me list off the problems with both:

Software RAID:

1) Buggy as hell.  Using Intel MatrixRAID as an example, even with
   RAID 1, due to ata(4) driver bugs, you are practically guaranteed
   to lose your data
2) Limited userland interface to RAID BIOS; many operations do not
   work with atacontrol, requiring a system reboot + entering BIOS
   to do things like add/remove disks or rebuild an array
3) SMART monitoring lost; if the card or BIOS supports passthrough
   (basically ATA version of pass(4)), FreeBSD will see the disks
   natively (e.g. arX for the RAID, ad4 and ad8 for the disks), and
   you can use smartmontools.  Otherwise, you're screwed
4) Support is questionable; numerous mainstream chips unsupported,
   including Adaptec HostRAID
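
To illustrate point 3: when the disks are visible natively, a health check
with smartmontools can be scripted.  The following is only a rough sketch in
Python, assuming smartctl is installed and that the disks appear as /dev/ad4
and /dev/ad8 as in the example above.  The helper names (parse_health,
check_disk) are made up for illustration, and the parser assumes the
overall-health line that smartctl -H prints for ATA disks.

```python
import subprocess

# Line that "smartctl -H" prints for ATA disks, followed by the verdict.
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def parse_health(output):
    """Return the health verdict (e.g. "PASSED") from smartctl -H output, or None."""
    for line in output.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def check_disk(dev):
    """Run "smartctl -H" on one device node and return its verdict.

    Hypothetical helper: returns None if no health line is found, which is
    what you would expect when the controller hides the disk from the OS.
    """
    result = subprocess.run(
        ["smartctl", "-H", dev], capture_output=True, text=True
    )
    return parse_health(result.stdout)
```

Calling check_disk("/dev/ad4") and check_disk("/dev/ad8") from cron would be
one way to use it; without passthrough there is no device node to query, so
this approach does not apply.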

Hardware RAID:

1) You are "locked in" to that controller.  Your data is at the
   mercy of the company who makes the HBA; if your controller dies
   and is no longer made, your data is dead in the water.  Chances
   are a newer model/revision of controller will not understand the
   disk metadata from the previous controller
2) Performance problems as a result of excessive caching levels;
   onboard hardware cache vs. system memory cache vs. disk layer
   cache in OS vs. other kernel caching mechanisms
3) Controller firmware upgrades are risky -- 3Ware has a very nasty
   history of this, for sake of example.  I've heard of some upgrades
   changing the metadata format, requiring complete array re-creation
   
I can pull Ade Lovett <ade at freebsd.org> into this conversation if you
think any of the above is exaggerated.  :-)

The only hardware RAID controller I'd trust at this point would be
Areca -- but hardware RAID is not what I want.  On the other hand, I
really want Areca to make a standard 4 or 8-port SATA controller --
no RAID, but full driver support under arcmsr(4) (which uses CAM and
da(4)).  This would be perfect.

> With our hardware RAID systems, the process has always been the same, 
> regardless of which OS (Windows 2003 Servers, Debian Linux, FreeBSD) is 
> on the system:
>   - go into RAID management GUI, remove drive
>   - pull dead drive from system
>   - insert new drive into system
>   - go into RAID management GUI, make sure it picked up new drive and 
> started the rebuild

The simplicity there is correct -- that's really how simple it should
be.  But a GUI?  What card is this that requires a GUI?  Does it require
a reboot?  No command-line support?

> We've been lucky so far, and not had to do any drive replacements on our 
> non-ZFS software RAID systems (md on Debian, gmirror on FreeBSD).  I'm 
> not looking forward to a drive failing, as these systems have 
> non-hot-pluggable SATA setups.

I'm hearing you loud and clear.  :-)

> On the ZFS systems, we just "zpool offline" the drive, physically replace 
> the drive, and "zpool replace" the drive.  On one system, this was done 
> via hot-pluggable SATA backplane, on another, it required a reboot.

If this was done on the hardware RAID controller (presuming it uses
CAM and da(4)), I'm not surprised it worked perfectly.
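
For reference, the ZFS sequence described above (offline, swap the disk,
replace) can be sketched as a dry run.  This is only an illustration in
Python: it builds the zpool command lines and prints them without running
anything.  The pool name "tank" and device name "da2" are placeholders, not
taken from this thread, and zfs_replace_steps is a hypothetical helper.

```python
def zfs_replace_steps(pool, old_dev, new_dev=None):
    """Return the zpool commands for a disk swap as argv lists (nothing is run)."""
    replace = ["zpool", "replace", pool, old_dev]
    if new_dev is not None:
        # Only needed when the new disk shows up under a different device name.
        replace.append(new_dev)
    return [
        ["zpool", "offline", pool, old_dev],  # take the failing disk out of service
        # (physically swap the disk here: hot-swap, or power down and reboot)
        replace,                              # start resilvering onto the new disk
        ["zpool", "status", pool],            # check resilver progress
    ]


if __name__ == "__main__":
    # "tank" and "da2" are placeholder names.
    for argv in zfs_replace_steps("tank", "da2"):
        print(" ".join(argv))
```

Whether the middle step needs a reboot depends on the controller and
backplane, as discussed above; the commands themselves are the same either
way.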

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-hackers mailing list