ZFS w/failing drives - any equivalent of Solaris FMA?

Fri Sep 12 16:04:24 UTC 2008

On Fri, Sep 12, 2008 at 03:34:30PM +0100, Karl Pielorz wrote:
> --On 12 September 2008 06:21 -0700 Jeremy Chadwick <koitsu at FreeBSD.org>  
> wrote:
>
>> As far as I know, there is no such "standard" mechanism in FreeBSD.  If
>> the drive falls off the bus entirely (e.g. detached), I would hope ZFS
>> would notice that.  I can imagine it (might) also depend on if the disk
>> subsystem you're using is utilising CAM or not (e.g. disks should be daX
>> not adX); Scott Long might know if something like this is implemented in
>> CAM.  I'm fairly certain nothing like this is implemented in ata(4).
>
> For ATA, at the moment - I don't think it'll notice even if a drive  
> detaches. I think like my system the other day, it'll just keep issuing 
> I/O commands to the drive, even if it's disappeared (it might get much 
> 'quicker failures' if the device has 'gone' to the point of FreeBSD just 
> quickly returning 'fail' for every request).

I know ATA will notice a detached channel, because I myself have done
it: administratively, that is -- atacontrol detach ataX.  But the only
time that can happen "automatically" is if the actual controller does
so itself, or if FreeBSD is told to do it administratively.

What this does to other parts of the kernel and userland applications is
something I haven't tested.  I *can* tell you that there are major,
major problems with detach/reattach/reinit on ata(4) causing kernel
panics and other such things.  I've documented this quite thoroughly in
my "Common FreeBSD issues" wiki:

http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues

I am also very curious to know the exact brand/model of 8-port SATA
controller from Supermicro you are using, *especially* if it uses ata(4)
rather than CAM and da(4).  Such Supermicro controllers were recently
discussed on freebsd-stable (or was it -hardware?), and no one was able
to reach a clear conclusion as to whether or not they were decent or
even remotely trustworthy.  Supermicro provides a few different SATA HBAs.

>> Ideally, it would be the job of the controller and controller driver to
>> announce to underlying I/O operations fail/success.  Do you agree?
>>
>> I hope this "FMA Engine" on Solaris only *tells* underlying pieces of
>> I/O errors, rather than acting on them (e.g. automatically yanking the
>> disk off the bus for you).  I'm in no way shunning Solaris, I'm simply
>> saying such a mechanism could be as risky/deadly as it could be useful.
>
> Yeah, I guess so - I think the way it's meant to happen (and this is only 
> AFAIK) is that FMA 'detects' a failing drive by applying some 
> configurable policy to it. That policy would also include notifying ZFS, 
> so that ZFS could then decide to stop issuing I/O commands to that 
> device.

It sounds like that is done very differently than on FreeBSD.  If such a
condition happens on FreeBSD (disk errors scrolling by, etc.), the only
way I know of to get FreeBSD to stop sending commands through the ATA
subsystem is to detach the channel (atacontrol detach ataX).
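
To make the idea of an error "policy" concrete, here is a rough userland
sketch (not an existing FreeBSD tool, and not how Solaris FMA is
implemented).  It counts ata(4) timeout messages per disk inside a
sliding time window and reports when a disk crosses a threshold.  The
class name, the thresholds, the log pattern, and the disk-to-channel
mapping are all assumptions made for illustration; the exact kernel
message wording differs between releases.  It only builds the
atacontrol command and never runs it, so the decision to detach stays
with the administrator.

```python
import re
from collections import defaultdict, deque

# Assumed pattern for ata(4) timeout messages, e.g.
# "ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1234".
# Adjust it to match what your kernel actually logs.
TIMEOUT_RE = re.compile(r"\b(ad\d+): TIMEOUT - ")


class TimeoutWatch:
    """Hypothetical policy: flag a disk after too many timeouts in a window."""

    def __init__(self, max_errors=5, window_secs=60.0):
        self.max_errors = max_errors
        self.window_secs = window_secs
        self.events = defaultdict(deque)  # disk name -> timeout timestamps

    def feed(self, timestamp, line):
        """Record one log line; return the disk name if it is over the limit."""
        match = TIMEOUT_RE.search(line)
        if match is None:
            return None
        dev = match.group(1)
        stamps = self.events[dev]
        stamps.append(timestamp)
        # Forget timeouts that have aged out of the window.
        while stamps and timestamp - stamps[0] > self.window_secs:
            stamps.popleft()
        return dev if len(stamps) >= self.max_errors else None


def detach_command(dev, channel_of):
    """Build (but do not run) the atacontrol command for the disk's channel.

    channel_of is a caller-supplied mapping such as {"ad4": "ata2"}.
    """
    return ["atacontrol", "detach", channel_of[dev]]


if __name__ == "__main__":
    watch = TimeoutWatch(max_errors=3, window_secs=30.0)
    sample = "kernel: ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1234"
    for second in range(3):
        flagged = watch.feed(float(second), sample)
    if flagged:
        print(" ".join(detach_command(flagged, {"ad4": "ata2"})))
```

Keeping the policy (feed) separate from the action (detach_command)
means the same counter could just as easily send a notification and
leave the channel attached.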

> None of this seems to be in place, at least for ATA under FreeBSD - when 
> a drive goes bad, you can just end up with 'hours' worth of I/O timeouts, 
> until someone intervenes.

I can see the usefulness in Solaris's FMA thing.  My big concern is
whether or not FMA actually pulls the disk off the channel, or if it
just leaves the disk/channel connected and simply informs kernel pieces
not to use it.  If it pulls the disk off the channel, I have serious
qualms with it.

There are also chips on SATA and SCSI controllers which can cause
chaos -- specifically, SES/SES2 chips (I'm looking at you, QLogic).
These are supposed to be "smart chips" that detect when there are a
large number of transport or hardware errors (implying cabling issues,
etc.) and *automatically* yank the disk off the bus.  Sounds great on
paper, but in the field, I see these chips start pulling disks off the
bus, changing SCSI IDs on devices, or inducing what appear to be full SCSI
subsystem timeouts (e.g. the SES/SES2 chip has locked up/crashed in some
way, and now your entire bus is dead in the water).  I have seen all of
the above bugs with onboard Adaptec 320 controllers on systems running
Solaris 8, 9, and OpenSolaris.  Most times it turns out to be the
SES/SES2 chip getting in the way.

> I did enquire on the Open Solaris list about setting limits for 'errors' 
> in ZFS, which netted me a reply that it's FMA (at least in Solaris) 
> that's responsible for this - it just then informs ZFS of the condition. 
> We don't appear (again at least for ATA) to have anything similar for 
> FreeBSD yet :(

My recommendation to people these days is to avoid ata(4) on FreeBSD at
all costs if they expect to encounter disk or hardware failures.  The
ata(4) layer is in no way, shape, or form reliable in the case of
transport or disk failures, and even sometimes in the case of hot-
swapping.  Try your hardest to find a physical controller that supports
SATA disks and uses CAM/da(4), which WILL provide that reliability.  I
know Areca controllers do this, and Areca is very FreeBSD-friendly.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |