ZFS w/failing drives - any equivalent of Solaris FMA?
zbeeble at gmail.com
Fri Sep 12 16:16:10 UTC 2008
On Fri, Sep 12, 2008 at 10:34 AM, Karl Pielorz <kpielorz_lst at tdx.co.uk> wrote:
> --On 12 September 2008 06:21 -0700 Jeremy Chadwick <koitsu at FreeBSD.org> wrote:
>> As far as I know, there is no such "standard" mechanism in FreeBSD. If
>> the drive falls off the bus entirely (e.g. detached), I would hope ZFS
>> would notice that. I can imagine it (might) also depend on if the disk
>> subsystem you're using is utilising CAM or not (e.g. disks should be daX
>> not adX); Scott Long might know if something like this is implemented in
>> CAM. I'm fairly certain nothing like this is implemented in ata(4).
> For ATA, at the moment - I don't think it'll notice even if a drive
> detaches. I think like my system the other day, it'll just keep issuing I/O
> commands to the drive, even if it's disappeared (it might get much 'quicker
> failures' if the device has 'gone' to the point of FreeBSD just quickly
> returning 'fail' for every request).
Since I had the opportunity, I tested this recently for both CAM and ATA.
Now the RAID engine was gmirror in both cases (my production hardware
doesn't do ZFS yet), but I expect the reaction to be somewhat the same.
Both systems were Dell 1U's. One, an R200, had SATA disks attached to a
plain SATA controller. I believe it may have supported RAID1, but I didn't
use that functionality. When a drive was removed from it, it stalled for
some time (30 minutes?) and then resumed working. By the time I could type
on the machine again, gmirror had decided that the drive was gone and marked
the mirror as degraded.
The other system was a 1950-III with a SCSI SAS controller attached to an
SAS hot-swap backplane. The drives themselves were 750G SATA drives.
Yanking one of them resulted in about 5 seconds of disruption followed by
gmirror realizing the problem and marking the mirror degraded.
Neither system was heavily loaded during the test.
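Since there is no FMA-style notification here, the only way to learn about the degraded mirror is to look at the mirror state yourself. Below is a rough sketch of how that check could be scripted. It parses text in the general shape that `gmirror status` prints (a header row, then name, status and component columns). The sample text, the device names in it and the function name degraded_mirrors are my own illustration, not output captured from the machines above, so treat the column layout as an assumption and adjust it to what your system actually prints.

```python
def degraded_mirrors(status_text):
    """Return the names of mirrors whose Status column is not COMPLETE.

    Assumes (hypothetically) one mirror per row in the form
    "mirror/<name>  <STATUS>  <component>", with extra components
    listed on continuation rows that carry no mirror name.
    """
    bad = []
    for line in status_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and one-field continuation rows.
        if len(fields) < 2 or fields[0] == "Name":
            continue
        name, status = fields[0], fields[1]
        # Continuation rows with a sync percentage have two fields but
        # do not start with "mirror/", so they are ignored here as well.
        if name.startswith("mirror/") and status != "COMPLETE":
            bad.append(name)
    return bad


# Illustrative sample only; not taken from the systems described above.
SAMPLE = """\
      Name    Status  Components
mirror/gm0  DEGRADED  ad4
mirror/gm1  COMPLETE  da0
                      da1
"""

print(degraded_mirrors(SAMPLE))
```

Run from cron against the real command output, something like this could mail the admin when the returned list is non-empty. It only reports what gmirror has already decided, so it would not shorten the stall seen on the ATA system.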
More information about the freebsd-hackers mailing list