ZFS hang

Garrett Cooper yanegomi at gmail.com
Fri Dec 7 18:21:05 UTC 2012


On Dec 7, 2012, at 8:22 AM, Fabian Keil <freebsd-listen at fabiankeil.de> wrote:

> Matt Burke <mattblists at icritical.com> wrote:
> 
>> Obviously, the cause of my problems would seem to be a hosed disk. However
>> the kernel msgbuf shows no complaints from the drive before reboot.
>> 
>> da8 is a 60GB OCZ Agility 3 SSD (purchased prior to realising just how
>> unreliable they are). According to the SMART data, it's had just 146GB of
>> reads and 278GB writes over 3 power cycles with only 3 months power on
>> time, similar to the others that have failed (~60% failure rate for ours)
>> 
>> I can understand the drive failing, I just can't understand how it hung the
>> system. I have had a similar thing happen on one of these machines before
>> (with GENERIC and no dumpdev, so no debugging) with one of these disks on
>> an Areca HBA.
> 
> In CURRENT, parts of the CAM layer can silently hang under certain
> circumstances and this can negatively affect various other subsystems
> including ZFS:
> http://lists.freebsd.org/pipermail/freebsd-current/2012-October/037413.html
> 
> I suppose this regression is old enough to have trickled down
> to the stable branches by now.
> 
> I'm not saying that this is definitively the problem you are
> seeing, but I think it would explain the symptoms.
> 
>> Could there be a problem with ATA devices on SCSI controllers which is
>> causing failures to be silently dropped? Is ZFS lacking a timeout on IO calls?
> 
> I believe ZFS is designed with the expectation that timeouts are
> handled by the layers below it, so technically it doesn't "lack"
> the timeouts for IO calls ...

I've also noticed hangs on reboot recently (within the last 2-3 months)
with my single-disk ATA pools and my mfi pool. All storage disks seem
healthy. The pools were running v28 with the ZFS features upgrade.

Thanks,
-Garrett

