ZFS hang

Fabian Keil freebsd-listen at fabiankeil.de
Fri Dec 7 16:22:48 UTC 2012


Matt Burke <mattblists at icritical.com> wrote:

> Obviously, the cause of my problems would seem to be a hosed disk. However
> the kernel msgbuf shows no complaints from the drive before reboot.
> 
> da8 is a 60GB OCZ Agility 3 SSD (purchased prior to realising just how
> unreliable they are). According to the SMART data, it's had just 146GB of
> reads and 278GB writes over 3 power cycles with only 3 months power on
> time, similar to the others that have failed (~60% failure rate for ours)
> 
> I can understand the drive failing, I just can't understand how it hung the
> system. I have had a similar thing happen on one of these machines before
> (with GENERIC and no dumpdev, so no debugging) with one of these disks on
> an Areca HBA.

In CURRENT, parts of the CAM layer can silently hang under certain
circumstances, and this can negatively affect various other subsystems,
including ZFS:
http://lists.freebsd.org/pipermail/freebsd-current/2012-October/037413.html

I suppose this regression is old enough to have trickled down
to the stable branches by now.

I'm not saying that this is definitely the problem you are
seeing, but I think it would explain the symptoms.

> Could there be a problem with ATA devices on SCSI controllers which is
> causing failures to be silently dropped? Is ZFS lacking a timeout on IO calls?

I believe ZFS is designed with the expectation that timeouts are
handled by the layers below it, so technically it doesn't "lack"
timeouts for I/O calls ...
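
To illustrate the layering idea, here is a toy sketch in Python. It is
not ZFS or CAM code; LowerLayer, upper_layer_read and the two fake
devices are made-up names, and the thread-based timeout is only an
assumption about how one might model the behaviour:

```python
import threading


class LowerLayer:
    """Hypothetical stand-in for a driver layer that owns the request timeout."""

    def __init__(self, timeout_s):
        # timeout_s=None models a lower layer that never times out.
        self.timeout_s = timeout_s

    def submit(self, device_io):
        """Run device_io in a worker thread and wait for it to complete."""
        done = threading.Event()
        result = {}

        def worker():
            result["value"] = device_io()
            done.set()

        threading.Thread(target=worker, daemon=True).start()
        if done.wait(self.timeout_s):
            return ("ok", result["value"])
        # The device never answered: the lower layer turns silence into an error.
        return ("timeout", None)


def upper_layer_read(lower, device_io):
    """Hypothetical upper layer: no timer of its own, it trusts the layer below."""
    return lower.submit(device_io)


def healthy_device():
    return b"data"


def hung_device():
    # Blocks forever, like a drive that stops responding without an error.
    threading.Event().wait()
```

With LowerLayer(timeout_s=0.2), a read from hung_device comes back as
an error the upper layer can act on. With LowerLayer(timeout_s=None),
the same read never returns, and the upper layer hangs along with it
even though nothing in the upper layer itself is broken.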

Fabian