NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE

Andriy Gapon avg at FreeBSD.org
Thu Dec 13 10:36:58 UTC 2012


I decided to share here the comment that I made in private, so that more people
could potentially benefit from it.

on 03/12/2012 20:41 olivier olivier said the following:
> Hi all
> After upgrading from 9.0-RELEASE to 9.1-PRERELEASE #0 r243679 I'm having
> severe problems with NFS sharing of a ZFS volume. nfsd appears to hang at
> random times (between once every couple hours to once every two days) while
> accessing a ZFS volume, and the only way I have found of resolving the
> problem is to reboot. The server console is sometimes still responsive
> during the nfsd hang, and I can read and write files to the same ZFS volume
> while nfsd is hung. I am pasting below the output of procstat -kk on nfsd,
> and details of my pool (nfsstat on the server gets hung when the problem
> has started occurring, and does not produce any output). The pool is v28
> and was created from a bunch of volumes attached over Fibre Channel using
> the mpt driver. My system has a Supermicro board and 4 AMD Opteron 6274
> CPUs.
> 
> I did not experience any nfsd hangs with 9.0-RELEASE (same machine,
> essentially same configuration, same usage pattern).
> 
> I would greatly appreciate any help to resolve this problem!


I've looked at the provided data and I do not see anything that implicates ZFS.
My rules of thumb for ZFS hangs:
- if there are threads in zio_wait
- if you can confirm that they are indeed stuck there[*]
- if there are no threads in zio_interrupt

[*] you have to be sure that a thread just sits in zio_wait and doesn't make any
forward progress as opposed to the thread doing a lot of I/O and thus having a
high probability of being seen in zio_wait.

Then it is most likely that the problem is at the storage level.
Most likely it is a bug in the storage controller driver that allowed an I/O
request to get lost (instead of being "errored out" or timed out).
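
The three conditions above can be checked roughly by taking two `procstat -kk`
samples some time apart and comparing them. The following is only a sketch: it
assumes the usual PID / TID / COMM / TDNAME / KSTACK layout with frames printed
as `name+0xoffset`, the helper names and the SAMPLE text are made up for
illustration, and an unchanged stack in two samples is merely a hint that a
thread is stuck, not proof.

```python
import re

# One kernel stack frame as printed by procstat -kk, e.g. "zio_wait+0x61".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")
# Leading PID and TID columns of a data line (the header line does not match).
IDS_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s")

# Hypothetical sample, not taken from the original report.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
 1500 100200 nfsd             nfsd: service    mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 dmu_buf_hold_array_by_dnode+0x1c6
 1500 100201 nfsd             nfsd: master     mi_switch+0x186 sleepq_catch_signals+0x2e6 svc_run_internal+0x7e6
"""


def parse_procstat_kk(text):
    """Map (pid, tid) -> tuple of frame names from `procstat -kk` output."""
    threads = {}
    for line in text.splitlines():
        m = IDS_RE.match(line)
        if not m:
            continue  # header or blank line
        frames = tuple(FRAME_RE.findall(line[m.end():]))
        threads[(int(m.group(1)), int(m.group(2)))] = frames
    return threads


def storage_suspected(first, second):
    """Apply the three rules of thumb to two samples taken some time apart.

    Returns (suspected, stuck_threads).
    """
    # Rules 1 and 2: threads in zio_wait whose whole stack did not change.
    stuck = [
        key for key, frames in first.items()
        if "zio_wait" in frames and second.get(key) == frames
    ]
    # Rule 3: no thread is seen in zio_interrupt in either sample.
    in_interrupt = any(
        "zio_interrupt" in frames
        for sample in (first, second)
        for frames in sample.values()
    )
    return bool(stuck) and not in_interrupt, stuck


if __name__ == "__main__":
    sample = parse_procstat_kk(SAMPLE)
    print(storage_suspected(sample, sample))
```

In practice the two samples would come from saved `procstat -kk -a` output
collected a minute or more apart while the hang is in progress.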

`camcontrol tags <disk> -v` can be used to query the queue depth of each disk
and determine the bad one.
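
One possible way to compare disks is sketched below. It assumes the
`(passN:...): field value` output format shown in camcontrol(8) for this era,
with a `dev_active` field counting outstanding requests; field names may
differ on other releases. The interpretation is also an assumption: a disk
whose `dev_active` stays above zero in every sample while the pool is hung is
treated as the likely holder of a lost request. The function names and sample
strings are hypothetical.

```python
def parse_camcontrol_tags(text):
    """Return {field: int} parsed from `camcontrol tags <disk> -v` output."""
    fields = {}
    for line in text.splitlines():
        # Drop the "(pass0:mpt0:0:0:0):" prefix, then expect "name value".
        _, _, rest = line.partition("): ")
        parts = rest.split()
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def suspect_disks(samples):
    """samples: {disk: [output, ...]} -> disks with dev_active > 0 every time."""
    suspects = []
    for disk, outputs in samples.items():
        active = [parse_camcontrol_tags(o).get("dev_active", 0) for o in outputs]
        if active and min(active) > 0:
            suspects.append(disk)
    return sorted(suspects)


# Hypothetical outputs for two disks, sampled twice during a hang.
IDLE = "(pass0:mpt0:0:0:0): dev_openings  255\n(pass0:mpt0:0:0:0): dev_active    0\n"
BUSY = "(pass1:mpt0:0:1:0): dev_openings  254\n(pass1:mpt0:0:1:0): dev_active    1\n"

if __name__ == "__main__":
    print(suspect_disks({"da0": [IDLE, IDLE], "da1": [BUSY, BUSY]}))
```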

-- 
Andriy Gapon

