NFS-exported ZFS instability

Wed Jan 2 17:40:54 UTC 2013

On Wed, Jan 02, 2013 at 08:24:39AM -0500, Rick Macklem wrote:
> Hiroki Sato wrote:
> > Hello,
> > 
> > I have been having trouble with my NFS server for a long time. The
> > symptom is that it stops working one or two weeks after a boot. I
> > have not been able to track down the cause yet, but it is reproducible
> > and only occurs under a very high I/O load.
> > 
> > It did not panic; it just stopped working. While it still responded to
> > ping, userland programs did not seem to run. I could break into DDB and
> > get a kernel dump. The following URLs are logs of ps, trace, etc.:
> > 
> > http://people.allbsd.org/~hrs/FreeBSD/pool.log.20130102
> > http://people.allbsd.org/~hrs/FreeBSD/pool.dmesg.20130102
> > 
> > Does anyone see how to debug this? I guess this is due to a deadlock
> > somewhere. I have suffered from this problem for almost two years.
> > The above log is from stable/9 as of Dec 19, but this has persisted
> > since 8.X.
> > 
> Well, I took a quick glance at the log and there are a lot of processes
> sleeping on "pfault" (in vm_waitpfault() in sys/vm/vm_page.c). I'm no
> vm guy, so I'm not sure when/why that will happen. The comment on the
> function suggests they are waiting for free pages.
> 
> Maybe something as simple as running out of swap space or a problem
> talking to the disk(s) that hold the swap partition(s) or ???
> (I'm talking through my hat here, because I'm not conversant with
>  the vm side of things.)
> 
> I might take a closer look this evening and see if I can spot anything
> in the log, rick
> ps: I hope Alan and Kostik don't mind being added to the cc list.

What I see in the log is a lock cascade rooted in thread 100838, which
owns the system map mutex. I believe this prevents malloc(9) from making
progress in other threads, which in turn own, e.g., the ZFS vnode locks.
As a result, the whole system is wedged.
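To make the notion of a lock cascade concrete, here is a minimal sketch, assuming the wait-for relations have been transcribed by hand from the ps/trace output into (waiter, owner) pairs, with at most one entry per waiter. The struct and function names, and every thread ID other than 100838, are hypothetical. It follows the chain from any blocked thread to the one thread that is not itself waiting, which is the root of the cascade.

```c
#include <assert.h>
#include <stddef.h>

/* One "thread `waiter` sleeps on a lock owned by thread `owner`" edge,
 * transcribed by hand from ps/trace output (hypothetical format).
 * Thread IDs are assumed to be positive; -1 is used as a sentinel. */
struct wait_edge {
    int waiter; /* thread ID that is blocked */
    int owner;  /* thread ID owning the lock it waits for */
};

/* Return the owner that `tid` is waiting on, or -1 if `tid` is not blocked. */
static int waits_on(const struct wait_edge *edges, size_t n, int tid)
{
    for (size_t i = 0; i < n; i++)
        if (edges[i].waiter == tid)
            return edges[i].owner;
    return -1;
}

/* Follow the wait-for chain from `start` to the first thread that is not
 * itself blocked: the root of the cascade. Returns -1 if the chain loops
 * back on itself (a classic deadlock cycle rather than a cascade). */
int find_cascade_root(const struct wait_edge *edges, size_t n, int start)
{
    assert(edges != NULL || n == 0);
    int tid = start;
    /* With one edge per waiter, an acyclic chain has at most n hops. */
    for (size_t hops = 0; hops <= n; hops++) {
        int next = waits_on(edges, n, tid);
        if (next < 0)
            return tid;
        tid = next;
    }
    return -1;
}
```

For example, with 101 waiting on 100838 and 102 waiting on 101, find_cascade_root() started from 102 returns 100838. A result of -1 would instead indicate a cycle, i.e. a classic deadlock rather than a single stuck owner.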

Looking back at thread 100838, we can see that it is executing
smp_tlb_shootdown(). It is impossible to tell from the static dump
whether the appearance of smp_tlb_shootdown() in the backtrace is
transient, or whether the thread is spinning there, waiting for other
CPUs to acknowledge the request. But since the system is wedged, most
likely smp_tlb_shootdown() is spinning.
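The suspected hang can be illustrated with a small user-space model (not the kernel code): the initiator sends a request to the other CPUs and spins until each one acknowledges. All names here are hypothetical. Each "CPU" is a thread that acknowledges only if its interrupts are modeled as enabled, and, unlike smp_tlb_shootdown(), the model gives up after a timeout so that it terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MODEL_MAX_CPUS 16

/* Hypothetical stand-in for one remote CPU receiving the shootdown IPI. */
struct model_cpu {
    bool intr_enabled;  /* false models a core running with interrupts off */
    atomic_int *acks;   /* shared acknowledgment counter */
};

static void *model_cpu_run(void *arg)
{
    struct model_cpu *cpu = arg;
    /* Only a core with interrupts enabled takes the IPI, flushes its
     * TLB (not modeled) and acknowledges. */
    if (cpu->intr_enabled)
        atomic_fetch_add(cpu->acks, 1);
    return NULL;
}

static double model_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Ask ncpus remote CPUs to acknowledge, then spin until they all have.
 * Gives up after timeout_sec so the model terminates.
 * Returns true only if every CPU acknowledged in time. */
bool model_tlb_shootdown(const bool intr_enabled[], int ncpus, double timeout_sec)
{
    assert(ncpus >= 0 && ncpus <= MODEL_MAX_CPUS);
    pthread_t tids[MODEL_MAX_CPUS];
    struct model_cpu cpus[MODEL_MAX_CPUS];
    atomic_int acks;
    atomic_init(&acks, 0);

    int started = 0;
    for (; started < ncpus; started++) {
        cpus[started].intr_enabled = intr_enabled[started];
        cpus[started].acks = &acks;
        if (pthread_create(&tids[started], NULL, model_cpu_run, &cpus[started]) != 0)
            break;
    }

    bool done = false;
    double deadline = model_now() + timeout_sec;
    while (started == ncpus && model_now() < deadline) {
        if (atomic_load(&acks) == ncpus) {
            done = true;
            break;
        }
        sched_yield(); /* a real kernel busy-waits; yielding keeps the model polite */
    }

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return done;
}
```

With all four modeled CPUs enabled the call returns true. With one of them running with interrupts disabled, the acknowledgment count never reaches four and the initiator spins until the timeout, whereas the kernel loop, as described above, would simply keep spinning.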

Under this hypothesis, the most likely cause is some other core running
with interrupts disabled. However, inspecting the backtraces of the
threads running on all cores does not show any that could legitimately
own a spinlock or otherwise run with interrupts disabled.

One thing you could try is to enable WITNESS for spinlocks, to try to
catch a leaked spinlock, although I very much doubt that this is the
case.
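A leaked spin lock would matter here because, assuming spin locks behave as in this simplified model (a per-thread nesting counter, with interrupts kept disabled while any spin lock is held), a thread that never releases one leaves its core deaf to the shootdown IPI. The names below are hypothetical, and the sketch only illustrates the kind of imbalance a WITNESS-style check looks for; it is not the WITNESS implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state for a simplified spin-lock model. */
struct model_thread {
    int  spinlock_count;  /* how many spin locks are currently held */
    bool saved_intr;      /* interrupt state before the outermost enter */
    bool intr_enabled;    /* current local interrupt state */
};

/* Acquire: the outermost acquisition saves and disables interrupts. */
void model_spinlock_enter(struct model_thread *td)
{
    if (td->spinlock_count == 0) {
        td->saved_intr = td->intr_enabled;
        td->intr_enabled = false;
    }
    td->spinlock_count++;
}

/* Release: the outermost release restores the saved interrupt state. */
void model_spinlock_exit(struct model_thread *td)
{
    assert(td->spinlock_count > 0);  /* unbalanced release */
    td->spinlock_count--;
    if (td->spinlock_count == 0)
        td->intr_enabled = td->saved_intr;
}

/* WITNESS-style check for a point where no spin lock may be held (for
 * example before sleeping or returning to user mode). Returns the number
 * of leaked spin locks; nonzero also means interrupts are still off. */
int model_leaked_spinlocks(const struct model_thread *td)
{
    return td->spinlock_count;
}
```

After two acquisitions and one release, the check reports one leaked lock and interrupts remain disabled; only the final release brings them back.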

Another thing to try is switching the CPU idle method to something
else; look at the machdep.idle* sysctls. Perhaps some CPU erratum
prevents an interrupt from waking the core from C1 under certain
conditions?
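For reference, a minimal C sketch of reading and switching the idle method through sysctlbyname(3), assuming the machdep.idle sysctl (and machdep.idle_available, which lists the supported methods) exists on the machine; check with `sysctl -a | grep machdep.idle`. From the shell, sysctl(8) does the same thing. The function names are hypothetical, and on systems other than FreeBSD the functions simply fail with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Read a string-valued sysctl such as "machdep.idle" into buf.
 * Returns 0 on success, -1 with errno set on failure (including on
 * systems other than FreeBSD, where this sketch does nothing). */
int read_string_sysctl(const char *name, char *buf, size_t len)
{
    assert(name != NULL && buf != NULL && len > 0);
#ifdef __FreeBSD__
    size_t outlen = len;
    if (sysctlbyname(name, buf, &outlen, NULL, 0) != 0)
        return -1;            /* e.g. ENOENT if the OID does not exist */
    buf[len - 1] = '\0';      /* defensive: guarantee termination */
    return 0;
#else
    buf[0] = '\0';
    errno = ENOSYS;
    return -1;
#endif
}

/* Switch the idle method to one of the names listed in
 * machdep.idle_available (assumed). Requires root. */
int set_idle_method(const char *method)
{
    assert(method != NULL);
#ifdef __FreeBSD__
    return sysctlbyname("machdep.idle", NULL, NULL, method, strlen(method));
#else
    (void)method;
    errno = ENOSYS;
    return -1;
#endif
}
```

Reading machdep.idle_available first, then passing one of the listed names to set_idle_method() as root, would be one way to try a different idle method without rebooting.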