[Bug 204340] [panic] nfsd, em, msix, fatal trap 9

bugzilla-noreply at freebsd.org
Wed Nov 18 22:44:59 UTC 2015


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #10 from Rick Macklem <rmacklem at FreeBSD.org> ---
I have just added 2 more patches that might be relevant to the crashes.
When the nfsd threads are terminated, this is what is supposed to happen:
- All nfsd threads running in svc_run_internal() return to svc_run().
- svc_run() waits for all these threads to return.
- After svc_run() returns, the nfsd calls svcpool_destroy().
- svcpool_destroy() unregisters all the xprts (which represent the TCP sockets)
  - at this point, the reference count should be 1 for all xprts
  --> Then svcpool_destroy() calls SVC_RELEASE(xprt) for all of them, which
      drops the reference count to 0 and calls SVC_DESTROY()
     --> This actually calls svc_vc_destroy(), which shuts down the socket
         upcall and after that, destroys the mutexes.

My best guess w.r.t. the crashes is that the reference count gets messed up on
an xprt, so it doesn't get SVC_DESTROY()'d. Then a socket upcall calls
xprt_active() after the mutex has been destroyed and BOOM.
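
To make that guess concrete, here is a small userspace sketch of the failure
mode. It is not the krpc code: struct model_xprt, struct model_pool and all
of the model_* functions are hypothetical stand-ins, and I am assuming the
mutex that the upcall trips over belongs to the pool rather than to the xprt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transport; not the real SVCXPRT. */
struct model_xprt {
    int  refs;          /* reference count */
    bool upcall_armed;  /* socket upcall still registered */
};

/* Hypothetical stand-in for the pool; assumes the mutex lives here. */
struct model_pool {
    bool lock_valid;    /* false once the mutex has been destroyed */
};

/* Models svc_vc_destroy(): the socket upcall is shut down. */
static void model_destroy(struct model_xprt *x)
{
    x->upcall_armed = false;
}

/* Models SVC_RELEASE(): drop one reference, destroy on the last one. */
static void model_release(struct model_xprt *x)
{
    assert(x->refs > 0);
    if (--x->refs == 0)
        model_destroy(x);
}

/* Models svcpool_destroy(): release every xprt, then destroy the mutex. */
static void model_pool_destroy(struct model_pool *p,
    struct model_xprt *xs, int n)
{
    for (int i = 0; i < n; i++)
        model_release(&xs[i]);
    p->lock_valid = false;
}

/* True if a late upcall on x would touch a destroyed mutex. */
static bool upcall_would_crash(const struct model_pool *p,
    const struct model_xprt *x)
{
    return x->upcall_armed && !p->lock_valid;
}
```

With a reference count of 1 going in, model_pool_destroy() disarms the upcall
before the mutex goes away. With a count of 2 (one leaked reference), the
upcall stays armed and upcall_would_crash() returns true, which is the case
I suspect.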

The two new patches should be applied along with the first one.
The second patch fixes the one other place that I can spot where the server
side krpc code isn't quite SMP safe. Although unlikely, it is conceivable
that this could cause the crashes.

The third patch makes sure that the backchannel xprt is dereferenced before
the call to svcpool_destroy(). This one seems a more likely culprit, but only
if you have clients doing NFSv4.1 mounts against the server.

If you could try the second patch (and the third if you have NFSv4.1 mounts),
that would be appreciated.

One final comment: I am assuming that you are terminating the nfsd threads
by sending a SIGUSR1 to the nfsd master. This is the only way the nfsd
threads should be terminated. (If you are using /etc/rc.d/nfsd, it should
be doing that, but you might try using "kill -USR1 <pid-of-nfsd-master>"
directly, just in case the shell script is busted.)

This pretty well exhausts what I can see that might cause the crashes and
I can't reproduce a crash here, so hopefully you can make some progress
from here.

Good luck with it, rick
