[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
bugzilla-noreply at freebsd.org
Mon Nov 16 00:42:05 UTC 2015
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340
Rick Macklem <rmacklem at FreeBSD.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |In Progress
                 CC|                            |rmacklem at FreeBSD.org
           Assignee|freebsd-bugs at FreeBSD.org  |rmacklem at FreeBSD.org
--- Comment #2 from Rick Macklem <rmacklem at FreeBSD.org> ---
Created attachment 163160
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163160&action=edit
patch that might fix this problem
I think this crash might have been caused by a race
between svcpool_destroy() and the socket upcall.
The code in svcpool_destroy() assumes that SVC_RELEASE(xprt)
drops the ref cnt to 0, so that SVC_DESTROY() is called.
--> SVC_DESTROY() shuts down the socket upcall.
--> If the ref cnt doesn't go to 0, svcpool_destroy() will
    mtx_destroy() the mutexes prematurely.
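
To make the assumption above concrete, here is a tiny single-threaded model in C. It is only a sketch, not the kernel code: every toy_* name is hypothetical, the "mutex" is a plain flag, and toy_release() merely stands in for SVC_RELEASE() followed by SVC_DESTROY() on the last reference.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an xprt: just a ref count and an upcall flag. */
struct toy_xprt {
    int  refs;          /* reference count */
    bool upcall_armed;  /* true while the socket upcall may still fire */
};

/* Models SVC_RELEASE(): only the final release runs the destroy step,
 * which is what disarms the upcall (the SVC_DESTROY() role). */
void toy_release(struct toy_xprt *x)
{
    if (--x->refs == 0)
        x->upcall_armed = false;
}

/* Models the teardown order described above: release once, then destroy
 * the mutex unconditionally. Returns true if the upcall is still armed
 * after the mutex is gone, i.e. the premature-destroy case. */
bool toy_teardown_is_unsafe(int initial_refs)
{
    struct toy_xprt x = { initial_refs, true };
    bool mutex_valid = true;

    toy_release(&x);
    mutex_valid = false;   /* the mtx_destroy() step in this model */

    return x.upcall_armed && !mutex_valid;
}
```

With one reference the release disarms the upcall before the mutex goes away; with an extra reference held elsewhere, the upcall outlives the mutex.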
I am not sure, but the race might have been introduced by
r267228 since, prior to that, there was a single mutex for
the pool, held while all xprts were unregistered.
After r267228, there is a group of mutexes, where the code
only holds one at a time, so I think an xprt might get re-registered
on another group after that group has had all of its xprts de-registered.
The attached little patch moves the mtx_lock() calls to a
separate loop before the xprt_unregister loops, so that all
locks are held while all are de-registered.
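
The difference between the two locking shapes can be sketched with a deterministic, single-threaded model in C. This is an illustration under stated assumptions, not the attached patch: the toy_* names and the two-group pool are hypothetical, each mutex is modelled as a flag, and the "upcall" is injected at a fixed point instead of running concurrently.

```c
#include <stdbool.h>

#define TOY_NGROUPS 2   /* hypothetical number of groups */

struct toy_pool {
    bool locked[TOY_NGROUPS];  /* stands in for each group's mutex */
    int  nxprt[TOY_NGROUPS];   /* xprts currently registered per group */
};

/* Models an upcall re-registering an xprt on group g. A real thread would
 * block on the mutex; in this model a held lock just refuses the attempt. */
bool toy_upcall_register(struct toy_pool *p, int g)
{
    if (p->locked[g])
        return false;
    p->nxprt[g]++;
    return true;
}

/* Total number of xprts still registered across all groups. */
int toy_registered(const struct toy_pool *p)
{
    int g, total = 0;

    for (g = 0; g < TOY_NGROUPS; g++)
        total += p->nxprt[g];
    return total;
}

/* One lock at a time: the upcall fires after group 0 has been drained
 * and unlocked, so it lands on a group that was already emptied. */
int toy_leftover_one_at_a_time(void)
{
    struct toy_pool p = { { false, false }, { 1, 1 } };
    int g;

    for (g = 0; g < TOY_NGROUPS; g++) {
        p.locked[g] = true;
        p.nxprt[g] = 0;
        p.locked[g] = false;
        if (g == 0)
            toy_upcall_register(&p, 0);
    }
    return toy_registered(&p);
}

/* All locks taken in a separate loop first: the same upcall attempt is
 * refused because group 0 is still locked while group 1 is drained. */
int toy_leftover_all_locked(void)
{
    struct toy_pool p = { { false, false }, { 1, 1 } };
    int g, leftover;

    for (g = 0; g < TOY_NGROUPS; g++)
        p.locked[g] = true;
    for (g = 0; g < TOY_NGROUPS; g++) {
        p.nxprt[g] = 0;
        if (g == 0)
            toy_upcall_register(&p, 0);
    }
    leftover = toy_registered(&p);
    for (g = 0; g < TOY_NGROUPS; g++)
        p.locked[g] = false;
    return leftover;
}
```

In this model the one-at-a-time loop leaves one xprt registered on a drained group, while the lock-everything-first loop leaves none.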
I've added mav@ to the cc list, since he might be the guy
that actually understands this.
Anyhow, if you could test the attached patch with MSI interrupts
re-enabled and see if the crashes go away, that would be great.
(I don't think that this indicates that the em(4) driver is broken.
I suspect that it just affects timing of the interrupts that tripped
over this race.)
--
You are receiving this mail because:
You are on the CC list for the bug.