[Bug 204340] [panic] nfsd, em, msix, fatal trap 9

bugzilla-noreply at freebsd.org
Mon Nov 16 00:42:05 UTC 2015


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem <rmacklem at FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |In Progress
                 CC|                            |rmacklem at FreeBSD.org
           Assignee|freebsd-bugs at FreeBSD.org    |rmacklem at FreeBSD.org

--- Comment #2 from Rick Macklem <rmacklem at FreeBSD.org> ---
Created attachment 163160
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163160&action=edit
patch that might fix this problem

I think this crash might have been caused by a race
between svcpool_destroy() and the socket upcall.
The code in svcpool_destroy() assumes that SVC_RELEASE(xprt)
drops the reference count to 0, so that SVC_DESTROY() is called.
--> SVC_DESTROY() shuts down the socket upcall.
--> If the reference count doesn't go to 0, SVC_DESTROY() is not
    called, the upcall stays enabled, and svcpool_destroy() will
    mtx_destroy() the mutexes prematurely.
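
To illustrate the assumption described above, here is a minimal
userland sketch. It is not the kernel code: struct xprt, xprt_release()
and xprt_destroy() are hypothetical stand-ins for the SVCXPRT,
SVC_RELEASE() and SVC_DESTROY(), and the counter is a plain int with no
atomics, so it only models the single-threaded logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transport with a reference count. */
struct xprt {
	int  refs;           /* outstanding references */
	bool upcall_active;  /* models the socket upcall being enabled */
};

/* Stand-in for SVC_DESTROY(): the only place the upcall is shut down. */
static void
xprt_destroy(struct xprt *x)
{
	x->upcall_active = false;
}

/*
 * Stand-in for SVC_RELEASE(): drop one reference and destroy the
 * transport only when the last reference goes away.  Returns true if
 * the transport was destroyed by this call.
 */
static bool
xprt_release(struct xprt *x)
{
	assert(x->refs > 0);
	if (--x->refs == 0) {
		xprt_destroy(x);
		return (true);
	}
	return (false);
}
```

If some other holder still has a reference, xprt_release() returns
false and the upcall stays enabled, which is the case the teardown code
would have to handle before destroying any mutex the upcall uses.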

I am not sure, but the race might have been introduced by
r267228. Prior to that commit there was a single mutex for
the pool, held while all xprts were unregistered.
After r267228 there is a group of mutexes, and the code
only holds one at a time, so I think an xprt might get re-registered
on another group after that group has had all of its xprts de-registered.

The attached little patch moves the mtx_lock() calls to a
separate loop before the xprt_unregister loops, so that all
group locks are held while all xprts are de-registered.
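
The locking order described above can be sketched in userland C with
pthreads. This is only a model of the idea, not the attached patch:
struct pool, struct group, NGROUPS and the pool_*() functions are
hypothetical names, and a per-group counter stands in for the list of
registered xprts.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NGROUPS 4	/* hypothetical number of lock groups */

struct group {
	pthread_mutex_t lock;
	int nxprt;	/* transports currently registered on this group */
};

struct pool {
	struct group groups[NGROUPS];
};

static void
pool_init(struct pool *p)
{
	for (int g = 0; g < NGROUPS; g++) {
		pthread_mutex_init(&p->groups[g].lock, NULL);
		p->groups[g].nxprt = 0;
	}
}

/* Register one transport on group g, holding only that group's lock. */
static void
pool_register(struct pool *p, int g)
{
	assert(g >= 0 && g < NGROUPS);
	pthread_mutex_lock(&p->groups[g].lock);
	p->groups[g].nxprt++;
	pthread_mutex_unlock(&p->groups[g].lock);
}

/*
 * Unregister everything.  All group locks are taken first, in index
 * order, so no transport can be registered on an already-emptied group
 * while the remaining groups are still being emptied.
 * Returns the number of transports removed.
 */
static int
pool_unregister_all(struct pool *p)
{
	int removed = 0;

	for (int g = 0; g < NGROUPS; g++)
		pthread_mutex_lock(&p->groups[g].lock);
	for (int g = 0; g < NGROUPS; g++) {
		removed += p->groups[g].nxprt;
		p->groups[g].nxprt = 0;
	}
	for (int g = NGROUPS - 1; g >= 0; g--)
		pthread_mutex_unlock(&p->groups[g].lock);
	return (removed);
}
```

Taking the locks in a fixed order keeps the sketch free of lock-order
deadlocks between two callers of pool_unregister_all(). Whether the
kernel code needs the same care depends on who else can hold more than
one group lock at a time.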

I've added mav@ to the cc list, since he might be the guy
that actually understands this.

Anyhow, if you could test the attached patch with MSI interrupts
re-enabled and see if the crashes go away, that would be great.
(I don't think that this indicates that the em(4) driver is broken.
 I suspect that it just affects timing of the interrupts that tripped
 over this race.)

-- 
You are receiving this mail because:
You are on the CC list for the bug.