svn commit: r194909 - head/sys/dev/mxge

Andrew Gallatin gallatin at cs.duke.edu
Thu Jun 25 01:43:17 UTC 2009


Sam Leffler wrote:

 > There's something else wrong.  This is just covering up the real bug.

I'm pretty sure the "real bug" is in bpf, but I'm not sure it's a bug,
and I suspect there are probably other, similar, bugs lurking when
you try to tear down a busy interface.

What I was doing was:

- point a packet generator offering 1.5Mpps at the NIC

- in a tight shell loop, do

while (1)
	tcpdump -ni mxge0 host 172.31.0.1
end

- in another shell loop:

while (1)
	ifconfig mxge0 192.168.1.22 up
	sleep 1
	kldunload if_mxge
end

Before the commit, with the old order:

        lock()
        close()
        unlock()
        ether_ifdetach()

I'd see either an exhaustion of mbufs because tcpdump snuck in after
I'd closed the device and re-opened it on me (so I never closed it
again, resulting in leaked mbufs), or a panic.
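
The mbuf leak described above can be modelled in userland. This is only a
sketch under stated assumptions: the struct, the function names and the
buffer count of 64 are hypothetical and are not taken from the mxge driver,
and the driver lock is left out because the window in question opens after
unlock().

```c
#include <stdbool.h>

/* Hypothetical userland model; not the mxge driver's real state. */
struct model_nic {
    bool running;   /* device is open */
    bool attached;  /* interface still visible to the stack */
    int  bufs;      /* receive buffers held (stand-in for mbufs) */
};

/* Open allocates receive buffers (64 is an arbitrary number). */
static bool model_open(struct model_nic *n)
{
    if (n->running)
        return false;
    n->bufs += 64;
    n->running = true;
    return true;
}

/* Close releases every buffer the device holds. */
static void model_close(struct model_nic *n)
{
    if (!n->running)
        return;
    n->bufs = 0;
    n->running = false;
}

/*
 * Old order: close first, detach second. If something re-opens the
 * device in between (tcpdump in the report above), nothing closes it
 * again. Returns the number of buffers still held after detach.
 */
static int model_detach_old_order(struct model_nic *n, bool reopen_in_window)
{
    model_close(n);
    if (reopen_in_window)
        model_open(n);       /* the open that "snuck in" */
    n->attached = false;     /* stands in for ether_ifdetach() */
    return n->bufs;          /* nonzero means leaked buffers */
}
```

In this model a detach with no intervening open returns 0, while a detach
with an open in the window returns 64 buffers that are never freed.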

I then moved the ether_ifdetach() to the new position:
        ether_ifdetach()
        lock()
        close()
        unlock()

This worked great until I started the packet generator,
then it crashed.   The stack I saw (which I don't have
saved, so this is from memory) when I had ether_ifdetach()
first was:


panic: mtx_lock() (don't remember exact text)
bpf_mtap()
ether_input()
mxge_rx_done_small()
mxge_clean_rx_done()
mxge_intr()
<...>

When I looked at the ifp in kgdb, I noticed that all the operations
(if_input(), if_output(), etc.) pointed to ifdead_*.
The machine I'm using for this is a MacPro, and I can't get ddb
to work on the USB based console, so I'm working purely from dumps.
I don't know how to get a stack of another process in kgdb on
amd64, so that's all the information I have.

My assumption is that my interrupt thread was running when
ether_ifdetach() called bpfdetach(), and was starting bpf_mtap()
while bpfdetach() was destroying the bpf_if.  There doesn't
seem to be anything to prevent bpfdetach() from racing with
bpf_mtap().
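
The suspected interleaving can be written down as a deterministic userland
model. It encodes only the assumption stated above, not the actual bpf
internals: struct model_tap and every function name here are hypothetical,
and a false return stands in for the mtx_lock() panic.

```c
#include <stdbool.h>

/* Stand-in for the per-interface tap state; not the real struct bpf_if. */
struct model_tap {
    bool lock_valid;  /* false once the lock has been destroyed */
    int  taps;        /* packets successfully tapped */
};

/* Detach side: tears down the lock that guards the tap state. */
static void model_tap_detach(struct model_tap *t)
{
    t->lock_valid = false;
}

/*
 * Receive side, second half: take the lock and tap the packet.
 * Returns false where a kernel would panic on a destroyed mutex.
 */
static bool model_tap_deliver(struct model_tap *t)
{
    if (!t->lock_valid)
        return false;
    t->taps++;
    return true;
}

/*
 * Receive side as a whole. The caller has already decided to tap the
 * packet; detach may or may not run before the lock is taken.
 */
static bool model_rx(struct model_tap *t, bool detach_races_in)
{
    if (detach_races_in)
        model_tap_detach(t);
    return model_tap_deliver(t);
}
```

Nothing in the model orders model_tap_detach() against
model_tap_deliver(), which is the gap the paragraph above describes.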

By calling my close() routine (with a dying flag so nothing can
sneak in before detach), I'm assured that my NIC is quiescent,
and cannot be calling into the stack while the interface is being
torn down.  I'd prefer to leave my commit as-is because:

1) it works, and fixes a bug
2) it can be MFC'ed as is
3) it just feels wrong to be blasting packets up into the stack
    while detaching.  With this NIC, the best way to make it
    quiescent is to call close().  There's an interrupt handshake
    done with the NIC to ensure it is quiescent, so doing something
    like disabling its interrupt could leave things in a weird state.
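
A minimal userland sketch of the ordering argued for above: set a dying
flag, close, then detach. All names (toy_nic, toy_open and so on) and the
buffer count are hypothetical and not the mxge driver's real code; locking
and the NIC interrupt handshake are left out.

```c
#include <stdbool.h>

/* Hypothetical userland model; not the mxge driver's real state. */
struct toy_nic {
    bool dying;        /* set once detach has begun */
    bool running;      /* device is open */
    bool attached;     /* interface still visible to the stack */
    int  bufs;         /* receive buffers held (stand-in for mbufs) */
    int  stack_calls;  /* packets handed up to the stack */
};

/* Open is refused once the device is dying, so nothing can sneak in. */
static bool toy_open(struct toy_nic *n)
{
    if (n->dying || n->running)
        return false;
    n->bufs += 64;     /* arbitrary number of buffers */
    n->running = true;
    return true;
}

static void toy_close(struct toy_nic *n)
{
    if (!n->running)
        return;
    n->bufs = 0;
    n->running = false;
}

/* Interrupt path: only a running device passes packets up the stack. */
static bool toy_intr(struct toy_nic *n)
{
    if (!n->running)
        return false;
    n->stack_calls++;
    return true;
}

/* Dying flag, then close, then detach. Returns buffers left behind. */
static int toy_detach(struct toy_nic *n, bool reopen_in_window)
{
    n->dying = true;
    toy_close(n);          /* device is quiescent from here on */
    if (reopen_in_window)
        toy_open(n);       /* refused because dying is set */
    n->attached = false;   /* stands in for ether_ifdetach() */
    return n->bufs;
}
```

In this model an open attempted during detach is refused, no buffers are
left behind, and toy_intr() no longer reaches the stack once the close has
run.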

Drew

