svn commit: r194909 - head/sys/dev/mxge

Sam Leffler sam at freebsd.org
Fri Jun 26 05:50:35 UTC 2009


Andrew Gallatin wrote:
> Sam Leffler wrote:
> 
>> There's something else wrong.  This is just covering up the real bug.
> 
> I'm pretty sure the "real bug" is in bpf, but I'm not sure its a bug,
> and I suspect there are probably other, similar, bugs lurking when
> you try to tear down a busy interface.

FWIW my point was that we need to stop adding "hacks" to drivers to
workaround issues like this.  There should be a standard way drivers are
written to handle detach that avoids races.

> 
> What I was doing was:
> 
> - point a packet generator offering 1.5Mpps at the NIC
> 
> - in a tight shell loop, do
> 
> while (1)
>     tcpdump -ni mxge0 host 172.31.0.1
> end
> 
> - in another shell loop:
> 
> while (1)
>     ifconfig mxge0 192.168.1.22 up
>     sleep 1
>     kldunload if_mxge
> end
> 
> Before the commit, with the old order:
> 
>        lock()
>        close()
>        unlock()
>        ether_ifdetach()
> 
> I'd see either an exhaustion of mbufs because tcpdump snuck in after
> I'd closed the device and re-opened it on me (so I never closed it
> again, resulting in leaked mbufs), or a panic.
> 
> I then moved the ether_ifdetach() to the new position:
>        ether_ifdetach()
>        lock()
>        close()
>        unlock()
> 
> This worked great until I started the packet generator,
> then it crashed.   The stack I saw (which I don't have
> saved, so this is from memory) when I had ether_ifdetach()
> first was:
> 
> 
> panic: mtx_lock() (don't remember exact text)
> bpf_mtap()
> ether_input()
> mxge_rx_done_small()
> mxge_clean_rx_done()
> mxge_intr()
> <...>
> 
> When I looked at the ifp in kgdb, I noticed that all the operations
> (if_input(), if_output(), etc) pointed to ifdead_*
> The machine I'm using for this is a MacPro, and I can't get ddb
> to work on the USB based console, so I'm working purely from dumps.
> I don't know how to get a stack of another process in kgdb on
> amd64, so that's all the information I have.
> 
> My assumption is that my interrupt thread was running when
> ether_ifdetach() called bpfdetach(), and was starting bpf_mtap()
> while bpfdetach() was destroying the bpf_if.  There doesn't
> seem to be anything to prevent bpfdetach() from racing with
> bpf_mtap().
> 
> By calling my close() routine (with a dying flag so nothing can
> sneak in before detach), I'm assured that my NIC is quiescent,
> and cannot be calling into the stack while the interface is being
> torn down.  I'd prefer to leave my commit as-is because:
> 
> 1) it works, and fixes a bug
> 2) it can be MFC'ed as is
> 3) it just feels wrong to be blasting packets up into the stack
>    while detaching.  With this NIC, the best way to make it
>    quiescent is to call close().  There's an interrupt handshake
>    done with the NIC to ensure its is quiescent, so doing something
>    like disabling its interrupt could leave the things in a weird state.

I'm not asking you to revert your commit; what I want is a generic
solution for all drivers so we can eliminate stuff like this.  I don't
care about MFC'ing at the moment; there have been issues with ifnet
detach for a long time but with recent changes to the ifnet layer in
HEAD I think it's time to work out issues like this.

bpf has several problems (at the top of my list is how it holds a
private mtx over ifpromisc calls which causes heartburn for drivers that
need to block when processing such requests). It sounds like bpf needs
to push tap'd packets through a method ifnet can "deaden" and/or bpf
needs to explicitly handle this case. At that point you can eliminate
your dying flag (as can some other drivers that I've seen recently grow
similar hacks).

	Sam


More information about the svn-src-head mailing list