two problems in dev/e1000/if_lem.c::lem_handle_rxtx()

Adrian Chadd adrian at freebsd.org
Thu Jan 17 16:48:32 UTC 2013


There's also the subtle race condition in TX and RX handling that
re-queuing the taskqueue gets around.

Which is:

* The hardware is constantly receiving frames, right up until you blow
the FIFO away by filling it up;
* The RX thread receives a bunch of frames;
* .. and processes them;
* .. once it's done processing, the hardware may have read some more
frames in the meantime;
* .. and the hardware may have generated a mitigated interrupt which
you're ignoring, since you're processing frames;
* So if your architecture isn't 100% paranoid, you may end up having
to wait for the next interrupt to handle what's currently in the
queue.
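
To make the race above concrete, here's a rough toy model in C. It is
not the if_lem.c code; every name in it (toy_rx, toy_rx_pass,
toy_rx_task) is made up for illustration, and the "re-queue" is
modelled as a simple loop rather than a real taskqueue_enqueue() call.
It just shows frames that arrive mid-pass getting stranded unless the
task runs again.

```c
#include <stdbool.h>

/* Hypothetical toy model; not the real if_lem.c structures. */
struct toy_rx {
	int frames_in_ring;	/* received by hardware, not yet processed */
	int frames_handled;	/* processed by the RX task so far */
};

/*
 * One pass of the RX task: drain what was visible at entry, while the
 * hardware delivers `late` more frames. The interrupt for those late
 * frames is assumed to be ignored because we are busy processing.
 */
static void
toy_rx_pass(struct toy_rx *rx, int late)
{
	int batch = rx->frames_in_ring;	/* snapshot taken at entry */

	rx->frames_in_ring -= batch;
	rx->frames_handled += batch;
	rx->frames_in_ring += late;	/* arrived mid-pass */
}

/*
 * Run the task once. Returns the number of frames left stranded in the
 * ring, i.e. frames that will wait for the next interrupt.
 */
static int
toy_rx_task(struct toy_rx *rx, int late, bool requeue)
{
	toy_rx_pass(rx, late);

	/* Workaround: if work remains, run again instead of waiting. */
	while (requeue && rx->frames_in_ring > 0)
		toy_rx_pass(rx, 0);

	return (rx->frames_in_ring);
}
```

With 4 frames in the ring and 3 arriving mid-pass, toy_rx_task() leaves
3 frames stranded when requeue is false, and none when it is true.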

Now if things are done correctly:

* The hardware generates a mitigated interrupt;
* The mask register has that bit disabled, so you don't end up receiving it;
* You finish your RX queue processing, and there's more stuff that's
appeared in the FIFO (hence why the hardware has generated another
mitigated interrupt);
* You unmask the interrupt;
* .. and the hardware immediately sends you the MSI or signals an interrupt;
* .. thus you re-enter the RX processing thread almost(!) immediately.
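
The "done correctly" sequence above can be sketched the same way. This
is only a toy model under the assumption that the hardware latches the
interrupt cause while it is masked and delivers it as soon as the mask
is lifted; the names (toy_intr, toy_hw_raise, toy_mask, toy_unmask) are
hypothetical and do not correspond to real e1000 registers or driver
functions.

```c
#include <stdbool.h>

/* Hypothetical interrupt controller state for one RX interrupt cause. */
struct toy_intr {
	bool masked;	/* driver has masked this cause */
	bool pending;	/* hardware has latched the cause */
	int delivered;	/* interrupts actually delivered to the host */
};

/* Deliver the latched cause if the mask allows it. */
static void
toy_intr_eval(struct toy_intr *ic)
{
	if (ic->pending && !ic->masked) {
		ic->pending = false;
		ic->delivered++;
	}
}

/* Hardware side: a (mitigated) interrupt is generated. */
static void
toy_hw_raise(struct toy_intr *ic)
{
	ic->pending = true;	/* latched even while masked */
	toy_intr_eval(ic);
}

/* Driver side: mask before processing the RX queue. */
static void
toy_mask(struct toy_intr *ic)
{
	ic->masked = true;
}

/* Driver side: unmask after processing; a latched cause fires now. */
static void
toy_unmask(struct toy_intr *ic)
{
	ic->masked = false;
	toy_intr_eval(ic);
}
```

If toy_hw_raise() is called between toy_mask() and toy_unmask(), nothing
is delivered until the unmask, at which point the delivered count goes
up straight away. A driver that drops the latched cause somewhere in
that window is where the missed interrupts come from.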

However, as the poster(s) have said, the interrupt mask/unmask in the
Intel driver(s) may not be 100% correct, so you're going to end up
with situations where interrupts are missed.

The reason this wasn't a big deal in the deep/distant past is
because we didn't use to have kernel preemption, or multiple kernel
threads running, or an overly aggressive scheduler trying to
parallelise things as much as possible. A lot of net80211/ath bugs
have popped out of the woodwork specifically because of the above
changes to the kernel. They were bugs before, but people didn't hit
them.



Adrian

