two problems in dev/e1000/if_lem.c::lem_handle_rxtx()

Thu Jan 17 16:48:32 UTC 2013

There's also the subtle race condition in TX and RX handling that
re-queuing the taskqueue gets around.

Which is:

* The hardware is constantly receiving frames, right until you blow
the FIFO away by filling it up;
* The RX thread receives a bunch of frames;
* .. and processes them;
* .. once it's done processing, the hardware may have read some more
frames in the meantime;
* .. and the hardware may have generated a mitigated interrupt which
you're ignoring, since you're processing frames;
* So if your architecture isn't 100% paranoid, you may end up having
to wait for the next interrupt to handle what's currently in the
queue.
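
The race above, and the way re-queuing gets around it, can be sketched
with a toy model. This is only an illustration: struct toy_nic, the
toy_* functions and the "budget" parameter are made up for the example
and are not the real if_lem.c code or the taskqueue(9) API. Frames that
arrive during processing are modeled as landing right after the
processing pass, with their interrupt ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in if_lem.c. */
struct toy_nic {
    int  fifo_frames;  /* frames waiting in the RX FIFO */
    bool task_queued;  /* true if the RX task is scheduled to run again */
};

/* Drain up to `budget` frames from the FIFO; return how many were handled. */
static int toy_rx_process(struct toy_nic *nic, int budget)
{
    int done = 0;

    assert(budget > 0);
    while (nic->fifo_frames > 0 && done < budget) {
        nic->fifo_frames--;
        done++;
    }
    return done;
}

/*
 * One run of the RX task. `arrivals` frames show up while we are busy,
 * and the interrupt for them is ignored. With `requeue` set, the task
 * checks for leftover work and reschedules itself instead of waiting
 * for the next interrupt.
 */
static void toy_rx_task(struct toy_nic *nic, int budget, int arrivals,
                        bool requeue)
{
    nic->task_queued = false;
    toy_rx_process(nic, budget);
    nic->fifo_frames += arrivals;  /* hardware kept receiving meanwhile */
    if (requeue && nic->fifo_frames > 0)
        nic->task_queued = true;
}

/*
 * Run the task once (as if kicked by an interrupt), then keep running it
 * for as long as it re-queues itself. Returns the number of frames left
 * stranded in the FIFO until some later interrupt arrives.
 */
static int toy_run_until_idle(struct toy_nic *nic, int budget, int arrivals,
                              bool requeue)
{
    toy_rx_task(nic, budget, arrivals, requeue);
    while (nic->task_queued)
        toy_rx_task(nic, budget, 0, requeue);
    return nic->fifo_frames;
}
```

In this model, starting with 8 frames in the FIFO, a budget of 8 and 3
late arrivals, the run without re-queuing leaves 3 frames stranded and
the run with re-queuing leaves none.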

Now if things are done correctly:

* The hardware generates a mitigated interrupt;
* The mask register has that bit disabled, so you don't end up receiving it;
* You finish your RX queue processing, and there's more stuff that's
appeared in the FIFO (hence why the hardware has generated another
mitigated interrupt);
* You unmask the interrupt;
* .. and the hardware immediately sends you the MSI or signals an interrupt;
* .. thus you re-enter the RX processing thread almost(!) immediately.
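
The sequence above relies on the interrupt cause staying latched while
it is masked. A rough sketch of that behaviour, assuming a simple
"cause" register plus an enable mask: struct toy_intr, TOY_INT_RX and
the toy_* helpers are invented for the illustration and are not the
e1000 register layout or any driver API.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INT_RX 0x1u  /* hypothetical "RX" cause bit */

/* Hypothetical toy interrupt controller; not real e1000 registers. */
struct toy_intr {
    uint32_t cause;      /* latched interrupt causes */
    uint32_t mask;       /* 1 = cause is enabled (unmasked) */
    int      delivered;  /* interrupts actually delivered to the CPU */
};

/* Deliver an interrupt if any latched cause is currently unmasked. */
static void toy_deliver(struct toy_intr *ic)
{
    uint32_t pending;

    assert(ic != 0);
    pending = ic->cause & ic->mask;
    if (pending != 0) {
        ic->delivered++;
        ic->cause &= ~pending;  /* modeled as acknowledged on delivery */
    }
}

/* Hardware side: latch a cause, then deliver it if it is unmasked. */
static void toy_hw_raise(struct toy_intr *ic, uint32_t bits)
{
    ic->cause |= bits;
    toy_deliver(ic);
}

/* Driver side: mask causes while the RX thread is processing. */
static void toy_mask(struct toy_intr *ic, uint32_t bits)
{
    ic->mask &= ~bits;
}

/*
 * Driver side: unmask once processing is done. A cause that was latched
 * while masked fires immediately, so no frame waits for a later interrupt.
 */
static void toy_unmask(struct toy_intr *ic, uint32_t bits)
{
    ic->mask |= bits;
    toy_deliver(ic);
}
```

With this model, raising TOY_INT_RX while it is masked delivers nothing
but leaves the cause latched, and the following toy_unmask() delivers
it straight away. If the unmask path loses the latched cause, you are
back to waiting for the next interrupt.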

However as the poster(s) have said, the interrupt mask/unmask in the
Intel driver(s) may not be 100% correct, so you're going to end up
with situations where interrupts are missed.

The reason why this wasn't a big deal in the deep/distant past is
because we didn't use to have kernel preemption, or multiple kernel
threads running, or an overly aggressive scheduler trying to
parallelise things as much as possible. A lot of net80211/ath bugs
have popped out of the woodwork specifically because of the above
changes to the kernel. They were bugs before, but people didn't hit
them.

Adrian