two problems in dev/e1000/if_lem.c::lem_handle_rxtx()

Fri Jan 18 14:50:11 UTC 2013

> the problem i was actually seeing are slightly different, namely:
> - once the driver lags behind, it does not have a chance to recover
>   even if there are CPU cycles available, because both interrupt
>   rate and packets per interrupt are capped.
> - much worse, once the input stream stops, you have a huge backlog that
>   is not drained. And if, say, you try to ping the machine,
>   the incoming packet is behind another 3900 packets, so the first
>   interrupt drains 100 (but not the ping request, so no response),
>   you keep going for a while, eventually the external world sees the
>   machine as not responding and stops even trying to talk to it.

This is a silly example. As I said before, the 100 work limit is
arbitrary and too low for a busy network. If you have a backlog of
3900 packets with a work limit of 100, then your system is so incompetently
tuned that it's not even worthy of discussion.
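
To put numbers on the quoted scenario, here is a rough sketch of the
arithmetic. The helper name is hypothetical and is not part of if_lem.c; it
assumes no new packets arrive and that nothing else (such as a task queue)
drains the ring between interrupts.

```c
/*
 * Hypothetical helper, not part of if_lem.c: number of interrupts needed
 * before the packet sitting behind `ahead` queued packets is processed,
 * when each interrupt handles at most `work_limit` packets.
 * Assumes no new arrivals and no draining between interrupts.
 */
static unsigned int
interrupts_until_served(unsigned int ahead, unsigned int work_limit)
{
	if (work_limit == 0)
		return (0);	/* nothing is ever processed; 0 means "never" */

	/* The target is packet number ahead + 1 in the queue; round up. */
	return ((ahead + work_limit) / work_limit);
}
```

With 3900 packets ahead of the ping and a work limit of 100,
interrupts_until_served(3900, 100) is 40: the first 39 interrupts drain the
backlog and the ping request is only seen on the 40th.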

If you're using workload and task queues because you don't know how to
tune moderation and the process_limit, that's one discussion. But if you
can't process all of the packets in your RX queue in the interrupt window,
then you either need to tune your machine better or get a faster machine.

When you tune the work limit you're making a decision about the tradeoff
between livelock and dropping packets. It's not an arbitrary decision.
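
The packet-drop side of that tradeoff can be sketched with a toy model. This
is not driver code: the names, the ring size and the fixed arrival rate per
interrupt window are all assumptions made for illustration. The model only
counts drops; it does not capture the livelock side, where a larger limit
keeps the CPU in the RX path longer per interrupt.

```c
/* Toy model result; all names here are hypothetical. */
struct rx_result {
	unsigned long processed;	/* packets handed up the stack */
	unsigned long dropped;		/* packets lost because the ring was full */
	unsigned int backlog;		/* packets still queued at the end */
};

/*
 * In each interrupt window, `arrivals` packets show up, the ring accepts
 * as many as fit, and the handler then processes at most `work_limit`.
 */
static struct rx_result
simulate_rx(unsigned int ring_size, unsigned int arrivals,
    unsigned int work_limit, unsigned int windows)
{
	struct rx_result r = { 0, 0, 0 };
	unsigned int i, room, accepted, done;

	for (i = 0; i < windows; i++) {
		/* Arrivals beyond the free ring slots are dropped. */
		room = ring_size - r.backlog;
		accepted = arrivals < room ? arrivals : room;
		r.dropped += arrivals - accepted;
		r.backlog += accepted;

		/* The handler stops at the work limit. */
		done = r.backlog < work_limit ? r.backlog : work_limit;
		r.processed += done;
		r.backlog -= done;
	}
	return (r);
}
```

Under these assumptions, with a 4096-slot ring and 150 packets arriving per
window, a limit of 100 fills the ring after about 80 windows and then drops
packets steadily, while a limit of 200 keeps the backlog at zero.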

BC