two problems in dev/e1000/if_lem.c::lem_handle_rxtx()

Barney Cordoba barney_cordoba at yahoo.com
Wed Jan 16 14:19:09 UTC 2013



--- On Tue, 1/15/13, Luigi Rizzo <rizzo at iet.unipi.it> wrote:

> From: Luigi Rizzo <rizzo at iet.unipi.it>
> Subject: two problems in dev/e1000/if_lem.c::lem_handle_rxtx()
> To: head at freebsd.org, "freebsd-net at freebsd.org" <freebsd-net at freebsd.org>
> Cc: "Jack Vogel" <jfvogel at gmail.com>
> Date: Tuesday, January 15, 2013, 8:23 PM
> Hi,
> i found a couple of problems in
> dev/e1000/if_lem.c::lem_handle_rxtx()
> (compare with dev/e1000/if_em.c::em_handle_que() for better
> understanding):
> 
> 1. in if_em.c::em_handle_que(), when em_rxeof() exceeds the
>    rx_process_limit, the task is rescheduled so it can complete
>    the work. Conversely, in if_lem.c::lem_handle_rxtx() the
>    lem_rxeof() is only run once, and if there are more pending
>    packets the only chance to drain them is to receive (many)
>    more interrupts.
> 
>    This is a relatively serious problem, because the receiver
>    has a hard time recovering.
> 
>    I'd like to commit a fix to this, same as it is done in e1000.
> 
> 2. in if_em.c::em_handle_que(), interrupts are reenabled
>    unconditionally, whereas lem_handle_rxtx() only enables them
>    if IFF_DRV_RUNNING is set.
> 
>    I cannot really tell what is the correct way here, so I'd
>    like to put a comment there unless there is a good suggestion
>    on what to do.
> 
>    Accesses to the intr register are race-prone anyways
>    (disabled in fastintr, enabled in the rxtx task without
>    holding any lock, and generally accessed under EM_CORE_LOCK
>    in other places), and presumably enabling/disabling the
>    interrupts around activations of the tasks is just an
>    optimization (and on a VM, it is actually a pessimization
>    due to the huge cost of VM exits).
> 
> cheers
> luigi

This is not really a big deal; this is how things worked for a million
years before we had task queues.

Intel controllers have built-in interrupt moderation (unless you're on an
ISA bus or something), so interrupt storms aren't possible. The typical
default is 6K interrupts per second, so you can't get another interrupt for
1/6000th of a second whether there's more work to do or not.

The "work" parameter should be an indicator that something is happening too
slowly, which can happen with a shaper that's taking a lot more time than
normal to process packets.

Systems should have a maximum pps engineered into their tuning, depending
on the CPU, to avoid live-lock on legacy systems. The default work limit of
100 is too low for a gigabit system.

Queuing tasks actually creates more overhead in the system, not less. The
real issue is whether process_limit * interrupt_moderation yields a pps
that's suitable for your system.

Setting low work limits isn't really a good idea unless you have some other
time-sensitive kernel task. Usually networking is a priority, so setting
arbitrary work limits makes less sense than queuing an additional task,
which defeats the purpose of interrupt moderation.

BC

