Interrupts + Polling mode (similar to Linux's NAPI)

Fri Apr 24 15:03:54 UTC 2009

--- On Thu, 4/23/09, Ed Maste <emaste at freebsd.org> wrote:

> From: Ed Maste <emaste at freebsd.org>
> Subject: Re: Interrupts + Polling mode (similar to Linux's NAPI)
> To: "Andrew Brampton" <brampton+freebsd-net at gmail.com>
> Cc: attilio at freebsd.org, freebsd-net at freebsd.org, "Luigi Rizzo" <rizzo at iet.unipi.it>
> Date: Thursday, April 23, 2009, 3:04 PM
> On Fri, Mar 27, 2009 at 11:05:00AM +0000, Andrew Brampton wrote:
> 
> > 2009/3/27 Luigi Rizzo <rizzo at iet.unipi.it>:
> > > The load of polling is pretty low (within 1% or so) even with
> > > polling. The advantage of having interrupts is faster response
> > > to incoming traffic, not CPU load.
> > 
> > oh, I was under the impression that polling spun in a tight loop,
> > thus using 100% of the processor. After a quick test I see this is
> > not the case. I assume it will get to 100% CPU load if I saturate my
> > network.
> 
> Yes, polling has a limit on the maximum CPU time it will use, and also
> will use less than the limit if there is no traffic.
> 
> There are a number of sysctls under kern.polling that control its
> behaviour:
> 
> * kern.polling.user_frac: Desired user fraction of cpu time
> 
> This attempts to reserve at least a specified percentage of available
> CPU time for user processes; polling tries to limit its percentage use
> to 100 less this value.
> 
> * kern.polling.burst: Current polling burst size
> * kern.polling.burst_max: Max Polling burst size
> * kern.polling.each_burst: Max size of each burst
> 
> These three control the number of packets that polling processes per
> call / tick.  Packets are processed in batches of each_burst, up to
> burst packets total per tick.  The value of burst is capped at
> burst_max.
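
A minimal sketch of the batching described above, to make the arithmetic
concrete. This is not the kern_poll.c code: the function name and the
rounding-up of a final partial batch are assumptions for illustration.

```c
/*
 * Hypothetical helper, not taken from the FreeBSD sources: count how
 * many passes over the interfaces one tick makes when up to `burst`
 * packets are handed out in batches of at most `each_burst`.
 */
static int
passes_per_tick(int burst, int each_burst)
{
	int remaining = burst;
	int passes = 0;

	if (each_burst <= 0)
		return (0);		/* guard against a bogus setting */
	while (remaining > 0) {
		int batch = remaining < each_burst ? remaining : each_burst;

		remaining -= batch;	/* this batch of the budget is spent */
		passes++;		/* one more pass over the interfaces */
	}
	return (passes);
}
```

For example, burst = 150 with each_burst = 5 gives 30 passes in a single
tick, and the count does not depend on whether any packets are waiting.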
> 
> In order to keep the user_frac CPU percentage available for
> non-polling, a feedback loop is used that controls the value of burst.
> Each time a batch of packets is processed, burst is incremented or
> decremented by 1, depending on how much CPU time polling actually used.
> In addition, if polling extends beyond the next tick, burst is scaled
> back to 7/8ths of the current value.
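
The feedback loop above could be modelled roughly as follows. This is a
sketch under stated assumptions, not the actual implementation: the
struct, the function name, the `used_pct` and `overran` inputs and the
exact comparison against 100 - user_frac are all hypothetical.

```c
/*
 * Hypothetical model of the burst feedback loop; names and the exact
 * comparison are assumptions, not the kern_poll.c implementation.
 */
struct poll_state {
	int burst;	/* current packets-per-tick budget */
	int burst_max;	/* upper bound on burst */
	int user_frac;	/* percent of CPU reserved for user processes */
};

/*
 * Adjust burst after a round of polling.  `used_pct` is the share of
 * the tick that polling consumed; `overran` is nonzero if polling was
 * still running when the next tick arrived.
 */
static void
adjust_burst(struct poll_state *ps, int used_pct, int overran)
{
	if (overran)
		ps->burst -= ps->burst / 8;	/* back off to about 7/8 */
	else if (used_pct < 100 - ps->user_frac)
		ps->burst++;			/* headroom left: grow by 1 */
	else
		ps->burst--;			/* over the share: shrink by 1 */

	if (ps->burst > ps->burst_max)
		ps->burst = ps->burst_max;	/* burst is capped at burst_max */
	if (ps->burst < 1)
		ps->burst = 1;			/* always allow some progress */
}
```

The asymmetry is the point: growth and normal shrinkage move one packet
at a time, while an overrun cuts the budget by an eighth in one step.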
> 
> Polling was originally implemented as a livelock-avoidance technique
> for the single-core case -- the primary goal is to guarantee the
> availability of CPU time specified in user_frac.  The current algorithm
> does not behave that well if user_frac is set low.  Setting it low is
> reasonable if the workload is largely in-kernel (i.e., bridging or
> routing), or when running SMP.
> 
> Another downside of the current implementation is that interfaces will
> be polled multiple times per tick (burst / each_burst times), even if
> there are no packets to process.
> 
> At work we've developed a replacement polling algorithm that keeps
> track of the actual amount of time spent per packet, and uses that as
> the feedback to control the number of packets in each batch.
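
One way such time-per-packet feedback could look, purely as an
illustration. The mail gives no details of the replacement algorithm,
so everything here is an assumption: the names, the nanosecond units,
the 3/4 smoothing factor and the clamping.

```c
/*
 * Hypothetical sketch: names, units and the smoothing factor are
 * assumptions, not the replacement algorithm mentioned in the mail.
 */
struct pkt_cost {
	long avg_ns;	/* smoothed cost of handling one packet, in ns */
};

/* Fold one batch measurement into the running per-packet average. */
static void
cost_update(struct pkt_cost *pc, long batch_ns, int handled)
{
	long sample;

	if (handled <= 0)
		return;			/* nothing handled, nothing to learn */
	sample = batch_ns / handled;
	if (pc->avg_ns == 0)
		pc->avg_ns = sample;	/* first measurement seeds the average */
	else
		pc->avg_ns = (3 * pc->avg_ns + sample) / 4;
}

/* Packets that fit into `budget_ns` of polling time at the current cost. */
static int
next_batch(const struct pkt_cost *pc, long budget_ns, int max_batch)
{
	long n;

	if (pc->avg_ns <= 0)
		return (max_batch);	/* no data yet: use the upper bound */
	n = budget_ns / pc->avg_ns;
	if (n < 1)
		n = 1;
	if (n > max_batch)
		n = max_batch;
	return ((int)n);
}
```

Note that cost_update() needs to know how many packets were really
handled in the batch, which the handler has to report back.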
> 
> This work requires a change to the polling KPI: the polling handlers
> have to return the count of packets actually handled.  My hope is to
> get the KPI change committed in time for 8.0, even if we don't switch
> the algorithm yet.  Attilio (on CC:) and I will make the patch set for
> the KPI change available shortly for comment.
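
To illustrate what "return the count of packets actually handled" buys
the caller, here is a self-contained sketch. The types and names
(fake_nic, poll_fn, poll_all) are made up for the example and are not
the FreeBSD definitions or the proposed patch.

```c
/*
 * Hypothetical types for illustration; these are not the FreeBSD
 * definitions.  The handler reports how many packets it processed.
 */
struct fake_nic {
	int rx_pending;		/* packets waiting in the receive ring */
};

typedef int poll_fn(struct fake_nic *nic, int count);

/* Process up to `count` packets and return the number actually handled. */
static int
fake_nic_poll(struct fake_nic *nic, int count)
{
	int handled = nic->rx_pending < count ? nic->rx_pending : count;

	if (handled < 0)
		handled = 0;		/* negative count: do nothing */
	nic->rx_pending -= handled;
	return (handled);
}

/* Caller side: total up the real work done across a set of NICs. */
static int
poll_all(poll_fn *fn, struct fake_nic *nics, int n, int count)
{
	int total = 0;

	for (int i = 0; i < n; i++)
		total += fn(&nics[i], count);
	return (total);
}
```

With a handler that returns nothing, the caller can only assume the full
count was used; with the return value it can tell an idle interface from
a saturated one.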
> 
> 
> -Ed

Actually, the "advantage of using interrupts" is that you get per-NIC
control without all of the extra code needed to implement polling.
Variable interrupt moderation is much more desirable and efficient, so
polling is only useful for legacy NICs with no controls on interrupt
delays.

Polling requires that you adulterate the system with the polling function,
that you call routines when there is nothing to process, and that you
spend many CPU cycles doing unnecessary things.

What happens when you have 4 NICs with different levels of traffic? 

You'd be better off launching a thread and polling yourself than 
having a system-wide function with generalized settings.

Barney