Interrupts + Polling mode (similar to Linux's NAPI)

Thu Apr 23 20:45:49 UTC 2009

2009/4/23 Ed Maste <emaste at freebsd.org>:
> On Fri, Mar 27, 2009 at 11:05:00AM +0000, Andrew Brampton wrote:
>
>> 2009/3/27 Luigi Rizzo <rizzo at iet.unipi.it>:
>> > The load of polling is pretty low (within 1% or so) even with
>> > polling. The advantage of having interrupts is faster response
>> > to incoming traffic, not CPU load.
>>
>> Oh, I was under the impression that polling spun in a tight loop, thus
>> using 100% of the processor. After a quick test I see this is not the
>> case. I assume it will get to 100% CPU load if I saturate my network.
>
> Yes, polling has a limit on the maximum CPU time it will use, and also
> will use less than the limit if there is no traffic.
>
> There are a number of sysctls under kern.polling that control its
> behaviour:
>
> * kern.polling.user_frac: Desired user fraction of cpu time
>
> This attempts to reserve at least a specified percentage of available
> CPU time for user processes; polling tries to limit its percentage use
> to 100 less this value.
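As a rough worked example, assume polling's cap is simply (100 - user_frac) percent of each clock tick, and that hz is the kernel tick rate (the kern.hz sysctl). The per-tick budget then follows directly; the helper name below is hypothetical and this is a sketch of the arithmetic, not the kernel's code.

```c
#include <assert.h>

/*
 * Hypothetical helper: microseconds of each clock tick that polling may
 * use, assuming the cap is (100 - user_frac) percent of the tick.
 * hz is the kernel clock rate in ticks per second.
 */
static unsigned polling_budget_us(unsigned hz, unsigned user_frac)
{
    assert(hz > 0 && user_frac <= 100);
    unsigned tick_us = 1000000u / hz;           /* length of one tick */
    return tick_us * (100u - user_frac) / 100u; /* polling's share */
}
```

With hz = 1000 and user_frac = 50, this gives 500 microseconds of polling per 1 ms tick.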
>
> * kern.polling.burst: Current polling burst size
> * kern.polling.burst_max: Max Polling burst size
> * kern.polling.each_burst: Max size of each burst
>
> These three control the number of packets that polling processes per
> call / tick.  Packets are processed in batches of each_burst, up to
> burst packets total per tick.  The value of burst is capped at
> burst_max.
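A simplified sketch of one tick under these three knobs: burst is clamped to burst_max, then the handler is called in batches of each_burst until burst packets have been offered. The fake NIC and handler are hypothetical stand-ins so the sketch runs in user space; the real code lives in the kernel and is structured differently.

```c
#include <assert.h>

/* Hypothetical stand-in for a NIC: just a count of queued packets. */
struct fake_nic {
    int queued;   /* packets waiting in the receive ring */
    int handled;  /* packets processed so far */
};

/* Hypothetical handler: process at most `count` packets. */
static void fake_poll(struct fake_nic *nic, int count)
{
    int n = nic->queued < count ? nic->queued : count;
    nic->queued -= n;
    nic->handled += n;
}

/*
 * One tick: cap burst at burst_max, then offer packets in batches of
 * each_burst until burst packets have been offered. Returns the number
 * of handler calls, which does not depend on how many packets were queued.
 */
static int poll_one_tick(struct fake_nic *nic,
                         int burst, int burst_max, int each_burst)
{
    assert(each_burst > 0);
    if (burst > burst_max)
        burst = burst_max;
    int calls = 0;
    while (burst > 0) {
        int n = burst < each_burst ? burst : each_burst;
        fake_poll(nic, n);
        burst -= n;
        calls++;
    }
    return calls;
}
```

For example, burst = 150 with burst_max = 100 and each_burst = 30 is clamped to 100 and yields four calls (30, 30, 30, 10). An idle interface with burst = 60 and each_burst = 5 still receives 12 calls.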
>
> In order to keep the user_frac CPU percentage available for non-polling,
> a feedback loop is used that controls the value of burst.  Each time a
> batch of packets is processed, burst is incremented or decremented by 1,
> depending on how much CPU time polling actually used.  In addition, if
> polling extends beyond the next tick it's scaled back to 7/8ths of the
> current value.
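The two adjustments can be sketched as separate functions, assuming kern_load is the percentage of the tick that polling just used. The function names are hypothetical and the floor of 1 packet is an assumption; this models the description above rather than reproducing the kernel source.

```c
#include <assert.h>

/*
 * After a batch: shrink burst by 1 if polling used more than its share
 * (100 - user_frac), otherwise grow it by 1, staying within [1, burst_max].
 */
static int burst_after_batch(int burst, int burst_max,
                             int kern_load, int user_frac)
{
    assert(burst_max >= 1);
    if (kern_load > 100 - user_frac) {
        if (burst > 1)
            burst--;          /* used too much CPU: back off by one */
    } else if (burst < burst_max) {
        burst++;              /* room to spare: grow by one */
    }
    return burst;
}

/* Polling was still running when the next tick arrived. */
static int burst_after_overrun(int burst)
{
    burst -= burst / 8;       /* keep roughly 7/8 of the current value */
    return burst < 1 ? 1 : burst;
}
```

The ±1 step reacts slowly to gradual load changes, while the 7/8 cut reacts quickly when polling has clearly overrun its tick.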
>
> Polling was originally implemented as a livelock-avoidance technique
> for the single-core case -- the primary goal is to guarantee the
> availability of CPU time specified in user_frac.  The current algorithm
> does not behave that well if user_frac is set low.  Setting it low is
> reasonable if the workload is largely in-kernel (e.g., bridging or
> routing), or when running SMP.
>
> Another downside of the current implementation is that interfaces will
> be polled multiple times per tick (burst / each_burst times), even if
> there are no packets to process.
>
> At work we've developed a replacement polling algorithm that keeps track
> of the actual amount of time spent per packet, and uses that as the
> feedback to control the number of packets in each batch.
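The message does not describe the replacement algorithm beyond this, so the following is purely a hypothetical illustration of a per-packet-cost controller, not the Sandvine code. It keeps an exponentially weighted moving average (EWMA) of nanoseconds per packet and sizes the next batch to fit an assumed time budget. Note that it needs the count of packets actually handled, which is what motivates the KPI change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical controller state. */
struct pkt_cost_ctl {
    uint64_t avg_ns_per_pkt;  /* smoothed cost of one packet */
    int max_batch;            /* upper bound on batch size */
};

/* Feed back one batch: `handled` packets took `elapsed_ns`. */
static void ctl_update(struct pkt_cost_ctl *c, int handled, uint64_t elapsed_ns)
{
    if (handled <= 0)
        return;               /* nothing handled: no cost sample */
    uint64_t sample = elapsed_ns / (uint64_t)handled;
    /* EWMA with weight 1/8 on the new sample. */
    c->avg_ns_per_pkt = (7 * c->avg_ns_per_pkt + sample) / 8;
}

/* Choose the next batch size so it should fit within budget_ns. */
static int ctl_next_batch(const struct pkt_cost_ctl *c, uint64_t budget_ns)
{
    assert(c->max_batch >= 1);
    uint64_t cost = c->avg_ns_per_pkt ? c->avg_ns_per_pkt : 1;
    uint64_t n = budget_ns / cost;
    if (n < 1)
        n = 1;
    if (n > (uint64_t)c->max_batch)
        n = (uint64_t)c->max_batch;
    return (int)n;
}
```

For instance, starting at 1000 ns/packet with a 100 us budget gives a batch of 100; after a batch of 10 packets that took 20 us, the average moves to 1125 ns and the next batch shrinks to 88.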
>
> This work requires a change to the polling KPI: the polling handlers
> have to return the count of packets actually handled.  My hope is to get
> the KPI change committed in time for 8.0, even if we don't switch the
> algorithm yet.  Attilio (on CC:) and I will make the patch set for the
> KPI change available shortly for comment.
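The KPI change amounts to having each polling handler return the number of packets it processed instead of returning nothing. A minimal user-space sketch follows, using simplified hypothetical types (the real handler also takes the interface and a polling command); it is not the patch itself.

```c
#include <assert.h>

/* Simplified, hypothetical handler type: returns packets handled. */
typedef int poll_handler_t(void *softc, int count);

struct ring {             /* hypothetical receive ring */
    int pending;
};

/* Example handler: consume up to `count` packets, report how many. */
static int ring_poll(void *softc, int count)
{
    struct ring *r = softc;
    int n = r->pending < count ? r->pending : count;
    r->pending -= n;
    return n;             /* new contract: packets actually handled */
}

/* Caller side: offer `count` packets to each handler and total the work. */
static int poll_all(poll_handler_t *const handlers[], void *const softcs[],
                    int nhandlers, int count)
{
    assert(nhandlers >= 0 && count >= 0);
    int total = 0;
    for (int i = 0; i < nhandlers; i++)
        total += handlers[i](softcs[i], count);
    return total;
}
```

With the totals available, the scheduler can distinguish an idle interface (return value 0) from a busy one, instead of inferring load only from elapsed time.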

This is the KPI breakage patch:
http://people.freebsd.org/~attilio/Sandvine/polling/polling_kpi.diff

Thanks,
Attilio

