IPFW update frequency

Julian Elischer julian at elischer.org
Sat Mar 31 16:04:03 UTC 2007


Thanks for the information.
The main thrust for me is to have IPFW hold no locks at all during packet
processing.

Performance is secondary.



Andre Oppermann wrote:
> Julian Elischer wrote:
>> Luigi Rizzo wrote:
>>> On Fri, Mar 30, 2007 at 01:40:46PM -0700, Julian Elischer wrote:
>>>> I have been looking at the IPFW code recently, especially with 
>>>> respect to locking.
>>>> There are some things that could be done to improve IPFW's behaviour 
>>>> when processing packets, but some of these take a
>>>> toll (there is always a toll) on the 'updating' side of things.
>>>
>>> Certainly ipfw was not designed with SMP in mind.  If you can tell us
>>> what your plan is to make the list lock-free
>>> (which one, the static or the dynamic one?) maybe we can comment more.
>>>
>>> E.g. one option could be the usual trick of adding refcounts to
>>> the individual rules, and then using an array of pointers to them.
>>> While processing you grab a refcount to the array, and release it once
>>> done with the packet. If there is an addition or removal, you duplicate
>>> the array (which may be expensive for the large 20k rules mentioned),
>>> manipulate the copy and then atomically swap the pointers to the head.
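
Roughly, that copy-and-swap scheme looks like the sketch below.  This is
only a minimal userland illustration, using C11 atomics rather than the
kernel's atomic(9)/refcount(9) primitives; all of the names (fw_ruleset,
fw_acquire, and so on) are made up and are not ipfw's.  It also marks
exactly where the read side gets hard without a lock:

/*
 * Hypothetical sketch of a refcounted, atomically swapped rule array.
 */
#include <stdatomic.h>
#include <stdlib.h>

struct fw_rule {
	int	number;			/* match fields, action, ... */
};

struct fw_ruleset {
	atomic_int	refcnt;		/* one ref per in-flight packet + 1 */
	int		nrules;
	struct fw_rule	*rules[];	/* array of pointers to the rules */
};

static _Atomic(struct fw_ruleset *) fw_map;	/* currently installed set */

/* Packet path: take a reference to the current array, no lock held. */
static struct fw_ruleset *
fw_acquire(void)
{
	struct fw_ruleset *set;

	/*
	 * NOTE: the window between this load and the increment is the
	 * whole problem -- an updater could free the old set in between.
	 * Closing that window without a per-packet lock is the hard part
	 * discussed below.
	 */
	set = atomic_load(&fw_map);
	atomic_fetch_add(&set->refcnt, 1);
	return (set);
}

/* Packet path: drop the reference once done with the packet. */
static void
fw_release(struct fw_ruleset *set)
{
	if (atomic_fetch_sub(&set->refcnt, 1) == 1)
		free(set);		/* last reference is gone */
}

/* Update path: build a modified copy elsewhere, then swap the head. */
static void
fw_install(struct fw_ruleset *newset)
{
	struct fw_ruleset *oldset;

	atomic_store(&newset->refcnt, 1);	/* the "installed" reference */
	oldset = atomic_exchange(&fw_map, newset);
	if (oldset != NULL)
		fw_release(oldset);	/* freed once the last packet drains */
}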
>>
>> This is pretty close.  I know I've mentioned this to people several
>> times over the last year or so.  The trick is to do it in a way that
>> the average packet doesn't need to take any locks to get in, with the
>> updater doing more work instead.
>> If you are willing to acquire a lock on both starting and ending
>> the run through the firewall it is easy.
>> (I already have code to do that; see
>> http://www.freebsd.org/~julian/atomic_replace.c, untested but
>> probably close.)
>> Doing it without requiring each packet to take those locks, however,
>> is a whole new level of problem.
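
For comparison, here is the "easy" variant mentioned above, where each
packet briefly takes a lock on entering and on leaving the firewall so
the reference count can be handled safely.  Again this is only a hedged
sketch with hypothetical names, not the code from atomic_replace.c:

#include <pthread.h>
#include <stdlib.h>

struct fw_ruleset {
	int	refcnt;			/* protected by fw_mtx */
	int	nrules;
	/* struct fw_rule *rules[]; */
};

static pthread_mutex_t	fw_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct fw_ruleset *fw_map;	/* protected by fw_mtx */

/* Start of a firewall run: lock, take a reference, unlock. */
static struct fw_ruleset *
fw_enter(void)
{
	struct fw_ruleset *set;

	pthread_mutex_lock(&fw_mtx);
	set = fw_map;
	set->refcnt++;
	pthread_mutex_unlock(&fw_mtx);
	return (set);
}

/* End of a firewall run: lock, drop the reference, free if it was last. */
static void
fw_exit(struct fw_ruleset *set)
{
	int last;

	pthread_mutex_lock(&fw_mtx);
	last = (--set->refcnt == 0);
	pthread_mutex_unlock(&fw_mtx);
	if (last)
		free(set);
}

/* Updater: build the new set, then swap it in under the same lock. */
static void
fw_install(struct fw_ruleset *newset)
{
	struct fw_ruleset *oldset;

	newset->refcnt = 1;		/* the "installed" reference */
	pthread_mutex_lock(&fw_mtx);
	oldset = fw_map;
	fw_map = newset;
	pthread_mutex_unlock(&fw_mtx);
	if (oldset != NULL)
		fw_exit(oldset);	/* drop the old installed reference */
}

The race from the previous sketch is gone because the refcount is only
ever touched while fw_mtx is held, but that is exactly the two lock
operations per packet that the lock-free goal is trying to avoid.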
> 
> The locking overhead per packet in ipfw is by no means its limiting
> factor.  Actually it's a very small part, and pretty much any work on
> it is wasted effort.  Time would be much better spent optimizing the
> main rule loop of ipfw.  I was profiling ipfw early last year with an
> Agilent packet generator and hwpmc.  The packet forwarding path (w/o
> ipfw) has been improved since then, but relative to each other the
> numbers are still correct.
> 
> Numbers from early 2006, before the taskqueue improvements:
>   fastfwd                 580357 pps
>   fastfwd+pfil_pass       565477 pps  (no rules, just pass the packet on)
>   fastfwd+ipfw_allow      505952 pps  (one rule)
>   fastfwd+ipfw_30rules    401768 pps  (30 non-matching IP address rules)
>   fastfwd+pf_pass         476190 pps  (one rule)
>   fastfwd+pf_30rules      342262 pps  (30 non-matching IP address rules)
> 
> The overhead per packet is big.  Enabling ipfw and the per-packet pfil
> hooks with their indirect function calls causes a loss of only about
> 15,000 pps (0.9%).  On the other hand the first rule costs 12.9% and
> each additional rule 0.6%.  All this is without any complex rules like
> table lookups, state tracking, etc.
> 
>                             idle     fastfwd  fastfwd+ipfw_allow  fastfwd+ipfw_30rules
> cycles                2596685731  2598214743          2597973265            2596702381
> cpu-clk-unhalted         7824023  2582240847          2518187670            2483904362
> instructions             2317535  1324655330          1492363346            2026009148
> branches                  316786   174329367           191263118             294700024
> branch-mispredicts         19757     2235749            10003461               8848407
> dc-access                1417532   829159482           998427224            1235192770
> dc-refill-from-l2           2124     4767395             4346738               4548311
> dc-refill-from-system         89      803102              819658                654661
> dtlb-l2-hit                  626    10435843             9304448              12352018
> dtlb-miss                    129      255493              130998                112644
> ic-fetch                  804423   471138619           583149432             870371492
> ic-miss                     2358       34831             2505198               1947943
> itlb-l2-hit                    0          74                  12                    12
> itlb-miss                     42          92                  82                    82
> lock-cycles                   77         803                 352                   451
> locked-instructions            4          19                   2                     4
> lock-dc-access                 6          20                   6                     7
> lock-dc-miss                   0           0                   0                     0
> 
> Hardware is a dual Opteron 852 at 2.6 GHz on a Tyan 2882 mainboard with
> a dual-port Intel em(4) network card plugged into a PCI-X 64-bit/133 MHz
> slot.  Packets are flowing from em0 -> em1.
> 


