IPFW update frequency

Max Laier max at love2party.net
Sat Mar 31 10:47:48 UTC 2007

On Saturday 31 March 2007 11:27, Luigi Rizzo wrote:
> On Sat, Mar 31, 2007 at 10:21:02AM +0200, Andre Oppermann wrote:
> > Julian Elischer wrote:
> > > Luigi Rizzo wrote:
> > >> On Fri, Mar 30, 2007 at 01:40:46PM -0700, Julian Elischer wrote:
> > >>> I have been looking at the IPFW code recently, especially with
> > >>> respect to locking.
> > >>> There are some things that could be done to improve IPFW's
> > >>> behaviour
> ...
> > The locking overhead per packet in ipfw is by no means its limiting
> i think you and Julian are looking at different issues.
> if i understand julian's comment, the problem is that the list
> is protected by a single lock, so no hope of parallelising

ipfw has used rwlocks for the static rules for quite some time now.  In
contrast to Julian, I don't believe that the claimed lock order reversal
with an rlock() can be the cause of a deadlock (exclusiveness is a
precondition).  Having been involved in the hacks that went in and out of
ipfw and pfil locking over the last few years and the problems that went
along with them, I'd urge everybody *not* to rush any more hacks into this.

> the work, and if one kernel thread is busy processing a packet
> in the filter, others might be blocked for a long time
> (in your case, the set of 30 rules is 765ns for ipfw and 1198ns
> for pf).
> Your tests presumably have little if any contention on the lock.

Most likely none at all, since the forwarding path takes care of 

> Specifically, if you compute the difference of the inverses
> of those pps rates you see the following:
> 	+pfil_pass	45.3 ns	per packet
> 	+ipfw_allow	+253.4 ns/packet (setup and first rule)
> 	+ipfw_30	+17.67 ns/(packet * extra rule)
> 	+pf_pass	+376.9 ns/packet (setup and first rule)
> 	+pf_30		+28.34 ns/(packet * extra rule)
> the lock acquisition cost is in the 'setup' part but i cannot tell
> how expensive it is.
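
As a cross-check, the increments quoted above fit a simple linear model:
setup cost plus a fixed cost for each extra rule.  A minimal sketch follows,
assuming every rule in the set is evaluated.  The struct and function names
are made up for illustration; only the nanosecond figures come from the
quoted numbers.

```c
#include <assert.h>

/* Hypothetical names; only the constants are taken from the quoted figures. */
struct filter_cost {
    double setup_ns;     /* setup and first rule, ns per packet */
    double per_rule_ns;  /* each extra rule, ns per packet */
};

static const struct filter_cost ipfw_cost = { 253.4, 17.67 };
static const struct filter_cost pf_cost   = { 376.9, 28.34 };

/* Invert a packets-per-second rate into nanoseconds per packet. */
static double ns_per_packet(double pps)
{
    assert(pps > 0.0);
    return 1e9 / pps;
}

/* Estimated filter cost when all n_rules rules are evaluated. */
static double ruleset_cost_ns(const struct filter_cost *c, int n_rules)
{
    assert(n_rules >= 1);
    return c->setup_ns + (n_rules - 1) * c->per_rule_ns;
}
```

With n_rules = 30 this gives roughly 765.8 ns for ipfw and 1198.8 ns for pf,
which matches the 765ns and 1198ns quoted earlier.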
> Julian's suggested change (and surely the one i described)
> replaces the lock/unlock pair on the rule list with a refcount add/dec
> pair (with uncontested locks the cost should be similar), but
> especially makes the operation non-blocking allowing running the input
> and output paths in parallel.

See above, ipfw is working in parallel already.  In addition to that, 
using a ref-count would be worse!  Instead of two atomic operations you'd 
then have to pay for four: lock, ref, unlock, work, lock, unref, unlock.  
All of these can contend with each other.  This will most likely cause more 
serialization than we currently have.  Again, please don't rush any 
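
To make the count concrete, here is a toy, single-threaded model of the two
per-packet sequences.  It is only a sketch: it is not the rwlock(9) or
mutex(9) implementation, all names are invented, and it assumes that every
lock or unlock costs exactly one atomic operation and that the ref-count
itself is protected by a mutex.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model with hypothetical names; each lock/unlock = one atomic op. */
static unsigned long atomic_ops;

static atomic_int  readers;                 /* toy read-lock state */
static atomic_flag mtx = ATOMIC_FLAG_INIT;  /* toy mutex */
static int         refcount;                /* protected by mtx */

static void toy_rlock(void)   { atomic_fetch_add(&readers, 1); atomic_ops++; }
static void toy_runlock(void) { atomic_fetch_sub(&readers, 1); atomic_ops++; }

static void toy_lock(void)
{
    while (atomic_flag_test_and_set(&mtx))
        ;  /* spin; never taken in this single-threaded model */
    atomic_ops++;
}

static void toy_unlock(void) { atomic_flag_clear(&mtx); atomic_ops++; }

static void check_rules(void) { /* placeholder for the rule walk */ }

/* Read lock held across the rule walk: rlock, work, runlock. */
unsigned long packet_with_rwlock(void)
{
    atomic_ops = 0;
    toy_rlock();
    check_rules();
    toy_runlock();
    return atomic_ops;
}

/* Mutex-protected ref-count: lock ref unlock work lock unref unlock. */
unsigned long packet_with_refcount(void)
{
    atomic_ops = 0;
    toy_lock();  refcount++;  toy_unlock();
    check_rules();
    toy_lock();  refcount--;  toy_unlock();
    assert(refcount == 0);  /* the reference must be dropped again */
    return atomic_ops;
}
```

In this model packet_with_rwlock() returns 2 and packet_with_refcount()
returns 4.  It says nothing about the real cost of each operation, only
about how many of them sit on the per-packet path.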

> > factor.  Actually it's a very small part and pretty much any work on
> > it is lost love.  It would be much better spent time to optimize the
> > main rule loop of ipfw to speed things up.  I was profiling ipfw
> > early last year with an Agilent packet generator and hwpmc.  In the
> > meantime the packet forwarding path (w/o ipfw) has been improved but
> > relative to each other the numbers are still correct.
> actually your numbers show that at least the rule setup (and the
> processing of simple rules) is significantly faster (50% or so) in
> ipfw2 than in pf.

Note that pf includes a plethora of sanity checks in the default rule 
processing.  Also note that pf - due to its stateful design - does 
a "check state" first for every packet.  This gives a big malus in this 
special test.

> I know that the setup time is expensive, but i am not sure that
> one can save much - in both cases, you need to fetch a lot
> of information, which is scattered in variable locations in
> the mbuf and packet headers.

Agreed.  For the ipfw case it *might* make sense to reach into the upper 
layers only if requested - not at all sure about that, however.

