svn commit: r191259 - head/sys/netinet

Sun Apr 19 10:44:50 UTC 2009

Robert Watson wrote:
> On Sun, 19 Apr 2009, Kip Macy wrote:
> 
>> Author: kmacy
>> Date: Sun Apr 19 04:44:05 2009
>> New Revision: 191259
>> URL: http://svn.freebsd.org/changeset/base/191259
>>
>> Log:
>>  - Allocate a small flowtable in ip_input.c (changeable by tuneable)
>>  - Use for accelerating ip_output
> 
> If you anticipate the flowtable being used with many types, I wonder if 
> the flowtable sysctl to enable/disable it by policy should be on the 
> consumer side, rather than the producer side?  That way you could say 
> "use the flowtable for ipv4 and ipv6 but not ipx", which might well be 
> helpful for debugging when adding flowtable support for those 
> protocols.  Also, is it the case that when the flowtable is disabled, it 
> isn't allocated, or is the basic table always allocated regardless of 
> policy?

I have another question on the flowtable:  What is the purpose of it?
All router vendors have learned a long time ago that route caching
(aka flow caching) doesn't work out on a router that carries the DFZ
(default free zone, currently ~280k prefixes).  The overhead of managing
the flow table and the high churn rate make it much more expensive than
a direct and already very efficient radix trie lookup. Additionally a
well connected DFZ router has some 1k prefix updates per second.  More
information can be found for example at Cisco here:
  http://www.cisco.com/en/US/tech/tk827/tk831/technologies_white_paper09186a00800a62d9.shtml
The same findings are also available from all other major router vendors
like Juniper, Foundry, etc.
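
For reference, a direct longest-prefix-match lookup can be sketched as
below.  This is a minimal illustration using a plain one-bit-per-level
binary trie, not the path-compressed radix code the kernel actually uses;
the names trie_node, trie_new, trie_insert and trie_lookup are made up for
this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* One node per prefix bit; hypothetical layout for illustration only. */
struct trie_node {
	struct trie_node *child[2];
	int has_route;		/* non-zero if a prefix ends at this node */
	int next_hop;		/* opaque next-hop id, valid if has_route */
};

struct trie_node *
trie_new(void)
{
	return (calloc(1, sizeof(struct trie_node)));
}

/* Insert prefix/len (host byte order).  Returns 0 on success, -1 on error. */
int
trie_insert(struct trie_node *root, uint32_t prefix, int len, int next_hop)
{
	struct trie_node *n = root;

	if (len < 0 || len > 32)
		return (-1);
	for (int i = 0; i < len; i++) {
		int bit = (prefix >> (31 - i)) & 1;

		if (n->child[bit] == NULL) {
			n->child[bit] = trie_new();
			if (n->child[bit] == NULL)
				return (-1);
		}
		n = n->child[bit];
	}
	n->has_route = 1;
	n->next_hop = next_hop;
	return (0);
}

/*
 * Walk the address bits from the top, remembering the last node that
 * carried a route.  Returns that next hop, or -1 if nothing matched.
 */
int
trie_lookup(const struct trie_node *root, uint32_t addr)
{
	const struct trie_node *n = root;
	int best = -1;

	for (int i = 0; n != NULL; i++) {
		if (n->has_route)
			best = n->next_hop;
		if (i == 32)
			break;
		n = n->child[(addr >> (31 - i)) & 1];
	}
	return (best);
}
```

A lookup touches at most 33 nodes regardless of how many prefixes are
installed, and there is no per-flow state to create, expire or invalidate
when a prefix changes.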

Let's examine the situations:
  a) internal router with only a few routes; The routing and ARP table
     are small, lookups are very fast and everything is hot in the CPU
     caches anyway.
  b) DFZ router with 280k routes; A small flow table thrashes constantly
     and is pure overhead.  A large flow table has a high maintenance
     overhead, higher lookup times and still a significant amount of thrashing.
     The overhead of the flow table is equal to or higher than that of a
     direct routing table lookup.
In conclusion, a flow table is never a win but a liability in any realistic setting.
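
The thrashing in case b) can be illustrated with a toy model.  The sketch
below assumes a direct-mapped cache and packets drawn uniformly at random
from the set of active flows; both are simplifications (real traffic is
skewed, which helps a cache somewhat), and flow_cache_hit_rate is a
made-up name, not anything in the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Deterministic xorshift32 PRNG so that runs are repeatable. */
static uint32_t
xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return (x);
}

/*
 * Simulate a direct-mapped flow cache with `slots` entries (must be a
 * power of two) fed with `packets` packets, each belonging to one of
 * `flows` flows chosen uniformly at random.  Returns the hit rate in
 * the range [0, 1].
 */
double
flow_cache_hit_rate(uint32_t slots, uint32_t flows, uint32_t packets)
{
	uint32_t *cache, rng = 2463534242u, hits = 0;

	assert(slots != 0 && (slots & (slots - 1)) == 0);
	assert(flows != 0 && packets != 0);
	cache = calloc(slots, sizeof(*cache));
	assert(cache != NULL);

	for (uint32_t i = 0; i < packets; i++) {
		/* Flow ids start at 1; 0 marks an empty slot. */
		uint32_t flow = xorshift32(&rng) % flows + 1;
		uint32_t idx = flow & (slots - 1);

		if (cache[idx] == flow)
			hits++;
		else
			cache[idx] = flow;	/* miss: evict and refill */
	}
	free(cache);
	return ((double)hits / packets);
}
```

Under these assumptions, flow_cache_hit_rate(1024, 64, 100000) is close
to 1 because every flow gets its own slot, while
flow_cache_hit_rate(1024, 280000, 1000000) should come out well under 1%
because hundreds of flows compete for each slot and nearly every packet
evicts an entry.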

Now I don't have benchmark numbers to back up the theory I put forth here.
However I want to bring up the rationale for why nobody else is doing it.
A statistical analysis easily shows that flow caching has only a few small
spots where it may offer some advantage over direct routing table lookups;
none of them are where it matters in real-world situations.

As our kernel currently stands an advantage of the flow table can certainly
be demonstrated for a small routing table and a small number of flows.  This
is due to a very sub-optimal routing table implementation we have.  The flow
table approach short-cuts a significant number of locking operations (routing
table, routing entries, ARP table and possibly some more).  On the other hand
this caching of flows and pointers to routing entries and ARP entries complicates
updates to these tables and potentially makes them very expensive.  Additionally
it creates a "tangled mess" again, complicating future changes and advances in
those areas (unless the flow table were simply removed again at that point).

I argue that instead of kludging around a sub-optimal part of the network
stack (the current incarnation of the routing table) with the flow table,
the time would be better spent fixing the problems in the first place.  I've
outlined a few approaches a couple of times before on the mailing lists.  If
the routing table no longer supported direct pointers to entries, the locking
could be significantly simplified and the ARP table could use rmlocks
(read-mostly locks) as it is changed only very infrequently.  It's all about
the number of locks that have to be acquired per packet/lookup.  It also has
the benefit of an order of magnitude less complexity (and fewer hard-to-debug
edge cases, which should not be underestimated).

-- 
Andre