initial call for review.. initial multi-fib (routing table) support

Fri Dec 14 12:02:29 PST 2007

Bruce M. Simpson wrote:
> Julian,
> 
> First of all, thank you very much for starting this work in a much 
> needed area.
> 
> Julian Elischer wrote:
>> This is a call for review for a change that is part of a
>> longer term project.
>>
>> This implements multiple routing tables. Eventually the implementation 
>> will be much cleaner but
>> the first implementation is designed to be backported to 6.x
>> and thus must be ABI compatible. It need not be particularly 'clean'
>> as the version in 8.x will be. First it needs to be committed to 
>> -current in its 6.x form so an MFC can occur, then the cleaner version 
>> can be committed over the top of it to clean it up.
> 
> Few comments:
> Allocating multiple radix trie heads is one way of doing this, but it 
> would be nice to be able to clean up the memory management in the radix 
> trie in general.

Multiple radix trie heads are for the 6.x-compatible version only.

My plan for 8.x is:

Have an array of af->domain pointers to find the domain structure 
quickly given AF_XXX.

Expand the domain structure to include methods for that domain for 
rt_alloc et al.
Also have a void * pointer to the domain-specific data structure that
the domain uses for routing.

Generic route calls that must cope with multiple protocols call dom_rtalloc()
and friends, which call the methods. Remember that the routing structure
for AppleTalk (for example) is not based on netmasks, and there are other
protocols where netmask-based tries are not the right idea, so forcing every
protocol to use a trie is a silly idea.

Protocols can call their own routing calls directly. For example,
the inet protocols can call their methods directly (e.g. in_rtalloc()) 
without going via the method table.

The methods have an extra fib argument, but some protocols may choose 
to ignore it. 
The current radix trie code is still used for inet, but is supplied as 
a utility to any protocol families (e.g. inet) that need it.
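
To make the 8.x plan above a bit more concrete, here is a rough userland
sketch of the dispatch. It is only an illustration under assumptions: every
sk_* name, the constant values and the method signature are hypothetical
stand-ins, not the actual kernel structures or the eventual 8.x interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for AF_XXX values; not taken from any header. */
#define SK_AF_INET      2
#define SK_AF_APPLETALK 16
#define SK_AF_MAX       32

enum sk_method { SK_NONE, SK_RADIX, SK_OTHER };

struct sk_route {
    int            ro_fib;     /* table the lookup was performed in */
    enum sk_method ro_method;  /* which kind of lookup code handled it */
};

/* Domain structure expanded with a routing method and private data. */
struct sk_domain {
    int    dom_family;
    int  (*dom_rtalloc)(struct sk_route *ro, int fib);
    void  *dom_rtdata;  /* domain-specific routing structure (unused here) */
};

/* inet honours the fib argument; it would use the radix trie utility code. */
int sk_in_rtalloc(struct sk_route *ro, int fib)
{
    ro->ro_fib = fib;
    ro->ro_method = SK_RADIX;
    return 0;
}

/* A protocol that is not netmask based and chooses to ignore the fib. */
int sk_at_rtalloc(struct sk_route *ro, int fib)
{
    (void)fib;
    ro->ro_fib = 0;
    ro->ro_method = SK_OTHER;
    return 0;
}

static struct sk_domain sk_inetdomain = { SK_AF_INET, sk_in_rtalloc, NULL };
static struct sk_domain sk_atdomain = { SK_AF_APPLETALK, sk_at_rtalloc, NULL };

/* Array of af->domain pointers, so the domain is found quickly by AF_XXX. */
static struct sk_domain *sk_domains[SK_AF_MAX + 1] = {
    [SK_AF_INET]      = &sk_inetdomain,
    [SK_AF_APPLETALK] = &sk_atdomain,
};

/* Generic entry point for callers that must cope with multiple protocols. */
int sk_dom_rtalloc(int af, struct sk_route *ro, int fib)
{
    if (af < 0 || af > SK_AF_MAX || sk_domains[af] == NULL)
        return -1;  /* no routing support registered for this family */
    return sk_domains[af]->dom_rtalloc(ro, fib);
}
```

In this sketch, inet code could still call sk_in_rtalloc() directly and skip
the table, while generic callers go through sk_dom_rtalloc().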

> 
> I've seen implementations which do this by assigning index numbers or 
> bit sets to the radix trie entries. That way, you don't need to keep 
> multiple redundant copies of the same data around -- this IS the kernel 
> FIB after all, and if you're running a router in the Default Free Zone, 
> or with a considerable BGP topology, this kind of redundancy in the 
> forwarding plane is not an OK use of memory resources.

Eventually it will be up to each protocol family how it supports multiple fibs.

(or even if it wants to)

> 
> It's been a few months, but I believe this is how OpenBSD does it;
OpenBSD just duplicates the tables as I do, but not with a fixed 2D array.
I'm stuck with the 2D array for the 6.x-compatible version, because
a 1D array (6.x) is a subset of a 2D array and thus it is backwards
compatible :-)
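
The subset argument can be shown with a small sketch. The names and sizes
here are hypothetical (sk_rt_tables, SK_NUMFIBS and so on are made up for
illustration); the point is only that row 0 of a [fib][af] array has the same
layout as the old per-family array, so code that indexes by af alone still
lands on the fib 0 tables.

```c
#include <assert.h>
#include <stddef.h>

#define SK_AF_MAX  32   /* hypothetical stand-in for AF_MAX */
#define SK_NUMFIBS 16   /* hypothetical number of fibs */

struct sk_rnh;          /* opaque stand-in for a radix trie head */

/* Multi-fib layout: one row of per-family table heads for each fib. */
static struct sk_rnh *sk_rt_tables[SK_NUMFIBS][SK_AF_MAX + 1];

/*
 * What code written against the old 1D layout effectively sees:
 * a plain array of SK_AF_MAX + 1 pointers, which is exactly row 0.
 */
struct sk_rnh **sk_legacy_view(void)
{
    return sk_rt_tables[0];
}

/* Address of the slot for a given (fib, af) pair in the 2D layout. */
struct sk_rnh **sk_table_slot(int fib, int af)
{
    assert(fib >= 0 && fib < SK_NUMFIBS);
    assert(af >= 0 && af <= SK_AF_MAX);
    return &sk_rt_tables[fib][af];
}
```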

> ; ipfw 
> also does something similar deep in its innards, the rules are tagged 
> with bitsets to specify which sets they are present in.

Possibly, but it's more complicated.
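
For reference, the bitset-tagging idea being discussed could look roughly
like the sketch below: a single shared table whose entries carry a bitmask
of the fibs they belong to. This is an assumption-laden illustration, not
how OpenBSD or ipfw actually do it. All names are hypothetical, and a linear
scan stands in for the radix trie to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry shared between fibs (IPv4, host byte order). */
struct sk_rtentry {
    uint32_t dst;      /* destination network */
    uint32_t mask;     /* contiguous netmask */
    uint32_t fibset;   /* bit n set => entry is present in fib n */
    int      ifindex;  /* outgoing interface, for illustration only */
};

/*
 * Longest-prefix match restricted to entries tagged for the given fib.
 * Returns NULL when nothing in that fib matches.
 */
const struct sk_rtentry *
sk_lookup(const struct sk_rtentry *tbl, size_t n, uint32_t addr, unsigned fib)
{
    const struct sk_rtentry *best = NULL;
    size_t i;

    if (fib >= 32)
        return NULL;
    for (i = 0; i < n; i++) {
        const struct sk_rtentry *e = &tbl[i];

        if ((e->fibset & (UINT32_C(1) << fib)) == 0)
            continue;               /* not a member of this fib */
        if ((addr & e->mask) != e->dst)
            continue;               /* destination does not match */
        /* For contiguous masks a larger value means a longer prefix. */
        if (best == NULL || e->mask > best->mask)
            best = e;
    }
    return best;
}
```

The memory saving is that a route present in several fibs is stored once;
the extra complexity is that every insert, delete and lookup has to deal
with the fib membership bits.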

>  [I see similar memory management issues with C++ STL containers, which 
> irritates me; Boost++'s multi_index_container is an analogous idiom.]

We have to decide between simplicity and squeezing every last byte out.
I can't imagine anyone trying to store two copies of the entire routing 
tree in kernel memory, so I don't think that such complexity is worth it.
Once the framework is in place, however, you are welcome to try any method 
that tries your fancy :-)

> 
> One of the big strengths of the BSD radix trie, as implemented by Keith 
> Sklower, was that it could be regression tested independently of the 
> kernel. I'd very much like to see this capability retained, and perhaps 
> expanded upon, as this is a sensitive area of work.

> 
> I'd encourage you to take a look at the OpenBSD changes. They are much 
> less invasive than this patch, and whilst they don't provide the 
> setfib() syscall functionality, that could be easily grafted on top. I 
> understand your folk's requirements for multiple tables, I'm sure there 
> is a possible fit here given the idioms described herein.

I have been through the OpenBSD patches. 
They influenced this greatly.

> 
> As I say it's been months since I last had a chance to look at this, and 
> I am busy finishing up the first phase of another project, so I don't 
> have all of these changes to hand -- however -- here's a good date and 
> starting position:
> 
>    
> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/net/radix.c.diff?r1=1.20&r2=1.21 

been there

> 
> 
> I know there is an element of Not-Invented-Here which creeps in, but, 
> when all is said and done, OpenBSD's approach is viable, compact, and 
> simple, and addresses folk's immediate requirements for multi-path support.
> 
> They don't address SMP, multicast, or source address selection, but 
> those are future development stories.
> 
> cheers
> BMS