multiple routing tables review patch ready for simple testing.
Bruce M Simpson
bms at incunabulum.net
Wed Apr 30 15:38:26 UTC 2008
Julian Elischer wrote:
> An interface may however be present in entries from multiple FIBs
> in which case the INCOMING packets on that interface need to
> be disambiguated with respect to which FIB they belong to.
Yes, there is no way the forwarding code alone can do this.
It should not be expected to, and it's important to maintain a clean
functional separation there; otherwise one ends up in the same quagmire
that has plagued many QoS research projects over the years
("Where does this bit of the system go?").
>
> This is a job for an outside entity (from the fibs).
> In this case a packet classifier such as pf or ipfw is ideal
> for the job, providing an outside mechanism for implementing
> whatever policy the admin wants to set up.
Absolutely. This has been the intent from the beginning.
There is no "one size fits all" approach here. We could put a packet
classifier into the kernel which works just fine for DOCSIS consumer
distribution networks, but has absolutely no relevance to an ATM
backbone (these are the two main flavours of access for folk in the UK).
>
> I find it is convenient to envision each routing FIB as a routing
> plane, in a stack of such planes. Each plane may know about the same
> interfaces or different interfaces. When a packet enters a routing
> plane it is routed according to the internal rules of that plane.
> Irrespective of how other planes may act. Each plane can only route
> a packet to interfaces that are known about on that plane.
> Incoming packets on an interface don't know what plane to go to
> and must be told which to use by the external mechanism. It
> IS possible that an interface in the future might have a default
> plane, but I haven't implemented this.
This limitation seems fine for now.
Users can't be expected to configure the defaults "by default" if they
aren't supported, so, if overall the VRF-like feature defaults to off,
and there are big flashing bold letters saying "You must fully configure
the forwarding plane mappings if you wish to use multiple FIBs", then
that's fine by me.
>
> if you have several alias addresses on an interface it is possible
> that some FIBs know about some of them and others know about other
> addresses. New addresses, when added, are added to each FIB, and
> whatever is adding them should remove them from the FIBs that don't
> need them. This may change, but it fits in with how the current code
> works and keeps the diff to a manageable size.
In any event, for plain old IP forwarding, a node's endpoint
addresses are used only as convenient ways of referring to physical links.
To back up and give this some detailed background:
For example, 192.0.2.1/24 might be configured on fxp0, and we
receive a packet on another interface for 192.0.2.2. When resolving a
route, the forwarding code needs to do a lookup to see from where
192.0.2.2 is reachable before the next-hop is resolved in the table.
That happens on a per-FIB basis once the patches are applied -- however,
tagging input with the FIB it belongs to is the classifier's job.
The problems with the above approach begin when an input interface
resides in multiple virtual FIBs (no 1:1 mapping), or when you can't
refer to it by an address (it has no address -- unnumbered
point-to-point link, or addresses do not apply), or when you attempt to
implement encapsulation (e.g. GRE, IPIP) in the forwarding layer.
Then, you're reliant on each individual FIB having resolved
next-hops correctly. The existing forwarding code already does some of
this by forcing the ifp to be set for any route added to the table. This
is done implicitly for routes which transit point-to-point interfaces.
BSD has had some weaknesses in this area. It makes implementing
things like VRRP particularly difficult, which is why the ifnet approach
to CARP was used (the forwarding table gets to see a single ifp); it
eliminates a level of possible recursion from that layer of the routing
stack.
With multicast, for example, next-hops can't be identified by IPv4
addresses alone. Every forwarding decision potentially has more than one
result; next-hops are referred to by physical link (this could be an
ifp, an interface index, a name, whatever), and where messages are
forwarded is determined using a link-scope protocol such as IGMP.
There, it's reasonable to expect that the user partitioned off the
multicast forwarding planes into separate virtual FIBs, and that the
appropriate rules in the classifier are configured.
For SSM, the key (S,G) match has to happen in the input classifier
if flows are to be routed correctly using the multiple-FIB feature, and
the multicast routing daemons have to be aware of it, because you can't
run a separate instance of PIM for every set of flows: PIM is greedy
per-link, the non-1:1 mapping problem exists, and PIM has no way of
telling separate instances apart (no hierarchy in the form of, e.g.,
OSPF areas -- and even OSPF won't let you put a link in more than one
area; virtual links don't count!)
This is so much whizzing in the wind without a new MROUTING
implementation, though, and hierarchical multicast routing is a project
in and of itself.
To summarize:
For now, the limitations of the system should be documented so that
users don't inadvertently configure local forwarding loops, even for
unicast traffic; with multicast, the amplification effect of
misconfiguration is inherently more damaging to a network.
The IPv4 address of an interface can't be used as an identifier for
source routing -- there is no way of knowing that it was the next-hop
used by the last hop; the information just isn't there -- so if you have
the same input interfaces in multiple virtual FIBs, you need to
double-check that the appropriate match rules are in place for the flows
to go where you want them to go.
> (and it suits what I need for work where a route manager daemon
> knows to do this.)
This is another reason why I maintain that RIB and FIB should have
functional separation.
It's unreasonable to expect the kernel to perform next-hop
resolution on every route presented to it, beyond that which is required
by the link layer (i.e. ARP, and that should be functionally separated
too). Recursive resolution also demands stack space, and this is a
scarce kernel resource.
Of course, well behaved routers are engineered such that the
recursion takes place at RIB level, where limits and policy can be more
easily applied, and before the route is plumbed into the hardware TCAM
(or software FIB). Don't try to make the kernel do your dirty laundry.
cheers
BMS
P.S. I see you tweaked verify_path() to do the lookup in the numbered
FIB. Cool.
I should point out that for ad-hoc networks, the ability to turn off
RPF/uRPF for multicast is needed as the routing domain is often NOT
fully converged -- so the RPF checks normally present may discard
legitimate traffic which hasn't been forwarded yet. An encapsulation is
typically used to maintain forwarding state which is relevant to the
particular topology in use.