multiple routing tables review patch ready for simple testing.
Bruce M Simpson
bms at incunabulum.net
Wed Apr 30 15:38:26 UTC 2008
Julian Elischer wrote:
> An interface may however be present in entries from multiple FIBs
> in which case the INCOMING packets on that interface need to
> be disambiguated with respect to which FIB they belong to.
Yes, there is no way the forwarding code alone can do this.
It should not be expected to, and it's important to maintain a clean
functional separation there; otherwise one ends up in the same quagmire
that has plagued many QoS research projects over the years
("Where does this bit of the system go?").
>
> This is a job for an outside entity (from the fibs).
> In this case a packet classifier such as pf or ipfw is ideal
> for the job, providing an outside mechanism for implementing
> whatever policy the admin wants to set up.
Absolutely. This has been the intent from the beginning.
There is no "one size fits all" approach here. We could put a packet
classifier into the kernel which works just fine for DOCSIS consumer
distribution networks, but has absolutely no relevance to an ATM
backbone (these are the two main flavours of access for folk in the UK).
>
> I find it is convenient to envision each routing FIB as a routing
> plane, in a stack of such planes. Each plane may know about the same
> interfaces or different interfaces. When a packet enters a routing
> plane it is routed according to the internal rules of that plane.
> Irrespective of how other planes may act. Each plane can only route
> a packet to interfaces that are known about on that plane.
> Incoming packets on an interface don't know what plane to go to
> and must be told which to use by the external mechanism. It
> IS possible that an interface in the future might have a default
> plane, but I haven't implemented this.
This limitation seems fine for now.
Users can't be expected to configure the defaults "by default" if they
aren't supported, so, if overall the VRF-like feature defaults to off,
and there are big flashing bold letters saying "You must fully configure
the forwarding plane mappings if you wish to use multiple FIBs", then
that's fine by me.
>
> if you have several alias addresses on an interface it is possible
> that some FIBs know about some of them and others know about other
> addresses. New addresses, when added, are added to each FIB, and
> whatever is adding them should remove them from the FIBs that don't
> need them. This may change, but it fits in with how the current code
> works and keeps the diff to a manageable size.
In any event, for plain old IP forwarding, a node's endpoint
addresses are used only as convenient ways of referring to physical links.
To back up and give this some detailed background:
For example, 192.0.2.1/24 might be configured on fxp0, and we
receive a packet on another interface for 192.0.2.2. When resolving a
route, the forwarding code needs to do a lookup to see from where
192.0.2.2 is reachable before the next-hop is resolved in the table.
That happens on a per-FIB basis once the patches are applied -- however,
tagging input with the FIB it belongs to is the classifier's job.
The problems with the above approach begin when an input interface
resides in multiple virtual FIBs (no 1:1 mapping), or when you can't
refer to it by an address (it has no address -- unnumbered
point-to-point link, or addresses do not apply), or when you attempt to
implement encapsulation (e.g. GRE, IPIP) in the forwarding layer.
Then, you're reliant on each individual FIB having resolved
next-hops correctly. The existing forwarding code already does some of
this by forcing the ifp to be set for any route added to the table. This
is done implicitly for routes which transit point-to-point interfaces.
BSD has had some weaknesses in this area. It makes implementing
things like VRRP particularly difficult, which is why the ifnet approach
to CARP was used (the forwarding table gets to see a single ifp); it
eliminates a level of possible recursion from that layer of the routing
stack.
With multicast, for example, next-hops can't be identified by IPv4
addresses alone. Every forwarding decision potentially has more than one
result; next-hops are referred to by physical link (this could be an
ifp, an interface index, a name, whatever), and where messages are
forwarded is determined using a link-scope protocol such as IGMP.
There, it's reasonable to expect that the user partitioned off the
multicast forwarding planes into separate virtual FIBs, and that the
appropriate rules in the classifier are configured.
For SSM, the key (S,G) match has to happen in the input classifier
if flows are to be routed correctly using the multiple-FIB feature, and
the multicast routing daemons have to be aware of it, because you can't
run a separate instance of PIM for every set of flows: PIM is greedy
per-link, the non-1:1 mapping problem exists, and PIM has no way of
telling separate instances apart (no hierarchy in the form of, e.g.,
OSPF areas -- and even OSPF won't let you put a link in more than one
area; virtual links don't count!)
This is so much whizzing in the wind without a new MROUTING
implementation, though, and hierarchical multicast routing is a project
in and of itself.
To summarize:
For now, the limitations of the system should be documented so that
users don't inadvertently configure local forwarding loops, even for
unicast traffic; with multicast, the amplification effect of
misconfiguration is inherently more damaging to a network.
The IPv4 address of an interface can't be used as an identifier for
source routing -- there is no way of knowing that it was the next-hop
used by the last hop; the information just isn't there -- so if you have
the same input interfaces in multiple virtual FIBs, you need to
double-check that the appropriate match rules are in place for the flows
to go where you want them to go.
> (and it suits what I need for work where a route manager daemon
> knows to do this.)
This is another reason why I maintain that RIB and FIB should have
functional separation.
It's unreasonable to expect the kernel to perform next-hop
resolution on every route presented to it, beyond that which is required
by the link layer (i.e. ARP, and that should be functionally separated
too). Recursive resolution also demands stack space, and this is a
scarce kernel resource.
Of course, well behaved routers are engineered such that the
recursion takes place at RIB level, where limits and policy can be more
easily applied, and before the route is plumbed into the hardware TCAM
(or software FIB). Don't try to make the kernel do your dirty laundry.
cheers
BMS
P.S. I see you tweaked verify_path() to do the lookup in the numbered
FIB. Cool.
I should point out that for ad-hoc networks, the ability to turn off
RPF/uRPF for multicast is needed as the routing domain is often NOT
fully converged -- so the RPF checks normally present may discard
legitimate traffic which hasn't been forwarded yet. An encapsulation is
typically used to maintain forwarding state which is relevant to the
particular topology in use.