Routing SMP benefit

Bruce M. Simpson bms at FreeBSD.org
Wed Jan 2 15:00:08 PST 2008


Andre Oppermann wrote:
> So far the PPS rate limit has primarily been the cache miss penalties
> on the packet access.  Multiple CPUs can help here, of course, for
> bidirectional traffic.  Hardware-based packet header cache prefetching as
> done by some embedded MIPS based network processors at least doubles the
> performance.  Intel has something like this for a couple of chipset and
> network chip combinations.  We don't support that feature yet though.

What sort of work is needed in order to support header prefetch?
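For comparison, the software analogue of header prefetch is to issue a prefetch for a later packet's header while still processing the current one, so the cache miss overlaps with useful work. This is a minimal userspace sketch, not the hardware feature Andre describes. It assumes a hypothetical `struct pkt` descriptor (not an mbuf), an arbitrary `PREFETCH_AHEAD` distance, and the GCC/Clang `__builtin_prefetch` builtin, which is only a hint and never faults.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal packet descriptor; not the FreeBSD mbuf. */
struct pkt {
    const uint8_t *hdr;   /* start of the IPv4 header */
    size_t len;
};

/* How many packets ahead to prefetch; an arbitrary tuning assumption. */
#define PREFETCH_AHEAD 2

/* Stand-in for per-packet work: sum the IPv4 destination address bytes
 * (bytes 16..19 of the header). */
static uint32_t process_header(const uint8_t *hdr)
{
    return (uint32_t)hdr[16] + hdr[17] + hdr[18] + hdr[19];
}

/* Walk a batch, prefetching the header of packet i + PREFETCH_AHEAD
 * while working on packet i. */
uint32_t process_batch(const struct pkt *pkts, size_t n)
{
    uint32_t acc = 0;
    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(pkts[i + PREFETCH_AHEAD].hdr,
                               0 /* read */, 3 /* keep in all cache levels */);
        acc += process_header(pkts[i].hdr);
    }
    return acc;
}
```

Hardware prefetch removes even this loop-level bookkeeping, which is presumably why driver and chipset support is the real work item.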

>
> Many of the things you mention here are planned for FreeBSD 8.0 in the
> same or different form.  Work in progress is the separation of the ARP
> table from the kernel routing table.  If we can prevent references to radix
> nodes in general, almost all locking can be done away with.  Instead only
> a global rmlock (read-mostly) could govern the entire routing table.
> Obtaining the rmlock for reading is essentially free.

This is exactly what I'm thinking; it feels like the right way forward.

A single rwlock should be fine: route table updates should generally 
come from only one process, and thus a single thread, at any 
given time.

> Table changes
> are very infrequent compared to lookups (like 700,000 to 300-400) in
> default-free Internet routing.  The radix trie nodes are rather big
> and could use some more trimming to make them fit a single cache line.
> I've already removed some stuff a couple of years ago and more can be
> done.
>
> It's very important to keep this in mind: "profile, don't speculate".

Beware, though, that functionality isn't sacrificed in the pursuit of performance.

For example, it would be very, very useful to be able to merge the 
multicast routing implementation with the unicast one, with the proviso, 
of course, that mBGP requires RPF checks to be performed against a set 
of FIB entries separate from the unicast FIB.
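To make the RPF requirement concrete: a multicast packet is accepted only if it arrived on the interface that a chosen FIB would use to reach the packet's source, and mBGP needs that FIB to be selectable separately from the unicast one. A minimal sketch, assuming a hypothetical `struct fib` that is a flat array with linear longest-prefix match (not FreeBSD's radix code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIB entry: prefix/len and the interface it is reached via. */
struct fib_entry {
    uint32_t prefix;
    uint8_t  plen;
    int      ifindex;
};

struct fib {
    const struct fib_entry *entries;
    size_t n;
};

static uint32_t mask_of(uint8_t plen)
{
    return plen == 0 ? 0 : 0xffffffffu << (32 - plen);
}

/* Longest-prefix match; returns the best entry's ifindex, or -1 if none. */
static int fib_lookup_if(const struct fib *f, uint32_t addr)
{
    int best = -1, ifi = -1;
    assert(f != NULL);
    for (size_t i = 0; i < f->n; i++) {
        const struct fib_entry *e = &f->entries[i];
        if ((addr & mask_of(e->plen)) == e->prefix && e->plen > best) {
            best = e->plen;
            ifi = e->ifindex;
        }
    }
    return ifi;
}

/* RPF check against whichever FIB the caller passes in: the unicast FIB,
 * or a separate mBGP-derived RPF FIB. Returns 1 to accept, 0 to drop. */
int rpf_check(const struct fib *rpf_fib, uint32_t src, int in_ifindex)
{
    int ifi = fib_lookup_if(rpf_fib, src);
    return ifi >= 0 && ifi == in_ifindex;
}
```

The point of parameterizing on the FIB is that one lookup routine can serve both unicast forwarding and multicast RPF, which is what a merged implementation would need.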

Of course, this works if the next-hops themselves are held in a 
container referenced separately from the radix node, such as a simple 
linked list as per the OpenBSD code.

If we ensure the parent radix trie node object fits in a cache line, 
then that's fine.
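A rough illustration of both points: a trimmed node holding only what a lookup touches, with next-hops hanging off it in a singly linked list, and a compile-time check that the node fits in one cache line. The layout is hypothetical (not FreeBSD's `struct radix_node` or OpenBSD's actual structures), and the 64-byte `CACHE_LINE` is an assumption typical of x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size */

/* Next-hops live outside the node, so multipath does not grow the node. */
struct nexthop {
    struct nexthop *next;
    uint32_t        gateway;
    int             ifindex;
};

/* Hypothetical trimmed radix node. */
struct rnode {
    struct rnode   *left, *right;   /* children */
    struct nexthop *nh;             /* head of next-hop list; NULL if none */
    uint32_t        key;
    uint32_t        mask;
    uint16_t        bit;            /* bit index tested at this node */
    uint16_t        flags;
};

/* C11 static_assert from <assert.h>: fail the build if the node grows. */
static_assert(sizeof(struct rnode) <= CACHE_LINE,
              "radix node must fit in one cache line");

/* Count the next-hops attached to a node, e.g. before choosing a path. */
size_t rnode_nexthop_count(const struct rnode *rn)
{
    size_t n = 0;
    assert(rn != NULL);
    for (const struct nexthop *p = rn->nh; p != NULL; p = p->next)
        n++;
    return n;
}
```

Adding a second or third next-hop only allocates list entries; the node a lookup walks through stays the same size.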

[I am looking at some stuff in the dynamic/ad-hoc/mesh space which is 
really going to need support for multipath similar to this.]

later
BMS

