FreeBSD handles leapsecond correctly
Matthew Dillon
dillon at apollo.backplane.com
Sat Jan 7 11:40:20 PST 2006
:Matt,
:
:I've been testing network and routing performance over the past two weeks
:with a calibrated Agilent N2X packet generator. My test box is a dual
:Opteron 852 (2.6GHz) with Tyan S8228 mobo and Intel dual-GigE in PCI-X-133
:slot. Note that I've run all tests with UP kernels em0->em1.
:
:For stock FreeBSD-7-CURRENT from 28. Dec. 2005 I've got 580kpps with fast-
:forward enabled. A em(4) patch from Scott Long implementing a taskqueue
:raised this to 729kpps.
:
:For stock DragonFlyBSD-1.4-RC1 I've got 327kpps and then it breaks down and
:never ever passes a packet again until a down/up on the receiving interface.
:net.inet.ip.intr_queue_maxlen has to be set to 200, otherwise it breaks down
:at 252kpps already. Enabling polling did not make a difference and I've tried
:various settings and combinations without any apparent effect on performance
:(burst=1000, each_burst=50, user_frac=1, pollhz=5000).
:
:What surprised me most, apart from the generally poor performance, is the sharp
:dropoff after max pps and the wedging of the interface. I didn't see this kind
:of behaviour on any other OS I've tested (FreeBSD and OpenBSD).
:
:--
:Andre
Well, considering that we haven't removed the MP lock from the network
code yet, I'm not surprised at the poorer performance. The priority has
been on getting the algorithms in, correct, and stable, proving their
potential, but not hacking things up to eke out maximum performance
before its time. At the moment there is a great deal of work slated for
1.5 to properly address many of the issues.
Remember that the difference between 327kpps and 729kpps is the difference
between roughly 3 uS and 1.4 uS per packet of overhead. That isn't all that
huge a difference, really, especially considering that everything is
serialized down to effectively 1 cpu due to the MP lock.
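The per-packet figures come from inverting the packet rate. A minimal sketch of that arithmetic follows, using the 327 kpps and 729 kpps rates from Andre's report; the function names are made up for illustration and are not from any kernel source.

```c
#include <assert.h>
#include <stdio.h>

/* Average time available per packet, in nanoseconds, at a given
 * forwarding rate in packets per second. */
double ns_per_packet(double pps)
{
    assert(pps > 0.0);
    return 1e9 / pps;
}

/* Print the per-packet time for the two rates quoted in this thread
 * and the gap between them. */
void print_budget(void)
{
    double dfly = ns_per_packet(327e3);  /* DragonFly 1.4-RC1 figure */
    double fbsd = ns_per_packet(729e3);  /* FreeBSD with the em(4) taskqueue patch */

    printf("327 kpps: %.0f ns/packet\n", dfly);
    printf("729 kpps: %.0f ns/packet\n", fbsd);
    printf("gap:      %.0f ns/packet\n", dfly - fbsd);
}
```

Calling print_budget() shows the gap between the two systems as a per-packet time rather than a rate, which is the more useful number when hunting for overhead.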
:For stock FreeBSD-7-CURRENT from 28. Dec. 2005 I've got 580kpps with fast-
:forward enabled. A em(4) patch from Scott Long implementing a taskqueue
:raised this to 729kpps.
The single biggest overhead we have right now is that we have not
yet embedded a LWKT message structure in the mbuf. That means we
currently malloc() and free() a message structure for every
packet, costing at least 700 ns in additional overhead and possibly
more if a cross-cpu free is needed (even with the passive IPIQ that the
free() code uses in that case). This problem is going to be fixed once
1.4 is released, but in order to do it properly I intend to completely
separate the mbuf data vs header concept... give them totally different
structural names instead of overloading them with a union, and then embed
the LWKT message structure in the mbuf_pkt.
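To illustrate the difference between allocating a message per packet and embedding it, here is a rough userland sketch. The struct and function names (demo_msg, demo_pkt and so on) are hypothetical stand-ins, not the real LWKT or mbuf definitions, and the layout is an assumption made only to show the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real LWKT message or mbuf structures. */
struct demo_msg {
    int cmd;                 /* what the receiving thread should do */
};

struct demo_pkt {
    size_t len;              /* payload length in bytes */
    struct demo_msg msg;     /* message storage embedded in the packet header */
};

/* Per-packet allocation: one malloc() here and one free() later. */
struct demo_msg *msg_alloc(int cmd)
{
    struct demo_msg *m = malloc(sizeof(*m));

    if (m != NULL)
        m->cmd = cmd;
    return m;
}

/* Embedded message: reuse storage that already travels with the packet,
 * so no allocator call is made. */
struct demo_msg *msg_embedded_for(struct demo_pkt *pkt, int cmd)
{
    assert(pkt != NULL);
    pkt->msg.cmd = cmd;
    return &pkt->msg;
}

/* Recover the owning packet from a pointer to its embedded message. */
struct demo_pkt *pkt_from_msg(struct demo_msg *m)
{
    assert(m != NULL);
    return (struct demo_pkt *)((char *)m - offsetof(struct demo_pkt, msg));
}
```

With the embedded form the receiver gets back to the packet with pointer arithmetic alone, and nothing has to be freed, on the same cpu or any other.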
Another example would be our IP forwarding code. Hahahah. I'm amazed
that it only takes 3 uS considering that it is running under both the
MP lock *AND* the new mutex-like serializer locks that will be replacing
the MP lock in the network subsystem AND hacking up those locks (so there
are four serializer locking operations per packet plus the MP lock).
The interrupt routing code has similar issues. The code is designed to
be per-cpu and tested in that context (by testing driver entry from other
cpus), but all hardware interrupts are still being taken on cpu #0, and
all polling is issued on cpu #0. This adds considerable overhead,
though it is mitigated somewhat by packet aggregation.
There are two or three other non-algorithmic issues of that nature in
the current network path that exist to allow the old algorithms to be
migrated to the new ones and which are slowly being cleaned up. I'm not
at all surprised that all of these shims cost us something like 1.7 uS
in overhead.
I've run end-to-end timing tests for a number of operations, which you
can see from my BayLisa slides here:
http://www.dragonflybsd.org/docs/LISA200512/
What I have found is that the algorithms are sound and the extra overheads
are basically just due to the migration hacks (like the malloc).
Those tests also showed that our algorithms are capable of pipelining
(in an MP safe manner) between the network interrupt and TCP or UDP protocol
stacks, and they can with only about 40 ns of IPI messaging overhead.
There are sysctls for testing the MP safe interrupt path, but they aren't
production ready yet (because they aren't totally MP safe due to the
route table, IP filter, and mbuf stats which are the only remaining
items that need to be made MP safe).
Frankly, I'm not really all that concerned about any of this. Certainly
not raw routing overhead (someone explain to me why you don't simply buy
a cisco, or write a custom driver if you really need to pop packets
between interfaces at 1 megapps instead of trying to use a piece of
generic code in a generic operating system to do it). Our focus is
frankly never going to be on raw packet switching because there is no
real-life situation where you would actually need to switch such a high
packet rate where you wouldn't also have the budget to simply buy an
off-the-shelf solution.
Our focus vis-a-vis the network stack is going to be on terminus
communications, meaning UDP and TCP services terminated or sourced on
the machine. All the algorithms have been proved out; the only things
preventing me from flipping the MP lock off are the aforementioned
mbuf stats, route table, and packet filter code. In fact, Jeff *has*
turned off the MP lock for the TCP protocol threads for testing purposes,
with very good results. The route table is going to be fixed this month
when we get Jeff's MPSAFE parallel route table code into the tree. The
mbuf stats are a non-problem, really, just some minor work. The packet
filter(s) are more of an issue.
The numbers I ran for the BayLisa talk show our network interrupt overhead
is around 1-1.5 uS per packet, and our TCP overhead is around
1-1.5 uS per packet. 700 ns of that is the aforementioned malloc/free
issue, and a good chunk of the remaining overhead is MP lock related.
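For a sense of scale, the sketch below works out what share of a per-packet budget a fixed cost represents, and what packet rate a given per-packet cost would allow on one cpu. Setting the 700 ns against a single 1 to 1.5 uS stage is an assumption made for the example, and the function names are illustrative only.

```c
#include <assert.h>

/* Fraction of a per-packet time budget consumed by one component.
 * Both arguments are in nanoseconds. */
double overhead_share(double part_ns, double total_ns)
{
    assert(total_ns > 0.0 && part_ns >= 0.0);
    return part_ns / total_ns;
}

/* Packets per second a single cpu could sustain if every packet
 * cost ns_per_pkt nanoseconds and nothing else limited it. */
double pps_at(double ns_per_pkt)
{
    assert(ns_per_pkt > 0.0);
    return 1e9 / ns_per_pkt;
}
```

Under that assumption, overhead_share(700, 1500) and overhead_share(700, 1000) put the malloc/free cost at somewhere between roughly 47% and 70% of a stage's per-packet time.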
An interface lockup is a different matter. Nothing can be said about
that until the cause of the problem is tracked down. I can't speculate
as to the problem without more information.
-Matt
Matthew Dillon
<dillon at backplane.com>