FreeBSD handles leapsecond correctly

Sun Jan 8 03:17:10 PST 2006

Matthew Dillon wrote:
> 
> :Matt,
> :
> :I've been testing network and routing performance over the past two weeks
> :with a calibrated Agilent N2X packet generator.  My test box is a dual
> :Opteron 852 (2.6GHz) with Tyan S8228 mobo and Intel dual-GigE in PCI-X-133
> :slot. Note that I've run all tests with UP kernels em0->em1.
> :
> :For stock FreeBSD-7-CURRENT from 28. Dec. 2005 I've got 580kpps with fast-
> :forward enabled.  A em(4) patch from Scott Long implementing a taskqueue
> :raised this to 729kpps.
> :
> :For stock DragonFlyBSD-1.4-RC1 I've got 327kpps and then it breaks down and
> :never ever passes a packet again until a down/up on the receiving interface.
> :net.inet.ip.intr_queue_maxlen has to be set to 200, otherwise it breaks down
> :at 252kpps already.  Enabling polling did not make a difference and I've tried
> :various settings and combinations without any apparent effect on performance
> :(burst=1000, each_burst=50, user_frac=1, pollhz=5000).
> :
> :What surprised me most, apart from the generally poor performance, is the sharp
> :dropoff after max pps and the wedging of the interface.  I didn't see this kind
> :of behaviour on any other OS I've tested (FreeBSD and OpenBSD).
> :
> :--
> :Andre
> 
>     Well, considering that we haven't removed the MP lock from the network
>     code yet, I'm not surprised at the poorer performance.  The priority has
>     been on getting the algorithms in, correct, and stable, proving their
>     potential, but not hacking things up to eke out maximum performance
>     before its time.  At the moment there is a great deal of work slated for
>     1.5 to properly address many of the issues.

This was using the UP kernel.  No SMP, only one CPU.  The CPU was not maxed
out as shown by top.  There must be something else that is killing performance
on DragonFlyBSD.

-- 
Andre

>     Remember that the difference between 327kpps and 792kpps is the difference
>     between 3 uS and 1.2 uS per packet of overhead.  That isn't all that
>     huge a difference, really, especially considering that everything is
>     serialized down to effectively 1 cpu due to the MP lock.
> 
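The per-packet figures above follow from inverting the packet rate. The sketch below is a minimal illustration of that arithmetic, not something from the thread; the function name `per_packet_us` is hypothetical. Note that the benchmark report gives 729 kpps for patched FreeBSD while the reply writes 792, so both are shown.

```python
def per_packet_us(kpps: float) -> float:
    """Average time available per packet, in microseconds, at a given rate."""
    if kpps <= 0:
        raise ValueError("packet rate must be positive")
    # 1 second = 1e6 us; kpps * 1e3 packets per second => 1e3 / kpps us per packet
    return 1_000.0 / kpps


# Rates taken from the thread; the labels are only descriptive.
rates_kpps = {
    "DragonFly 1.4-RC1 (reported)": 327,
    "FreeBSD 7-CURRENT with em(4) taskqueue patch (reported)": 729,
    "figure as written in the reply": 792,
}

for label, kpps in rates_kpps.items():
    print(f"{label}: {per_packet_us(kpps):.2f} us/packet")
```

This gives about 3.06 us, 1.37 us, and 1.26 us per packet respectively, so the quoted "3 uS" and "1.2 uS" match the 327 and 792 figures after rounding.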
>     The single biggest overhead we have right now is that we have not
>     yet embedded an LWKT message structure in the mbuf.  That means we
>     are currently malloc() and free()ing a message structure for every
>     packet, costing at least 700 nS in additional overhead and possibly
>     more if a cross-cpu free is needed (even with the passive IPIQ the
>     free() code does in that case).  This problem is going to be fixed once
>     1.4 is released, but in order to do it properly I intend to completely
>     separate the mbuf data vs header concept... give them totally different
>     structural names instead of overloading them with a union, then embedding
>     the LWKT message structure in the mbuf_pkt.
> 
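As a rough back-of-envelope check on how much the per-packet malloc()/free() matters, the sketch below subtracts the quoted 700 ns from the per-packet time at 327 kpps and converts back to a rate. It assumes forwarding is limited purely by serialized per-packet CPU cost, which is only an assumption: the benchmark report says the CPU was not maxed out, so the real gain could differ. The function name `projected_kpps` is hypothetical.

```python
def projected_kpps(current_kpps: float, saved_ns: float) -> float:
    """Naive projected rate if `saved_ns` of per-packet overhead were removed.

    Assumes throughput is inversely proportional to per-packet time
    (a simplifying assumption, not a measured result).
    """
    per_packet_ns = 1e6 / current_kpps      # 1e9 ns/s divided by (kpps * 1e3) pkt/s
    remaining_ns = per_packet_ns - saved_ns
    if remaining_ns <= 0:
        raise ValueError("saving exceeds the whole per-packet budget")
    return 1e6 / remaining_ns


# 327 kpps and 700 ns are the figures quoted in the thread.
print(f"{projected_kpps(327, 700):.0f} kpps")
```

Under that assumption the result is roughly 424 kpps, which is consistent with the point that the malloc/free is only one of several overheads in the current path.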
>     Another example would be our IP forwarding code.  Hahahah.  I'm amazed
>     that it only takes 3 uS considering that it is running under both the
>     MP lock *AND* the new mutex-like serializer locks that will be replacing
>     the MP lock in the network subsystem AND hacking up those locks (so there
>     are four serializer locking operations per packet plus the MP lock).
> 
>     The interrupt routing code has similar issues.  The code is designed to
>     be per-cpu and tested in that context (by testing driver entry from other
>     cpus), but all hardware interrupts are still being taken on cpu #0, and
>     all polling is issued on cpu #0.  This adds considerable overhead,
>     though it is mitigated somewhat by packet aggregation.
> 
>     There are two or three other non-algorithmic issues of that nature in
>     the current network path that exist to allow the old algorithms to be
>     migrated to the new ones and which are slowly being cleaned up.  I'm not
>     at all surprised that all of these shims cost us 1.8 uS in overhead.
>     I've run end-to-end timing tests for a number of operations, which you
>     can see from my BayLisa slides here:
> 
>         http://www.dragonflybsd.org/docs/LISA200512/
> 
>     What I have found is that the algorithms are sound and the extra overheads
>     are basically just due to the migrationary hacks (like the malloc).
>     Those tests also tested that our algorithms are capable of pipelining
>     (MP safe wise) between the network interrupt and TCP or UDP protocol
>     stacks, and they can with only about 40 ns of IPI messaging overhead.
>     There are sysctls for testing the MP safe interrupt path, but they aren't
>     production ready yet (because they aren't totally MP safe due to the
>     route table, IP filter, and mbuf stats which are the only remaining
>     items that need to be made MP safe).
> 
>     Frankly, I'm not really all that concerned about any of this.  Certainly
>     not raw routing overhead (someone explain to me why you don't simply buy
>     a cisco, or write a custom driver if you really need to pop packets
>     between interfaces at 1 megapps instead of trying to use a piece of
>     generic code in a generic operating system to do it).  Our focus is
>     frankly never going to be on raw packet switching because there is no
>     real-life situation where you would actually need to switch such a high
>     packet rate where you wouldn't also have the budget to simply buy an
>     off-the-shelf solution.
> 
>     Our focus vis-a-vis the network stack is going to be on terminus
>     communications, meaning UDP and TCP services terminated or sourced on
>     the machine.  All the algorithms have been proved out; the only things
>     preventing me from flipping the MP lock off are the aforementioned
>     mbuf stats, route table, and packet filter code.  In fact, Jeff *has*
>     turned off the MP lock for the TCP protocol threads for testing purposes,
>     with very good results.  The route table is going to be fixed this month
>     when we get Jeff's MPSAFE parallel route table code into the tree.  The
>     mbuf stats are a non-problem, really, just some minor work.  The packet
>     filter(s) are more of an issue.
> 
>     The numbers I ran for the BayLisa talk show our network interrupt overhead
>     is around 1-1.5 uS per packet, and our TCP overhead is around
>     1-1.5 uS per packet.  700 ns of that is the aforementioned malloc/free
>     issue, and a good chunk of the remaining overhead is MP lock related.
> 
>     An interface lockup is a different matter.  Nothing can be said about
>     that until the cause of the problem is tracked down.  I can't speculate
>     as to the problem without more information.
> 
>                                         -Matt
>                                         Matthew Dillon
>                                         <dillon at backplane.com>