Benchmark: NetBSD 2.0 beats FreeBSD 5.3 in server performance

Thu Jan 6 06:01:06 PST 2005

On Thu, 6 Jan 2005, Jesper Louis Andersen wrote:

> But this is speculation. I would like to see performance benchmarks for
> your scenario as well.

Since the performance optimization on FreeBSD for the last few years, with
features like SMPng, libpthread, and UMA, has been focussed on macro
performance not micro performance, it's not surprising the micro
performance requires tuning.  However, there is a lot of ongoing work on
this front, so I'd expect to see continued improvement in the immediate
(5.4) lifetime.  There are a number of optimizations in 6.x that are on
the merge path for 5.x that will directly impact the results in these
measurements -- in particular, what is clearly a bug in the way mutexes
are released on UP kernels that adds almost a hundred cycles to every
mutex release operation.  This was identified in my micro-benchmarking
shortly after the release of 5.3, and may play a substantial part in the
posted results, especially for the very micro benchmarks that involve
kernel memory allocation.  Hopefully for 5.4 we'll also have a move to
soft critical sections protecting UMA memory caches, which should
eliminate mutex operations from common-case memory allocation, yielding a
further benefit.  Those changes are not yet in CVS HEAD, but are in
Perforce and show nice benefits in micro-benchmarks involving kernel
object allocation, such as socket allocation, etc.
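
To illustrate why taking mutex operations out of the common-case
allocation path matters, here is a minimal sketch in Python.  It is a toy
model, not UMA code: ToyZone, CountingMutex, and mutex_operations are
hypothetical names, the bucket size is arbitrary, and the "critical
section" is modelled simply as the absence of a mutex operation (the
assumption being that preemption is disabled around the cache access).

```python
class CountingMutex:
    """Hypothetical stand-in for a kernel mutex; it only counts operations."""

    def __init__(self):
        self.operations = 0

    def __enter__(self):
        self.operations += 1  # one acquire
        return self

    def __exit__(self, *exc):
        self.operations += 1  # one release
        return False


class ToyZone:
    """Toy allocator: a small cache refilled in buckets from a locked zone."""

    def __init__(self, bucket_size, lock_cache):
        self.bucket_size = bucket_size
        # True: a mutex protects the cache on every allocation.
        # False: assume a critical section protects it, costing no mutex ops.
        self.lock_cache = lock_cache
        self.cache_mutex = CountingMutex()
        self.zone_mutex = CountingMutex()
        self.cache = []
        self.next_id = 0

    def _refill(self):
        # Slow path: the shared zone is always protected by a mutex.
        with self.zone_mutex:
            for _ in range(self.bucket_size):
                self.cache.append(self.next_id)
                self.next_id += 1

    def _alloc_from_cache(self):
        if not self.cache:
            self._refill()
        return self.cache.pop()

    def alloc(self):
        if self.lock_cache:
            with self.cache_mutex:
                return self._alloc_from_cache()
        return self._alloc_from_cache()


def mutex_operations(n_allocs, bucket_size, lock_cache):
    """Return the total number of mutex acquires and releases performed."""
    zone = ToyZone(bucket_size, lock_cache)
    for _ in range(n_allocs):
        zone.alloc()
    return zone.cache_mutex.operations + zone.zone_mutex.operations


if __name__ == "__main__":
    print("mutex-protected cache:", mutex_operations(1000, 100, True))
    print("critical-section cache:", mutex_operations(1000, 100, False))
```

With 1000 allocations and a bucket of 100 objects, the locked variant
performs 2020 mutex operations (2000 on the cache mutex plus 20 on the
zone mutex), while the critical-section variant performs only the 20
zone-mutex operations needed for refills.  Any per-release overhead, such
as the UP bug mentioned above, is paid once per release in either case, so
it weighs far more heavily on the locked variant.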

> I disagree that the original post is entirely FUD. While the conclusion
> is subjective, the fact is that this particular mix of microbenchmarks
> shows NetBSD faster than FreeBSD. I am wondering if that is the price
> you pay on single-cpu boxes to gain speed at the SMP boxes.  And if this
> is true the question becomes if fine-grained locking is worth the
> implementation time when most computers are still single-cpu (Yes, I
> know this can change rapidly with the newer CPU types). 

I think the post Bosko is referring to is indeed almost entirely FUD,
since it consists of one line of useful content ("Look, a decent-looking
set of microbenchmark results") and the rest is ("I hate
these specific FreeBSD developers, why do we let committers design the
system!").  I.e., the posted report was simply an excuse for someone to
chew out the FreeBSD developers.  That doesn't invalidate the report, but
it does suggest that the interpretation presented in that post might be a
little unbalanced.  Obviously, we need to read and understand the report,
and act on areas where we can improve.

Regarding SMP or not -- the path the FreeBSD Project has taken (and this
choice was before I was really all that involved, to be honest) was a
re-architecture of the kernel to improve performance, scalability, and
structure via a movement to a parallelizable, preemptible, threaded
kernel.  I think this is the right architecture to move to, as it not only
improves performance and scalability, but it also closes a lot of existing
race conditions in the kernel that only became more exposed as threading
and SMP became more prevalent.  This has had a lot of performance
benefits, but comes with initial costs that aren't all immediately offset
by initial benefits.  Now that this model is largely adopted, we'll see a
nice increase in benefits over time -- i.e., it was an investment.
Obviously, people can and will disagree about the nature of the
investment, the time for payoff, etc., but I think it's easy to argue there
have been some very strong benefits so far.  Likewise, it's possible to
argue that the exact path by which it was implemented could have been
better -- e.g., if the dotcom crash hadn't happened at the wrong moment,
leaving us with far fewer developers working on it than hoped.  However,
we've accomplished a lot, especially given the available resources.
Now that the initial cut is working, immediate and measurable performance
from the new architecture is the primary thrust of the netperf work, so we
should see some large gains in that area, with an immediate effect on both
micro-benchmarks and macro-benchmarks.

More on SMP generally -- as other posts argue, I think you'll see SMP
become more and more a part of "out of the box" systems over the next
couple of years, especially for server class hardware.  It's quite hard to
buy decent server equipment from Intel that doesn't have at least HTT
today.  It is very important that we re-optimize UP having adopted the
SMPng approach, and there's a substantial amount of work going into that,
but SMP is an increasing reality that the FreeBSD Project has been
addressing head-on for several years.  It required a lot of work over that
time, but as a result of that investment, we will be ready for the next
generation of systems where SMP is no longer an option.  So I'd look for
immediate performance improvements in 5.x-STABLE for UP and SMP over the
next few months leading up to 5.4. 

We should all bug Stephan Uphoff and John Baldwin to get the critical
section stuff in the tree so I can merge in my UMA changes, and make sure
that the UP mutex optimizations are also in RELENG_5 :-).

Robert N M Watson