Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

Thu Dec 15 13:48:57 UTC 2011

On Thu, Dec 15, 2011 at 05:32:47AM -0700, Samuel J. Greear wrote:
> > Well, the only way it's going to get fixed is if someone sits down,
> > replicates it, and starts to document exactly what it is that these
> > benchmarks are/aren't doing.
> >
> 
> I think you will find that investigation is largely a waste of time,
> because not only are some of these benchmarks just downright silly,
> there are huge differences in the environments (compiler versions),
> etc., etc. leading to a largely apples/oranges comparison. But also
> the analysis and reporting of the results by Phoronix is simply
> moronic to the point of being worse than useless; they are spreading
> misinformation.
> 
> Take the first test as an example, Blogbench read. This doesn't raise
> any red flags, right? At least not until you realize that Blogbench
> isn't a read test, it's a read/write test. So what they have done here
> is run a read/write test and then thrown away the write results for
> both platforms and reported only the read results. If you dig down
> into the actual results,
> http://openbenchmarking.org/result/1112113-AR-ORACLELIN37 -- you will
> see two Blogbench numbers, one for read and another for write. These
> were both taken from the same Blogbench run. FreeBSD optimizes
> writes over reads; that's probably a good thing for your data but a
> bad thing when someone totally misrepresents benchmark results.
> 
> Other benchmarks in the Phoronix suite and their representations are
> similarly flawed. _ALL_ of these results should be ignored and no time
> should be wasted by any FreeBSD committer further evaluating this
> garbage. (Yes, I have been down this rabbit hole.)

For the sake of argument, let's say we throw out the Phoronix benchmarks
as a data source.  (I don't think the benchmark specifically implied or
stated "this is all because of SCHED_ULE", though; remember, that's what
we're supposed to be focusing on.  There may not be a direct correlation
between the Phoronix benchmarks and the ULE issue reported here.)
That said: thrown out, data ignored, done.

Now what?  Where are we?  We're right back where we were a day or two
ago, meaning no closer to solving the problem users have reported with
SCHED_ULE.  Heck, we're not even sure there is an issue, other than
some folks confirming that SCHED_4BSD performs better for them (that's
what started this whole thread), and at least a couple of people have
stated this.

So given the above semi-devil's-advocate response -- Sam, do you have
something positive or progressive to offer so we can move forward on the
ULE vs. 4BSD debacle?  :-)  The smiley is meant to be sincere, not
sarcastic.

I'm getting to the point where I'm considering formulating a private
mail to Jeff Roberson, requesting that he be aware of the discussion
that's happening (not that he necessarily follow or read it), and that
based on what I can tell we're at a roadblock -- nobody so far is
absolutely certain how to "benchmark" and compare ULE vs. 4BSD in
multiple ways, so that those of us involved here can run such utilities
and provide the data somewhere central for devs to review.  I only
mention this because so far I haven't seen anyone really say "okay, this
is what we should be using for these kinds of tests".  Yay nature of the
beast.
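
As a starting point only, here is a minimal sketch of the kind of
utility that could be run on both a ULE kernel and a 4BSD kernel, with
the resulting one-line summaries collected somewhere central.  Nothing
here is an agreed-upon method: the function names, the label argument,
the default of 5 runs, and the choice of wall-clock time as the metric
are all assumptions for illustration, and the workload command is
whatever the person running it supplies.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the workload's output; check=True aborts on a failed run
        # so a broken workload is not silently counted as a fast one.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(label, durations):
    """Return a one-line summary that can be pasted into a mail or a file."""
    mean = statistics.mean(durations)
    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return " ".join([
        f"{label}:",
        f"n={len(durations)}",
        f"mean={mean:.3f}s",
        f"stdev={stdev:.3f}s",
        f"min={min(durations):.3f}s",
        f"max={max(durations):.3f}s",
    ])


# Hypothetical invocation: python bench.py LABEL command [args...]
# LABEL is free-form text chosen by the user, e.g. the scheduler name.
if __name__ == "__main__" and len(sys.argv) > 2:
    print(summarize(sys.argv[1], time_command(sys.argv[2:])))
```

The label would be supplied by hand (for example "ULE" or "4BSD") so
that lines from different kernels can be told apart.  Reporting the
spread along with the mean is deliberate: a single number per scheduler
would repeat the same kind of selective reporting criticised above.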

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |