Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

Wed Dec 21 00:28:08 UTC 2011

On 12/21/11 00:29, Jeremy Chadwick wrote:
> On Tue, Dec 20, 2011 at 11:54:23PM +0100, O. Hartmann wrote:
>> On 12/20/11 22:45, Samuel J. Greear wrote:
>>> http://www.osnews.com/story/25334/DragonFly_BSD_MP_Performance_Significantly_Improved
>>>
>>> PostgreSQL tests, see the linked PDF for #'s on FreeBSD, DragonFly, Linux
>>> and Solaris. Steps to reproduce these benchmarks provided.
>>>
>>> Sam
>>>
>>> On Tue, Dec 20, 2011 at 1:20 PM, Igor Mozolevsky <igor at hybrid-lab.co.uk> wrote:
>>>
>>>> Interestingly, while people seem to be (arguably rightly) focused on
>>>> criticising Phoronix's benchmarking, nobody has offered an alternative
>>>> benchmark; and while (again, arguably rightly) it is important to
>>>> benchmark real world performance, equally, nobody has offered any
>>>> numbers in relation to, for example, HTTP or SMTP, or any other "real
>>>> world"-application torture tests done on the aforementioned two
>>>> platforms... IMO, this just goes to show that "doing is hard" and
>>>> "criticising is much easier" (yes, I am aware of the irony involved in
>>>> making this statement, but someone has to!)
>>>>
>>>>
>>>> Cheers,
>>>> Igor M :-)
>>
>> Thanks for those numbers.
>> Impressive how Matthew Dillon's project jumps forward now. And it is
>> still impressive to see that the picture is still in the right place
>> when it comes to a comparison to Linux.
>> Also, OpenIndiana shows an impressive performance.
> 
> Preface to my long post below:
> 
> The things being discussed here are benchmarks, as in "how much work
> can you get out of Thing".  This is VERY DIFFERENT from testing
> interactivity in a scheduler, which is more of a test that says "when
> Thing X is executed while heavier-Thing Y is also being executed, how
> much interaction is lost in Thing X".
> 
> The reason people notice this when using Xorg is because it's visual,
> in an environment where responsiveness is absolutely mandatory above all
> else.  Nobody is going to put up with a system where during a buildworld
> they go to move a window or click a mouse button or type a key and find
> that the window doesn't move, the mouse click is lost, or the key typed
> has gone into the bit bucket -- or, that those things are SEVERELY
> delayed, to the point where interactivity is crap.

I have witnessed FreeBSD servers (serving home directories, NFS/Samba
and a small PostgreSQL database) that were sticky, jumpy and
unresponsive for seconds at a time. Those "seconds" were enough to drop
an ssh connection. Not funny. Network traffic dropped significantly.
X/desktop use makes the problem visible, indeed, but not seeing it does
not mean it isn't there.
Might this be the reason why FreeBSD is so far behind when it comes to X?
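
A minimal sketch of how such stalls could be put into numbers on a
headless box (Python, not from this thread; function names and the
10 ms interval are assumptions): a probe asks to sleep for a short,
fixed interval and records by how much each wake-up overshoots. Run it
while the heavy load (e.g. a buildworld) is active and compare against
an idle run.

```python
import time


def measure_wakeup_latency(interval_s=0.01, samples=200):
    """Sleep `interval_s` repeatedly; return each wake-up overshoot in seconds."""
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval_s)
        elapsed = time.monotonic() - start
        # Overshoot = time spent beyond the requested sleep interval.
        overshoots.append(max(0.0, elapsed - interval_s))
    return overshoots


def summarize(overshoots):
    """Reduce the samples to a few figures that are easy to compare."""
    ordered = sorted(overshoots)
    n = len(ordered)
    return {
        "samples": n,
        "median_ms": ordered[n // 2] * 1000.0,
        "p99_ms": ordered[min(n - 1, int(n * 0.99))] * 1000.0,
        "max_ms": ordered[-1] * 1000.0,
    }


if __name__ == "__main__":
    print(summarize(measure_wakeup_latency()))
```

The maximum and 99th percentile matter more than the median here: a
multi-second stall shows up as a single huge outlier.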

> 
> I just want to make that clear to folks.  This immense thread has been
> with regards to the latter -- bad interactivity/responsiveness on a
> system which was undergoing load that SHOULD be distributed "more
> evenly" across the system *while* keeping interactivity/responsiveness
> high.  Historically nice/renice has been used for this task, but that
> was when kernels were a little less complex and I/O subsystems were less
> complex.  Remember: we've now got schedulers for each type of thing,
> and who gets what priority?  You get my point I'm sure.
> 
> So remember: this was to discuss that aspect, with regards to ULE vs.
> 4BSD schedulers.
> 
> Now, back to the benchmarks:
> 
> This also interested me:
> 
> * Linux system crashed
>   http://leaf.dragonflybsd.org/mailarchive/kernel/2011-11/msg00008.html
> 
> * OpenIndiana system crashed same way as Linux system
>   http://leaf.dragonflybsd.org/mailarchive/kernel/2011-11/msg00017.html
> 
> I cannot help but wonder if the Linux and OpenIndiana installations were
> more stressful on the hardware -- getting more out of the system, maybe
> resulting in increased power/load, which in turn resulted in the systems
> locking up (shoddy PSU, unstable mainboard, MCH problems, etc.).

Is FreeBSD supposed to run on junkyard equipment? In former times,
FreeBSD was used on high-value hardware, not decommissioned crap with
shoddy PSUs or whatever.
If I need a server, I care about quality hardware, as I do for my lab's
box and my own box at home. I expect "server grade" hardware to act
like it, and I expect the operating system to get the maximum out of
that hardware. Otherwise it is not worth a shot.

> 
> My point is that Francois states these things in such a way to imply
> that "DragonflyBSD was more stable", when in fact I happen to wonder the
> opposite point -- that is to say, Linux and OpenIndiana were trying to
> use the hardware more-so than DragonflyBSD, thus tickled what may be a
> hardware-level problem.
> 
>> But this is only one suite of testing. Scientific Linux is supposed to
>> give the best performance for scientific purposes, i.e. for long-haul
>> calculations, much numerical stuff. It outperforms FreeBSD in a typical
>> server application, where "FreeBSD should have the power to serve".
>>
>> Is the postgresql benchmark the only way to benchmark?
> 
> I sure hope not.  But you know what's equally as interesting?  This:
> 
> http://people.freebsd.org/~kris/scaling/
> 
> Specifically circa 2008:
> 
> http://people.freebsd.org/~kris/scaling/4cpu-pgsql.png
> http://people.freebsd.org/~kris/scaling/pgsql-16cpu-2.png
> http://people.freebsd.org/~kris/scaling/pgsql-16cpu.png
> 
> Now, I don't know if what was used in those ("pgsql sysbench") was the
> same thing as "pgbench" in the DragonflyBSD tests, but if so, the
> numbers are different to a point that is preposterous.
> 
> There's also this:
> 
> http://people.freebsd.org/~kris/scaling/pgsql-ncpu.png
> 
> Now, compare those numbers to the TPS numbers shown here:
> 
> http://dl.wolfpond.org/Pg-benchmarks.pdf
> 
> So um... yeah.  Now, if someone here is going to say "well, what
> was tested by Kris was FreeBSD 7.0, while what was tested by Francois
> was FreeBSD 9.0, and there have been improvements", then I ask that
> someone show me where the improvements are that would exhibit a 4-8x
> performance increase in some cases.
> 
> This rambling of mine is the same rambling I posted earlier in this
> thread.  There needs to be a consistent, standardised way of testing
> this stuff.  Every system tested tuned the exact same way, software
> configured the same way, absolutely no quirks applied, etc.  Otherwise
> we end up with "mixed results" as shown above.

Didn't M. Larabel at Phoronix get this halfway right, apart from the
ZFS fault?

> 
> Much to the disapproval of others, the Phoronix test suite is supposed
> to be that "standard".  Meaning, it's a suite you're supposed to be able
> to install, which ensures that, aside from the compiler used and any
> system tests, the same code is being used regardless of what system
> and OS it's on.  Have I ever used it?  No.  And it's important that I
> admit that up front, because being honest is necessary.
> 
>> Well, this inspires me to gather together all the benchmarks someone
>> could find. There were lots of complaints about FreeBSD's poor
>> performance with BIND - once a domain of FreeBSD. Network performance
>> seems also to be an issue if it comes to scalability.
>> It would be nice to see what portion of the raw CPU/GPU power the OS
>> (FreeBSD, Linux ...)  delivers to scientific applications.
> 
> Kris Kennaway's "BIND benchmark" that was released a long time ago
> touched on this.  Remember: these plots show nothing other than
> number of queries per second correlated with number of DNS server
> threads (since BIND does have a 1:1 thread-to-CPU creation ratio):
> 
> http://people.freebsd.org/~kris/scaling/bind-pt.png
> http://people.freebsd.org/~kris/scaling/bind-pt-2.png
> http://people.freebsd.org/~kris/scaling/bind-pt-gige.png
> 
>> I only know some kind of benchmarks, BYTE UNIX benchmark, LINPACK test
>> ... Does someone know a site to look for a couple of benchmarks to test
>>
>> a) memory system
>> b) scalability (apart from pgbench)
>> c) network performance/throughput/network scalability
>> d) portion of CPU performance the system delivers for numerical
>> applications to the user apart from the system's own consumption
>> e) disk I/O performance and scalability
>>
>> It would also be nice to discuss some good settings and performance
>> tunings for FreeBSD for several scenarios. I guess that starting to
>> develop benchmarking test scenarios for several purposes would lead
>> faster to real numbers, and to less polemic, than weird discussions ...
> 
> All I wish is that we had some kind of "test suite" of our own, maybe as
> a port, maybe in the base system, which could really help with all of
> this.  Something consistent.

Why not support those guys at Phoronix? If we start with "our own",
then we end up as you described above: results that are not comparable,
different numbers on different platforms, no normalization possible.

> 
> Now I'm switching back to discussing interactivity/responsiveness tests:
> 
> Attilio Rao did comment in this thread to me, giving me some test
> methodologies for testing interactivity during two types of simultaneous
> loads -- but one involves dnetc, which I imagine means I'd need to get
> familiar with that whole thing.
> 
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-December/064936.html
> 
> I haven't responded to his post yet (this thread is so long and tedious
> that I'm having serious problems following it + remembering all the
> details -- am I the only one who feels daunted by this?  God I hope
> not), but his insights are, as always, beneficial, but also
> overwhelming.  Furthermore, I do not have 16-core or 24-core systems
> to test on -- I have single-CPU, quad-core and dual-core systems to test
> on.  I am a firm believer these are going to make up the majority of the
> FreeBSD userbase (desktop and server environments).  Extreme hardware (e.g.
> quad CPU with 12 cores per CPU) can be tested too, but let's at least
> pick a demographic to start with.
> 
> Again: the FreeBSD users and administrative community want to help.  All
> of us do.  We just need to know exactly what we should be doing to test,
> and what exactly we're testing for.  I'll be blunt while choosing to
> play the Idiot Admin for a moment: I'd be much happier if someone had a
> tarball of shell scripts and things which could be used to test these
> things.  Lots of things need to be kept in mind, such as if someone is
> running the "client" test on the same box as the "server" test, and
> things like "the test data is written to a local filesystem, with
> echo/printf statements constantly flushed" (great, now we're causing I/O
> load on top of our tests!), which to me means we should probably be
> using something like mdconfig(8) to create a temporary filesystem to
> store logs/data results.
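
A minimal sketch of that idea (Python, not from this thread; all names
are hypothetical): keep the timings in memory while the measured runs
are in progress and write them out once at the end, into a directory
that could sit on a memory-backed filesystem set up with mdconfig(8).
The system temp directory is used below purely as a stand-in for such a
mount.

```python
import json
import os
import tempfile
import time


def run_trials(workload, trials):
    """Time `workload` `trials` times; results stay in memory, no I/O."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def save_results(timings, results_dir, name="results.json"):
    """Write all timings in one go, after the measured runs have finished."""
    path = os.path.join(results_dir, name)
    with open(path, "w") as fh:
        json.dump({"trials": len(timings), "seconds": timings}, fh)
    return path


def cpu_workload():
    """Placeholder load (hypothetical); swap in the real test here."""
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    # Assumption: replace this with the memory-backed mount point, if any.
    results_dir = tempfile.gettempdir()
    print(save_results(run_trials(cpu_workload, 5), results_dir))
```

Nothing is printed or flushed inside the timing loop, so the logging
itself adds no disk load to the test.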
> 
> The KTR stuff Attilio and many others have requested, I think, will be
> the most beneficial way to get the developers the data they need.  I had
> no idea about it until I found out that KTR was something completely
> different than ktrace.
> 
> I still haven't found the time to do all of this, BTW, and for that I
> apologise.  The reason has to do with time at work + personal desire to
> do it.  When I get a daunting task, I tend to get... well, not
> depressed, but "scared" of the massive undertaking since it involves
> lots of recurring tests, reboots, etc. -- hours of work -- and if I get
> that wrong, it's wasted effort (thus wasted developer time).  I want to
> get it right.  :-)
> 
