Only 70% of theoretical peak performance on FreeBSD 8/amd64, Core i7 920

Andriy Gapon avg at icyb.net.ua
Mon Apr 12 14:41:42 UTC 2010


on 12/04/2010 07:12 Maho NAKATA said the following:
> Hi FreeBSD developers,
> [the original article in Japanese can be found at
> http://blog.goo.ne.jp/nakatamaho/e/b5f6fbc3cc6e1ac4947463eb1ca4eb0a ] 
> 
> *Abstract*
> I compared the peak performance of FreeBSD 8.0/amd64 and Ubuntu 9.10 amd64 using dgemm
> (a linear algebra routine, matrix-matrix multiplication).
> I obtained only 70% of theoretical peak performance on FreeBSD 8/amd64 and
> almost 95% on Ubuntu 9.10/amd64. I'm really disappointed.

Sorry about that, but the more important question (for us) is: are you willing to
help us improve, in addition to reporting your results?

> *Introduction*
> I'm a friend of Gotoh Kazushige, the principal developer of GotoBLAS. He told me that
> FreeBSD is not a suitable OS for scientific computing or high performance computing. He says
> (in Japanese, in my translation):
> 
>> I guess FreeBSD does page coloring, but I don't think FreeBSD takes into account the very
>> large cache sizes that recent CPUs have.

AFAIK, recent FreeBSD doesn't use page coloring anymore.

>> Support for very large caches on Linux is still not very
>> sophisticated, but on *BSDs it's worse; they use too fine-grained a memory allocation method,
>> so we cannot expect large contiguous physical memory allocations.

Can your friend provide more explanation about these points in technical terms?
E.g. what kind of support, in his opinion, is needed for very large caches?
Why, in his opinion, does the memory need to be physically contiguous?

Perhaps he is talking about support for large pages (2M) and the related improvements
in TLB performance.  If so, he (and you) may want to read about the 'superpages'
feature of FreeBSD.  I am not sure if it is enabled by default in 8.0; you can check
the vm.pmap.pg_ps_enabled sysctl.
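
A minimal sketch of that check in Python, assuming the stock sysctl(8) utility is
on the PATH.  The helper names (parse_flag, superpages_enabled) are made up for
illustration; only the vm.pmap.pg_ps_enabled name comes from the text above.

```python
import subprocess

SYSCTL_NAME = "vm.pmap.pg_ps_enabled"  # sysctl named in the message above


def parse_flag(text):
    """Interpret `sysctl -n` output: '1' means enabled, '0' means disabled."""
    value = text.strip()
    if value not in ("0", "1"):
        return None  # unexpected output, report "unknown"
    return value == "1"


def superpages_enabled():
    """Return True or False, or None if the sysctl cannot be read
    (for example on a non-FreeBSD system)."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", SYSCTL_NAME],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_flag(result.stdout)
```

Calling superpages_enabled() on the FreeBSD machine should report whether the
feature is on.  As far as I know this is a boot-time tunable, so if it reports
False it would have to be set in /boot/loader.conf rather than changed at runtime.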

>> Moreover, process scheduling is not so nice, as *BSD employs an algorithm that
>> changes physical CPUs in turn instead of allocating one core for this kind of job.
>> Run your own benchmark, and you'll see.

Here I can only add an anecdotal 'me too'.
Sometimes I run single-threaded high-cpu programs like ffmpeg transcoding on an
otherwise idle system (a bunch of system daemons in the background).
And I see that the cpu-consuming process frequently goes back and forth between my
two cores.  CPU user loads on the cores are something like 60% vs 40%.
My expectations were that the process would mostly run on one core while the rest
of the threads would mostly run on the other.
I am not sure if that core switching really hurts performance or if there is
anything wrong with it.  But somehow it seems 'counter-intuitive'.
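
One way to test whether the migration matters is to pin the job to a single core
and compare run times.  A rough sketch, assuming FreeBSD's cpuset(1) is available;
the helper names and the ffmpeg arguments in the note below are placeholders, not
taken from the runs described above.

```python
import subprocess


def pinned_command(cpu, argv):
    """Build an argv that runs `argv` restricted to one CPU via cpuset(1)."""
    return ["cpuset", "-l", str(cpu)] + list(argv)


def run_pinned(cpu, argv):
    """Run the command pinned to `cpu` and return its exit status."""
    return subprocess.run(pinned_command(cpu, argv)).returncode
```

For example, pinned_command(0, ["ffmpeg", "-i", "in.avi", "out.mp4"]) gives the
equivalent of `cpuset -l 0 ffmpeg -i in.avi out.mp4`.  Timing the same job with and
without pinning would show whether the back-and-forth between cores costs anything.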

> *Result*
> Machine: Core i7 920 (42.56-44.8 GFlops) / DDR3 1066
> OS: FreeBSD 8.0/amd64 and Ubuntu 9.10
> GotoBLAS2: 1.13
> 
> dgemm result
> OS      : FLOPS            : percent of peak
> FreeBSD : 32.0 GFlops      : 71%
> Ubuntu  : 42.0-42.7 GFlops : 93.8%-95.3%
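
For reference, the quoted peak range and percentages can be reproduced with simple
arithmetic.  The sketch below assumes 4 cores, 4 double-precision flops per cycle
per core, and clocks of 2.66 GHz (nominal) and 2.80 GHz (turbo).  Those inputs are
not stated in the report; they are chosen because they yield exactly 42.56 and
44.8 GFlops.

```python
def peak_gflops(cores, flops_per_cycle, ghz):
    """Theoretical peak in GFlops: cores x flops/cycle x clock (GHz)."""
    return cores * flops_per_cycle * ghz


def percent_of_peak(measured, peak):
    """Measured throughput as a percentage of the theoretical peak."""
    return 100.0 * measured / peak


# Assumed inputs (not from the report): 4 cores, 4 DP flops/cycle/core.
NOMINAL = peak_gflops(4, 4, 2.66)  # lower end of the quoted peak range
TURBO = peak_gflops(4, 4, 2.80)    # upper end of the quoted peak range

for name, gflops in [("FreeBSD", 32.0), ("Ubuntu low", 42.0), ("Ubuntu high", 42.7)]:
    print(f"{name}: {percent_of_peak(gflops, TURBO):.1f}% of {TURBO:.1f} GFlops")
```

Under these assumptions the percentages in the table are relative to the 44.8 GFlops
figure; measured against 42.56 GFlops, the FreeBSD result would be about 75%.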

It would also be good to learn more about your program.
How much memory does it typically use, and how does it allocate it?
Is it single-threaded or not?  If not, how many threads does it have and what do
they do, how do they communicate?

-- 
Andriy Gapon


More information about the freebsd-stable mailing list