How to reproduce: Re: Only 70% of theoretical peak performance
on FreeBSD 8/amd64, Corei7 920
Andriy Gapon
avg at freebsd.org
Wed Apr 14 15:26:51 UTC 2010
on 14/04/2010 02:21 Maho NAKATA said the following:
> 4. run dgemm.
> % ./dgemm
> n: 3000
> time : 134.648208 or 16.910525
> Mflops : 31943.419695
> n: 3100
> time : 148.122279 or 18.615284
> Mflops : 32017.357408
> n: 3200
> time : 162.488885 or 20.430651
> Mflops : 32087.318295
> n: 3300
> time : 178.497079 or 22.446093
> Mflops : 32030.420499
> n: 3400
> time : 195.550715 or 24.586152
> Mflops : 31981.873273
> n: 3500
> time : 213.403379 or 26.825058
> Mflops : 31975.513363
> n: 3600
> ...
> above output is on Core i7 920 (2.66GHz; TurboBoost on)
My results:
$ ./dgemm
n: 3000
time : 54.151302 or 28.189781
Mflops : 19162.263125
n: 3100
time : 60.157449 or 32.214141
Mflops : 18501.570537
n: 3200
time : 65.753191 or 34.114872
Mflops : 19216.393378
CPU:
CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz (2653.35-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x10676 Stepping = 6
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
AMD Features=0x20100800<SYSCALL,NX,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant
⋮
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
FreeBSD:
FreeBSD 8.0-STABLE r205070 amd64
Please note that the system was not dedicated to the test; I had
Xorg+KDE3+thunderbird+skype+kopete+konsole(s) plus a bunch of daemons running.
That probably explains the irregularities in the results.
I am not sure exactly how the theoretical maximum should be calculated; I used
2 cores * 2.66 GHz * 4 flops/cycle ≈ 21.3 Gflops.
And so 19.2 Gflops / 21.3 Gflops ≈ 90%.
Not as bad as what you get.
Although not as good as what you report for Linux.
But given the impurity and imprecision of my test…
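
For reference, the arithmetic above can be sketched as a few lines of Python.
This is only an illustration: the function names are made up for this example,
and the factor 4 is assumed to mean double-precision flops per core per cycle.

```python
# Rough peak-performance estimate. All names here are illustrative only.
# Assumption: the "4" in the estimate is double-precision flops per core
# per clock cycle.

def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflops: cores * clock (GHz) * flops per cycle."""
    return cores * clock_ghz * flops_per_cycle

def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak actually achieved."""
    return measured_gflops / peak

# Numbers taken from the results above (best run, n = 3200).
peak = peak_gflops(cores=2, clock_ghz=2.66, flops_per_cycle=4)
best = 19216.393378 / 1000.0  # Mflops -> Gflops

print(f"peak:       {peak:.2f} Gflops")
print(f"measured:   {best:.2f} Gflops")
print(f"efficiency: {efficiency(best, peak):.1%}")
```

The same two functions can be reused for other machines by changing the core
count, clock, and per-cycle figure, provided the per-cycle figure is known for
that CPU.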
P.S. the machine is two-core obviously :-)
Don't have anything with more cpus/cores handy.
P.P.S. Having _only glanced_ at the source, I think there are some things
that GotoBLAS tries to do on Linux but doesn't try to do on FreeBSD,
like setting CPU affinity for the threads or avoiding HTT pseudo-cores.
Those things are possible on FreeBSD too.
Perhaps there are more things like that.
--
Andriy Gapon