How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920

Adam Vande More amvandemore at gmail.com
Wed Apr 14 16:34:47 UTC 2010


On Wed, Apr 14, 2010 at 10:26 AM, Andriy Gapon <avg at freebsd.org> wrote:

> on 14/04/2010 02:21 Maho NAKATA said the following:
> > 4. run dgemm.
> > % ./dgemm
> > n: 3000
> > time : 134.648208 or 16.910525
> > Mflops : 31943.419695
> > n: 3100
> > time : 148.122279 or 18.615284
> > Mflops : 32017.357408
> > n: 3200
> > time : 162.488885 or 20.430651
> > Mflops : 32087.318295
> > n: 3300
> > time : 178.497079 or 22.446093
> > Mflops : 32030.420499
> > n: 3400
> > time : 195.550715 or 24.586152
> > Mflops : 31981.873273
> > n: 3500
> > time : 213.403379 or 26.825058
> > Mflops : 31975.513363
> > n: 3600
> > ...
> > above output is on Core i7 920 (2.66GHz; TurboBoost on)
>
> My results:
> $ ./dgemm
> n: 3000
> time : 54.151302 or 28.189781
> Mflops : 19162.263125
> n: 3100
> time : 60.157449 or 32.214141
> Mflops : 18501.570537
> n: 3200
> time : 65.753191 or 34.114872
> Mflops : 19216.393378
>
> CPU:
> CPU: Intel(R) Core(TM)2 Duo CPU     E7300  @ 2.66GHz (2653.35-MHz K8-class CPU)
>  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
>
>
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>
>  Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
>  AMD Features=0x20100800<SYSCALL,NX,LM>
>  AMD Features2=0x1<LAHF>
>  TSC: P-state invariant
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> FreeBSD/SMP: 1 package(s) x 2 core(s)
>
> FreeBSD:
> FreeBSD 8.0-STABLE r205070 amd64
>
> Please note that the system was not dedicated to the test, I had
> Xorg+KDE3+thunderbird+skype+kopete+konsole(s) plus a bunch of daemons
> running.
> That probably explains irregularities in the results.
>
> I am not sure how exactly the theoretical maximum should be calculated; I used
> 2 * 2.66G * 4 ≈ 21.3G.
> And so 19.2G / 21.3G ≈ 90%.
>
> Not as bad as what you get.
> Although not as good as what you report for Linux.
> But given the impurity and imprecision of my test…
>
> P.S. the machine is two-core obviously :-)
> Don't have anything with more cpus/cores handy.
>
> P.P.S. Having _only glanced_ at the source, I think that there are some
> things that GotoBLAS doesn't try to do on FreeBSD that it tries to do on Linux.
> Like setting CPU-affinity for the threads, or avoiding HTT pseudo-cores.
> Those things are possible on FreeBSD.
> Perhaps, there are more things like that.
>
>
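A minimal sketch of the peak estimate quoted above, assuming the three factors in "2 * 2.66G * 4" are core count, clock in GHz, and 4 double-precision flops per cycle per core. The function name and the per-cycle figure are assumptions for illustration, not something stated in the quoted message.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=4):
    """Theoretical peak in GFlops: cores * clock (GHz) * flops per cycle per core."""
    # flops_per_cycle=4 is an assumption matching the "* 4" in the quoted estimate.
    return cores * clock_ghz * flops_per_cycle

peak = peak_gflops(2, 2.66)   # two cores at 2.66 GHz, as in the quoted dmesg
measured = 19.2               # best quoted result, in GFlops
print(f"peak {peak:.1f} GFlops, utilization {measured / peak:.0%}")
```

With these inputs the sketch reproduces the 21.3G and 90% figures from the quoted text.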
Mine is also a live desktop environment, KDE4+.

n: 3000
time : 116.377609 or 16.696066
Mflops : 32353.729042
n: 3100
time : 127.230336 or 17.274867
Mflops : 34501.695325
n: 3200
time : 139.018175 or 18.342056
Mflops : 35741.074976
n: 3300
time : 152.519365 or 20.154714
Mflops : 35671.942364
n: 3400
time : 166.248145 or 21.952426
Mflops : 35818.874941
n: 3500
time : 182.565385 or 24.492597
Mflops : 35020.581786
n: 3600
time : 198.551018 or 26.906992
Mflops : 34689.094992
n: 3700
time : 215.428919 or 28.574964
Mflops : 35462.294838
n: 3800
^C
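A note on reading the output above. The printed Mflops values are consistent with an operation count of 2*n^3 + 2*n^2 per multiply, repeated 10 times, divided by the second "time" value. That is inferred from the numbers in this thread, not taken from the dgemm source, so treat the formula, the repeat count, and the function name in this sketch as assumptions.

```python
def dgemm_mflops(n, wall_seconds, repeats=10):
    """Reconstruct the reported Mflops from n and the second 'time' value."""
    # Assumed operation count: 2*n^3 + 2*n^2 flops per n x n multiply,
    # done `repeats` times. Both are inferred from the printed figures.
    flops = repeats * (2 * n**3 + 2 * n**2)
    return flops / wall_seconds / 1e6

# First row of the output above: n = 3000, time : 116.377609 or 16.696066
print(dgemm_mflops(3000, 16.696066))

# Ratio of the two time values for the same row.
print(116.377609 / 16.696066)
```

If the first time value is CPU time summed over threads and the second is wall-clock time (again an assumption), the ratio gives a rough count of how many hardware threads were kept busy.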

CPU: Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz (3313.71-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106e5  Family = 6  Model = 1e  Stepping = 5

Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>

Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant

That's about 67% utilization; turning off HTT drops it further.  HTT on the
newer cores is good, not bad.
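A sketch of how a utilization figure like this depends on which clock is used for the peak. It assumes 4 physical cores and 4 double-precision flops per cycle per core for the i7 870, and compares the 3313.71 MHz reported in the dmesg line with the nominal 2.93 GHz. The function name and the choice of the best measured value are illustrative assumptions, not necessarily how the 67% above was computed.

```python
def utilization(measured_mflops, cores, clock_mhz, flops_per_cycle=4):
    """Fraction of theoretical peak reached by a measured Mflops value."""
    # Peak in Mflops: cores * clock (MHz) * flops per cycle per core (assumed 4).
    peak_mflops = cores * clock_mhz * flops_per_cycle
    return measured_mflops / peak_mflops

best = 35818.874941  # highest Mflops value in the output above (n = 3400)

print(f"at 3313.71 MHz: {utilization(best, 4, 3313.71):.1%}")
print(f"at 2930.00 MHz: {utilization(best, 4, 2930.0):.1%}")
```

Against the higher clock the result lands near the 67% quoted; against the nominal clock it comes out noticeably higher, so the clock assumption matters when comparing figures between machines.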





-- 
Adam Vande More


More information about the freebsd-stable mailing list