HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)

Maho NAKATA chat95 at mac.com
Thu Apr 15 00:46:49 UTC 2010


Hi Andry and Adam

Here are my tests again, this time with no desktop or other load; the machine only ran dgemm.
Contrary to Adam's result, Hyper-Threading makes the performance worse for me.
All tests were done on a Core i7 920 @ 2.67GHz (2.8GHz with Turbo Boost).

Turbo Boost off, Hyper Threading off: 82% (35GFlops)    [1]
Turbo Boost off, Hyper Threading on:  72% (30.5GFlops)  [2]

Turbo Boost on,  Hyper Threading on:  71% (32GFlops)    [3]
Turbo Boost on,  Hyper Threading off: 84-89% (38-40GFlops) [4]
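
For reference, a minimal sketch of how percentages like these can be derived. It assumes the same peak formula as the calculation quoted at the end of this message (cores * clock * 4 flops per cycle per core); the function names are made up for illustration, and only runs [1] and [3] are shown.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle=4):
    """Theoretical peak in GFlops.

    flops_per_cycle=4 is an assumption taken from the quoted
    "2 * 2.66G * 4" calculation, not a measured value.
    """
    return cores * clock_ghz * flops_per_cycle


def efficiency_percent(measured_gflops, peak):
    """Measured performance as a percentage of the theoretical peak."""
    return 100.0 * measured_gflops / peak


if __name__ == "__main__":
    # (label, clock in GHz, measured GFlops) from the summary above
    runs = [("[1]", 2.67, 35.0), ("[3]", 2.8, 32.0)]
    for label, clock, measured in runs:
        peak = peak_gflops(4, clock)
        pct = efficiency_percent(measured, peak)
        print(f"{label}: peak {peak:.1f} GFlops, "
              f"measured {measured} GFlops, {pct:.0f}%")
```

The same function gives about 21.3 GFlops for a two-core 2.66GHz machine, which matches the figure in the quoted message.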

---my system---
CPU: Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz (2683.44-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a5  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant
real memory  = 12884901888 (12288 MB)
avail memory = 12387717120 (11813 MB)
ACPI APIC Table: <110909 APIC1026>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
---my system---

---DETAILS---
[1]
% ./dgemm
n: 3000
time : 57.666717 or 16.339074 
Mflops : 33060.624827
n: 3100
time : 61.502677 or 16.597376 
Mflops : 35910.025544
n: 3200
time : 69.075401 or 19.199833 
Mflops : 34144.297133
n: 3300
time : 73.699540 or 19.633594 
Mflops : 36618.756539
n: 3400
time : 82.256194 or 22.373651 
Mflops : 35144.518837
n: 3500
time : 88.975662 or 24.118761 
Mflops : 35563.394249
n: 3600
time : 96.436652 or 26.027588 
Mflops : 35861.148385
n: 3700
[2]
% ./dgemm
n: 3000
time : 139.622739 or 17.693806 
Mflops : 30529.327312
n: 3100
time : 154.344971 or 19.566886 
Mflops : 30460.247702
n: 3200
time : 169.507739 or 21.467100 
Mflops : 30538.116602
n: 3300
time : 186.363773 or 23.615281 
Mflops : 30444.600545
n: 3400
time : 203.798979 or 25.817667 
Mflops : 30456.322788
n: 3500
...
[3]
% ./dgemm
n: 3000
time : 134.673079 or 16.958682 
Mflops : 31852.711082
n: 3100
time : 148.410085 or 18.663248 
Mflops : 31935.073574
n: 3200
time : 162.835473 or 20.468825 
Mflops : 32027.475770
n: 3300
time : 179.025370 or 22.479189 
Mflops : 31983.262501
n: 3400
time : 195.859710 or 24.663009 
Mflops : 31882.208788
n: 3500
[4]
% ./dgemm
n: 3000
time : 54.259647 or 14.684309 
Mflops : 36786.204907
n: 3100
time : 60.899147 or 17.124599 
Mflops : 34804.447141
n: 3200
time : 64.295342 or 17.490787 
Mflops : 37480.577569
n: 3300
time : 69.781247 or 18.288840 
Mflops : 39311.284796
n: 3400
time : 79.234397 or 21.829736 
Mflops : 36020.187858
n: 3500
time : 83.905419 or 22.381237 
Mflops : 38324.289174
n: 3600
time : 92.195022 or 25.105942 
Mflops : 37177.621122
n: 3700
time : 97.718841 or 25.434243 
Mflops : 39841.319494
n: 3800
time : 105.740463 or 27.414029 
Mflops : 40042.592613
n: 3900
time : 113.980157 or 29.678505 
Mflops : 39984.635420
n: 4000
time : 122.941569 or 31.946174 
Mflops : 40077.412531
n: 4100
---DETAILS---
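
The logs above can be summarized with a small script. This is only a sketch: it assumes the first value on each "time" line is CPU time summed over all threads and the second is wall-clock time, which the output format suggests but the dgemm source is not shown here. Under that assumption the ratio of the two approximates the number of busy threads; it is roughly 3.5 to 3.9 in runs [1] and [4] and close to 8 in runs [2] and [3]. The names parse_dgemm and summarize are hypothetical.

```python
import re

# One complete record: "n: N", "time : A or B", "Mflops : M".
BLOCK = re.compile(
    r"n:\s*(\d+)\s*\n"
    r"time\s*:\s*([\d.]+)\s+or\s+([\d.]+)\s*\n"
    r"Mflops\s*:\s*([\d.]+)"
)


def parse_dgemm(text):
    """Return a list of (n, first_time, second_time, mflops) tuples.

    Incomplete trailing records (an "n:" line with no results) are skipped.
    """
    return [(int(n), float(t1), float(t2), float(mf))
            for n, t1, t2, mf in BLOCK.findall(text)]


def summarize(rows):
    """Return (mean Mflops, mean ratio of first to second time value)."""
    mean_mflops = sum(r[3] for r in rows) / len(rows)
    mean_ratio = sum(r[1] / r[2] for r in rows) / len(rows)
    return mean_mflops, mean_ratio


# First two records of run [1], plus the dangling "n:" line.
SAMPLE = """\
n: 3000
time : 57.666717 or 16.339074
Mflops : 33060.624827
n: 3100
time : 61.502677 or 16.597376
Mflops : 35910.025544
n: 3200
"""

if __name__ == "__main__":
    mflops, ratio = summarize(parse_dgemm(SAMPLE))
    print(f"mean Mflops: {mflops:.1f}, mean time ratio: {ratio:.2f}")
```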


From: Adam Vande More <amvandemore at gmail.com>
Subject: Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920
Date: Wed, 14 Apr 2010 11:34:45 -0500

>> > time : 162.488885 or 20.430651
>> > Mflops : 32087.318295
>> > n: 3300
>> > time : 178.497079 or 22.446093
>> > Mflops : 32030.420499
>> > n: 3400
>> > time : 195.550715 or 24.586152
>> > Mflops : 31981.873273
>> > n: 3500
>> > time : 213.403379 or 26.825058
>> > Mflops : 31975.513363
>> > n: 3600
>> > ...
>> > above output is on Core i7 920 (2.66GHz; TurboBoost on)
>>
>> My results:
>> $ ./dgemm
>> n: 3000
>> time : 54.151302 or 28.189781
>> Mflops : 19162.263125
>> n: 3100
>> time : 60.157449 or 32.214141
>> Mflops : 18501.570537
>> n: 3200
>> time : 65.753191 or 34.114872
>> Mflops : 19216.393378
>>
>> CPU:
>> CPU: Intel(R) Core(TM)2 Duo CPU     E7300  @ 2.66GHz (2653.35-MHz K8-class CPU)
>>  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
>>  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>>  Features2=0x8e39d<SSE3,DTES64,MON,DS_CPL,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
>>  AMD Features=0x20100800<SYSCALL,NX,LM>
>>  AMD Features2=0x1<LAHF>
>>  TSC: P-state invariant
>> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>> FreeBSD/SMP: 1 package(s) x 2 core(s)
>>
>> FreeBSD:
>> FreeBSD 8.0-STABLE r205070 amd64
>>
>> Please note that the system was not dedicated to the test, I had
>> Xorg+KDE3+thunderbird+skype+kopete+konsole(s) plus a bunch of daemons
>> running.
>> That probably explains irregularities in the results.
>>
>> I am not sure exactly how the theoretical maximum should be calculated; I used
>> 2 * 2.66G * 4 ≈ 21.3G.
>> And so 19.2G / 21.3G ≈ 90%.
>>
>> Not as bad as what you get.
>> Although not as good as what you report for Linux.
>> But given the impurity and imprecision of my test…
>>
>> P.S. the machine is two-core obviously :-)
>> Don't have anything with more cpus/cores handy.


More information about the freebsd-stable mailing list