HyperThreading makes worse to me (was Re: How to reproduce: Re: Only 70% of theoretical peak performance on FreeBSD 8/amd64, Corei7 920)

Garrett Cooper yanefbsd at gmail.com
Thu Apr 15 07:52:19 UTC 2010


On Wed, Apr 14, 2010 at 9:21 PM, Ian Smith <smithi at nimnet.asn.au> wrote:
> On Wed, 14 Apr 2010, Garrett Cooper wrote:
>  > On Wed, Apr 14, 2010 at 7:49 PM, Garrett Cooper <yanefbsd at gmail.com> wrote:
>  > > On Wed, Apr 14, 2010 at 5:46 PM, Maho NAKATA <chat95 at mac.com> wrote:
>  > >> Hi Andry and Adam
>  > >>
>  > >> My test again. No desktop, etc. I just run dgemm.
>  > >> Contrary to Adam's result, Hyper Threading makes the performance worse.
>  > >> all tests are done on Core i7 920 @ 2.67GHz. (TurboBoost @2.8GHz)
>  > >>
>  > >> Turbo Boost off, Hyper threading off: 82% (35GFlops)    [1]
>  > >> Turbo Boost off, Hyper threading off: 72% (30.5GFlops)  [2]
>
> Er, shouldn't one of those say HTT on?  and/or Turbo boost on?  Else
> they're both the same test as [4] but with different results?

There's a problem with how the kernel reports cores on 8.x+. For some
odd reason, more recent Intel processors aren't reported as HT-enabled
even when they have HT cores (see: kern/145385).

I didn't look into the issue too hard, but since it does seem to be a
major performance loss perhaps I should; besides, it would be good
experience to put under my belt :].

>  > >> Turbo Boost on,  Hyper threading on: 71% (32GFlops)    [3]
>  > >> Turbo Boost off, Hyper threading off: 84-89% (38-40GFlops) [4]
>
> Clarification of all four possible test configs - 8 if you add pinning
> CPUs or not - might make this a bit clearer?
>
>  > > Doesn't this make sense? Hyperthreaded cores in Intel procs still
>  > > provide an incomplete set of registers as they're logical processors,
>  > > so I would expect for things to be slower if they're automatically run
>  > > on the SMT cores instead of the physical ones.
>
> Since we're talking FP, do HTT 'cores' share an FPU, or have their own?
> If contended, you'd have to expect worse (at least FP) performance, no?

   Ah, that's another excellent point. Which instructions is dgemm
using -- pure integer-based arithmetic, floating-point arithmetic,
specialized operations that would benefit from SIMD, etc.?
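
For reference, the "percent of peak" figures quoted in this thread can
be sanity-checked with a little arithmetic. The following is only a
rough sketch: the core count (4) and the figure of 4 double-precision
flops per cycle per core (one 128-bit SSE add plus one 128-bit SSE
multiply each cycle) are assumptions that are not stated anywhere in
the thread, and the function names are made up for illustration.

```python
# Rough sanity check of the "percent of theoretical peak" numbers.
# ASSUMPTIONS (not from the thread): 4 physical cores, and 4
# double-precision flops per cycle per core via SSE.
CORES = 4
FLOPS_PER_CYCLE = 4


def peak_gflops(clock_ghz):
    """Theoretical peak in GFlops for a given clock speed in GHz."""
    return clock_ghz * CORES * FLOPS_PER_CYCLE


def percent_of_peak(measured_gflops, clock_ghz):
    """Measured throughput as a percentage of the theoretical peak."""
    return 100.0 * measured_gflops / peak_gflops(clock_ghz)


if __name__ == "__main__":
    # Measured values as quoted in the thread, labelled by test number.
    results = [("[1]", 35.0), ("[2]", 30.5), ("[3]", 32.0),
               ("[4] low", 38.0), ("[4] high", 40.0)]
    for label, gflops in results:
        for clock in (2.67, 2.8):  # nominal clock and Turbo Boost clock
            pct = percent_of_peak(gflops, clock)
            print(f"{label:9s} {gflops:5.1f} GFlops @ {clock} GHz: {pct:5.1f}%")
```

Under these assumptions, 35 GFlops at 2.67 GHz works out to roughly 82%
and 32 GFlops at 2.8 GHz to roughly 71%, which is consistent with [1]
and [3] as labelled. The 38-40 GFlops range only lands near 84-89% at
the 2.8 GHz clock, so it may be worth double-checking which Turbo Boost
setting [4] was really run with.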

>  > > Is there a weighting scheme to SCHED_ULE where logical processors
>  > > (like the SMT variety) get a lower score than real processors do, and
>  > > thus get scheduled for less intensive interrupting tasks, or maybe
>  > > just don't get scheduled in high use scenarios like it would if it was
>  > > a physical processor?
>  >
>  > Err... wait. Didn't see that the turbo boost results didn't scale
>  > linearly or align with one another until just a sec ago. Nevermind my
>  > previous comment.
>
> Waiting for the fog to lift ..

    As am I. I don't know enough in this area, but I'm definitely open
to learning.

Thanks,
-Garrett


More information about the freebsd-stable mailing list