When is it worth enabling hyperthreading?

Fri Oct 9 07:20:30 UTC 2009

     On Wed, 07 Oct 2009 23:24:48 -0400 Pierre-Luc Drouin
<pldrouin at pldrouin.net> wrote:
>Could someone explain me in which cases it is useful to enable 
>hyperthreading on a machine running FreeBSD 8.0 and in which other cases 
>it is not a good idea? Is that possible that hyperthreading is 
>disadvantageous unless the number of active (non-sleeping) threads is 
>really high?
>
>For example, if I have an i7 CPU with 4 physical cores and that I run 
>some multi-threaded code that has only 4 threads, it will run almost 
>always (twice) slower with hyperthreading enabled than when I disable it 
>in the BIOS. If I understand correctly, hyperthreading has the advantage 
>of being able to do CPU context switching faster than the OS, but it 

     No.  Both contexts execute simultaneously.  Each of the two logical
CPUs in a core has its own set of registers, LDT and GDT pointer
registers, and instruction counter.  Both compete for the same
remaining set of resources:  DAT, TLB, FPU, cache (all levels for a given
core), busses to off-chip resources, and--most critically--pipeline slots
per clock cycle.  Any time a resource shared by the two logical CPUs (what
the logical CPUs execute are called "CPU threads" or "hyperthreads") is
in use by one logical CPU, it is unavailable for use by the other logical
CPU.  If a logical CPU needs a resource unavailable due to its being in
use by the other logical CPU, the late-comer's processing is suspended
until the resource is released by the other logical CPU.  Such a lockout
situation is not directly detectable in software because the locked-out
instruction is still in execution; it's just taking more than the usual
number of cycles to complete.
     On a P4 Prescott chip or the late models of single-core Xeons,
the pipeline structure is apparently less than ideal for sustained
simultaneous execution; i.e., the two logical CPUs frequently issue
pairs of instructions that together need more pipeline slots of some
type than the core has, which causes one of them to stall until the
other moves on, opening the next cycle's set of pipeline slots.  A
simple case can demonstrate the problem, although on most systems this
example would likely be infrequent.  There is only one FPU pipeline on
these chips, so two floating-point instructions executing simultaneously
will result in one getting the FPU pipeline slot for the current cycle,
while the other one stalls until the next cycle, whereupon the other side
will stall, etc.  The more common occurrence is that other types of
instruction pairs will require, for example, four slots of a type that
has only three pipelines.
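
     The slot contention described above can be sketched with a toy
model.  This is only an illustration under made-up assumptions: the
function name cycles_to_finish, the pipeline names "fpu" and "int", the
in-order issue, and the alternating priority between logical CPUs are
all hypothetical and do not describe how any real chip arbitrates.

```python
# Toy model of two logical CPUs sharing one core's pipeline slots.
# Every name and number here is an illustrative assumption, not a
# description of real hardware.

def cycles_to_finish(streams, slots, width=1):
    """Count the cycles needed to issue every instruction.

    streams: one list per logical CPU; each entry names the pipeline
             type that instruction needs (e.g. "fpu", "int").
    slots:   dict mapping pipeline type -> pipelines available per cycle.
    width:   most instructions one logical CPU may issue per cycle.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    needed = {op for stream in streams for op in stream}
    missing = sorted(op for op in needed if slots.get(op, 0) < 1)
    if missing:
        raise ValueError("no pipeline for: " + ", ".join(missing))

    pending = [list(stream) for stream in streams]
    cycles = 0
    first = 0  # logical CPU that gets first pick this cycle
    while any(pending):
        cycles += 1
        free = dict(slots)  # all slots are free again each cycle
        for k in range(len(pending)):
            queue = pending[(first + k) % len(pending)]
            issued = 0
            # Issue in order until the width limit or a busy slot stalls us.
            while queue and issued < width and free[queue[0]] > 0:
                free[queue[0]] -= 1
                queue.pop(0)
                issued += 1
        first = (first + 1) % len(pending)  # alternate priority
    return cycles


if __name__ == "__main__":
    one_fpu = {"fpu": 1, "int": 3}
    # Two floating-point streams fight over the single FPU slot.
    print(cycles_to_finish([["fpu"] * 4, ["fpu"] * 4], one_fpu))
    # A floating-point stream paired with an integer stream does not collide.
    print(cycles_to_finish([["fpu"] * 4, ["int"] * 4], one_fpu))
    # Four integer slots wanted per cycle, only three pipelines available.
    print(cycles_to_finish([["int"] * 6, ["int"] * 6], one_fpu, width=2))
```

     In this model the two floating-point streams take as many cycles
together as they would running one after the other, while the mixed
pair finishes in the time of a single stream.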
     The Core i7 chips (don't know about the other Core iN series) are
alleged to have an improved assortment of pipelines w.r.t. typical
instruction mixes, so the parallelism is supposed to be rather more
effective on these chips than on their forerunners in the Pentium/Xeon
series, although I think there is still only one FPU per core.  It has
been quite a while since I last tried measuring it, but IIRC, a "make
buildworld" on my 3.4 GHz P4 Prescott takes about one to two minutes
longer elapsed time in non-hyperthreading mode with MAKEFLAGS set to
"-j3" than it does with hyperthreading enabled and MAKEFLAGS set to "-j5"
(i.e., something like 52 - 53 minutes instead of 51 minutes and a few
seconds).
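
     To put those recalled figures in proportion, the saving can be
worked out as a percentage.  The sketch below rounds "51 minutes and a
few seconds" down to a flat 51, which is an assumption; the helper name
percent_saved is likewise just for illustration.

```python
# Rough arithmetic on the recalled buildworld timings.
# Assumption: "51 minutes and a few seconds" is treated as exactly 51.

def percent_saved(without_ht_min, with_ht_min):
    """Elapsed time saved with hyperthreading, as a percentage of the
    non-hyperthreading time."""
    return 100.0 * (without_ht_min - with_ht_min) / without_ht_min


if __name__ == "__main__":
    for without_ht in (52, 53):
        print(without_ht, round(percent_saved(without_ht, 51), 1))
```

     With those inputs the gain on the P4 works out to somewhere
around 2 to 4 percent of elapsed time.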
     Your quad-core Core i7 chips ought to provide a much greater benefit
with hyperthreading enabled, relatively speaking.  The traditional
recommendation for the -j flag for make(1) is 3*nCPUs, but hyperthreading
doesn't give you a full CPU's worth of extra processing, so your quad-core
chips won't give you a full 8 CPUs' worth.  In other words, a single,
large, parallel make job probably should have -j set to something under
24 (3*8 logical CPUs) yet still greater than 12 (3*4 physical cores); as
a guess, perhaps 20ish. :-)  But do try it yourself at different -j
values, and let us know how your timings turn out on that chip, along
with the model number of the chip.
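
     One way to run such a comparison is sketched below.  It brackets
-j using the 3*nCPUs rule of thumb and times one build per value.  The
function names (j_bounds, time_command, sweep) are hypothetical, the
default of two logical CPUs per core is an assumption, and the sketch
does nothing about cleaning the object tree between runs, which you
would need to handle yourself for the timings to be comparable.

```python
# Sketch: bracket make's -j with the 3*nCPUs rule of thumb and time
# one build per value.  Names and defaults are illustrative assumptions.
import subprocess
import time


def j_bounds(physical_cores, threads_per_core=2, per_cpu=3):
    """Return (low, high): per_cpu times the physical cores, and
    per_cpu times the logical CPUs."""
    low = per_cpu * physical_cores
    high = per_cpu * physical_cores * threads_per_core
    return low, high


def time_command(argv, cwd=None):
    """Run argv once and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=True, cwd=cwd)
    return time.monotonic() - start


def sweep(j_values, make_argv=("make", "buildworld"), cwd=None):
    """Time one run per -j value and return {j: seconds}."""
    results = {}
    for j in j_values:
        argv = [make_argv[0], "-j%d" % j] + list(make_argv[1:])
        results[j] = time_command(argv, cwd=cwd)
    return results


if __name__ == "__main__":
    low, high = j_bounds(4)  # quad-core with hyperthreading enabled
    print("try -j values between", low, "and", high)
    # Example (long-running; assumes the source tree is in /usr/src):
    # print(sweep(range(low, high + 1, 4), cwd="/usr/src"))
```

     For a quad-core chip the commented-out sweep would try -j12,
-j16, -j20, and -j24; repeating it with hyperthreading disabled in the
BIOS gives the other half of the comparison.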

>does this context switching systematically instead of only when 
>requested, so it slows things down unless the number of running 
>(non-sleeping) threads is greater or equal to let say the number of 
>physical threads x 1.5-1.75.
>
     In general, there is a slight gain, although running parallel
floating-point activities is a break-even situation and not worth the
bother unless you're just trying to learn OpenMP or some such.  When I've
disabled hyperthreading, interactive response has often seemed a tad less
snappy when running some CPU-bound process at the same time.  OTOH, with
hyperthreading enabled, I sometimes notice a bit more jerkiness in things
like scrolling in Firefox, but it's not easy to tell what's really happening
there because Firefox typically has at least 7 threads itself. :-)  As
Bill Moran said, user interfaces do seem a bit more responsive with
hyperthreading enabled, and I haven't seen any noticeable *loss* in
overall performance.  The "make
buildworld" example runs long enough to give some idea, and it always runs
a little bit faster under hyperthreading than in uniprocessor mode.  A
"make buildkernel" also shows a bit of improvement.  I've never seen any
dramatic improvement, but the slight improvement is sometimes apparent.
Also, when running Windows XP, having hyperthreading enabled has allowed me
to get out from under some runaway, single-threaded process, even though
doing so can take a while because the runaway process does compete vigorously
for the shared resources discussed above. :-)  Nevertheless, without the
extra logical CPU, a manual reboot would have been necessary to regain
control of the machine.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:       bennett at cs.niu.edu                              *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************