Hyperthreading hurts 5.3?

Fri Jan 14 02:05:01 PST 2005

     On Thu, 13 Jan 2005 21:38:08 +0100 Andrea Venturoli
<ml.diespammer at netfence.it> wrote:

>Anthony Atkielski wrote:
>> Andrea Venturoli writes:
>> 
>> AV> Not exactly the same algorithm and on a different set of data.
>> 
>> But similar machine instructions, perhaps?
>
>Yes, both numerical computations.
>Basically one thread would model geometry and the other would mesh it.
>Frequent stalls would arise, as the two processes would only by chance 
>require the same time; even so, the two CPUs were always at full load 
>(!?!?!?). I also tried different combinations, e.g. three modelling 

     Makes sense.

>threads and one mesher with, again, equal timings.
>
>BTW, it's worth mentioning that I *have* to use a compiler that knows 
>nothing about SSE or the like, so all is done with FPU instructions as 
>in the old 387s...
>
     That may make each thread take longer than if it could use the SSE
instructions, but is unrelated to your other issue.
>
>> Just the contention for the FPU alone might have had the effect of
>> single-threading the workload.
>
>I've come to the same conclusion. Still, I can't reconcile this with 
>100% load on both processors. If, as someone said, there is only one 
>FPU, *how* are these figures coming out??? I would have expected 
>something like 50%-50% (instead of the 100%-0% of the single-threaded 
>version). *If* there is only one FPU, I'd expect both virtual processors 
>to be frequently idle waiting for each other.
>
     They most likely are.  You seem to be forgetting that the "idling" in
question is handled within the CPU, not by the kernel.  In other words, it
happens effectively *during* an instruction on the "idled" logical processor,
not in kernel processing between thread switches, so the instruction just
takes longer than it otherwise would, much like waiting for a memory access to
complete.  No interrupt occurs, so the kernel still counts that time as busy.
The logical processor just sits and twiddles its electrons until the resource
it's queued upon becomes available to it, and then it proceeds to complete the
"idled" instruction.
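
     A toy model may make the point concrete.  This is only a sketch under
stated assumptions, not a simulation of any real processor: it assumes the
FPU accepts one operation per cycle, that every instruction is an FPU
operation, and that the two logical processors take turns.  The function
name simulate() and its parameters are made up for illustration.

```python
# Toy model (hypothetical, NOT a real Pentium 4 simulation): n_threads
# logical processors share a single FPU that accepts one op per cycle.

def simulate(fpu_ops_per_thread, n_threads):
    """Return (cycles_elapsed, busy_cycles_per_logical_cpu)."""
    remaining = [fpu_ops_per_thread] * n_threads
    busy = [0] * n_threads
    cycles = 0
    turn = 0  # round-robin arbitration for the shared FPU (an assumption)
    while any(remaining):
        cycles += 1
        # Every logical CPU with work left looks "busy" to the kernel,
        # whether it is issuing an FPU op or stalled waiting for the FPU.
        for cpu in range(n_threads):
            if remaining[cpu]:
                busy[cpu] += 1
        # Only one logical CPU actually gets the FPU this cycle.
        for offset in range(n_threads):
            cpu = (turn + offset) % n_threads
            if remaining[cpu]:
                remaining[cpu] -= 1
                turn = cpu + 1
                break
    return cycles, busy


if __name__ == "__main__":
    total_ops = 1000
    for n in (1, 2):
        cycles, busy = simulate(total_ops // n, n)
        load = [round(100.0 * b / cycles, 1) for b in busy]
        print(n, "thread(s):", cycles, "cycles, load %:", load)
```

     In this model, splitting the same work across two logical processors
takes as many cycles as running it on one, yet both report essentially full
load, because a stalled cycle is indistinguishable from a working cycle when
seen from outside the CPU.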
>
>> That plus the SMP overhead might give
>> you a zero or negative gain with HT.
>
>I tried a multithreaded version on a UP machine (nonsense, I know): the 
>locking overhead is there, but minimal: a process which takes 16 
>minutes will require, maybe, 3 seconds more.
>
     Was that using MPI?  Or some other thread management package?


                                  Scott Bennett, Comm. ASMELG, CFIAG
                                  836 Greenbrier Road, Apt. 4
                                  DeKalb, Illinois 60115
**********************************************************************
* Internet:       bennett at cs.niu.edu                              *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************