Hyperthreading hurts 5.3?

Wed Jan 12 13:29:55 PST 2005

     On Wed, 12 Jan 2005 18:45:56 +0100 Anthony Atkielski
<atkielski.anthony at wanadoo.fr> wrote:

>Scott Bennett writes:
>
>SB>      Well, no, not exactly.  The dual-cored CPUs share certain resources
>SB> on the chip that are not shared in a multi-CPU situation, and that sharing
>SB> means certain operations have to be handled differently.  An MP setup has
>SB> separate cache and TLB management in each CPU ...
>
>What's TLB?

     Translation Lookaside Buffer.  (I know.  It's a weird name.  I think
its origin was at IBM, which would explain the weirdness completely.)  It's
a small, fast cache in the CPU that holds the results of the most recent
address translations.  These results are kept around in order to avoid
going through the full address translation process for addresses in pages
for which address translation has already occurred.  It's a big time-saver.
When a virtual address is encountered that isn't in the TLB, address
translation proceeds from scratch, and the result typically replaces the
least recently used entry in the TLB.
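
     To make the hit/miss/replace cycle concrete, here is a minimal sketch
in Python.  It is a toy model, not how any particular CPU implements its
TLB: the 4096-byte page size, the two-entry capacity, the page_table dict
standing in for the "full" translation, and the class and method names are
all assumptions made up for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class TLB:
    """Toy TLB: a small LRU cache mapping virtual page numbers to frames."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # stands in for the full translation
        self.entries = OrderedDict()   # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)         # now the most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[vpn] = self.page_table[vpn]  # translate from scratch
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3, 2: 9}   # hypothetical virtual page -> frame mapping
tlb = TLB(capacity=2, page_table=page_table)
for addr in (0, 8, 4096, 16, 8192, 4100):
    tlb.translate(addr)
print(tlb.hits, tlb.misses)       # prints "2 4"
```

     The last access (4100) misses even though page 1 was translated
earlier, because the access to page 2 pushed page 1 out of the two-entry
TLB in the meantime.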
>
>SB> ... whereas P4 w/HT logical processors share this memory management
>SB> circuitry. Alteration of a cache line requires notification of the
>SB> other processor(s) in an MP situation to mark any corresponding line
>SB> in its(their) cache(s) because multiple separate caches are
>SB> involved, but notification is not necessary in the P4 w/HT situation
>SB> because it's the same cache being seen by both logical processors.
>SB>      Alteration/invalidation of TLB entries requires notification to
>SB> invalidate in an MP, so that the other CPU(s) can purge any corresponding TLB
>SB> entries it(they) may have, but notification is not required in the P4 w/HT
>SB> situation because both logical processors are referring to the same TLB.  Again,
>SB> unnecessary purging would be a performance hit.
>SB>      There must be some special handling of TLB entries in the P4 w/HT that
>SB> I haven't seen documented.  (There almost certainly is documentation; I just
>SB> haven't seen it yet.)  There must be some way to distinguish between TLB
>SB> entries filled per orders of one logical processor from those filled per
>SB> orders of the other logical processor.  If there weren't, then one logical
>SB> processor would use TLB entries for the address space running on the other
>SB> logical processor, which would, of course, be Very Bad.  But, to improve
>SB> performance, there should be some way to share TLBs for the case of two
>SB> threads running concurrently in the same address space.  If anyone reading
>SB> this knows the details of how this is handled in these chips, please post them
>SB> here.
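
     One conceivable scheme, purely as a sketch and not a description of
what the P4 actually does, is to tag each entry in the shared TLB with an
identifier for the address space it belongs to.  The asid tag, the class,
and its methods below are all hypothetical names invented for this
illustration.

```python
class TaggedTLB:
    """Toy shared TLB whose entries are tagged with an address-space ID."""

    def __init__(self):
        self.entries = {}   # (asid, vpn) -> frame

    def lookup(self, asid, vpn):
        # Returns None on a miss; a real TLB would then do a full translation.
        return self.entries.get((asid, vpn))

    def fill(self, asid, vpn, frame):
        self.entries[(asid, vpn)] = frame

    def invalidate(self, asid, vpn):
        # One purge suffices: both logical processors see the same structure.
        self.entries.pop((asid, vpn), None)


tlb = TaggedTLB()
tlb.fill(asid=1, vpn=0x10, frame=0x200)   # filled for logical processor 0
print(tlb.lookup(asid=1, vpn=0x10))       # logical processor 1, same space: 512
print(tlb.lookup(asid=2, vpn=0x10))       # a different address space: None
```

     With a tag like this, two threads in the same address space can share
an entry, while a thread in another address space misses on it instead of
picking up the wrong translation.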
>
>From what you say and from what I've read today, it sounds like
>hyperthreading comes close to providing two separate processors for
>heterogeneous system loads (where each hyperthread is using slightly
>different processor resources at any given instant), but it may not buy
>much of anything for massively parallel compute-bound work, since both
>threads may want nearly the same things at the same time and will thus
>effectively be forced to spend a lot of time waiting for each other.

     I think that's probably close to the truth.  For logic and number
crunching, the two logical processors can proceed in parallel.  But they
will compete for any non-register memory access, for address translation
time, and possibly other resources.  I notice that the 5.2.1 boot messages
refer to the second logical processor as an AP.  In Intel's multiprocessor
terminology that stands for "application processor", i.e., any processor
other than the bootstrap processor (BSP) that brings the system up.  It
does not by itself mean an "attached processor" in the older asymmetric
sense, where only the first processor is able to perform certain functions
and the other has to get the first one to do those things for it.
Typically, such restricted functions include things like starting I/O
operations, handling I/O interrupts, setting the system clock, etc.
Whether any restrictions of that kind apply in this situation, I do not
know.
>
>Fortunately, my server has a very mixed load, as one would expect for a
>generic domain server, so hopefully it will profit from hyperthreading.
>
     What Intel claims is essentially that the HT-enabled CPUs allow
snappier responses in interactive processes when a CPU-bound process is
running.

>And hopefully no weird stuff will happen because I've turned on HT
>(although offhand I'm not sure what would happen, unless there are
>hidden hardware conflicts or something specific and software-visible
>about HT in normal operation that might expose a bug).  I'm not sure
>that I see how HT could affect Serial ATA disks, for example, any more
>than having two separate physical processors would.
>


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:       bennett at cs.niu.edu                              *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************