Possible instruction pipelining problem between HT's on the
same die ?
Stephan Uphoff
ups at tree.com
Sat Jun 4 02:29:06 GMT 2005
On Fri, 2005-06-03 at 18:47, Matthew Dillon wrote:
> :This is normal behaviour.
> :Take a look at IA-32 Intel Developers ... Vol 3,
> :Section: 7.2.2 for details + solutions.
> :
> :Stephan
>
> Ok.. that section seems to indicate that speculative reads
> can pass writes, but it also says that the pipeline sniffs the address
> within the processor and ensures proper ordering. The latter part
> makes sense within the context of a single cpu, but the big question is:
> Is that supposed to hold true for interactions with HT cpus (that share
> the pipeline) as well? Or not? It seems not.
Memory ordering for logical (HT) CPUs is the same as for physical CPUs
(see Section 7.6.1.9).
>
> Speculative reads creating out of order situations seems to be the
> biggest issue. The AMD manual (Programmers manual volume 3 page
> 186, MFENCE instruction) says this:
>
> "The MFENCE instruction is weakly-ordered with respect to data and
> instruction prefetches. Speculative loads initiated by the processor,
> or specified explicitly using cache-prefetch instructions, can be
> reordered around an MFENCE".
Speculative loads can pass an MFENCE, but they cannot pass load
operations issued before the MFENCE.
> This seems to be different than what the Intel manual says, and doesn't
> make much sense. What's the point of having a fence instruction if it
> can't guarantee read/write ordering? Is the AMD manual simply wrong?
Not wrong - just confusing.

    READ A
    MFENCE
    READ B

can cause

    READ A
    Speculative READ B
    MFENCE

but NOT

    Speculative READ B
    READ A
    MFENCE
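To make the ordering above concrete, here is a minimal C11 sketch of the
READ A / MFENCE / READ B sequence. The names loc_a, loc_b and
read_a_fence_read_b are made up for illustration. It assumes the compiler
turns atomic_thread_fence(memory_order_seq_cst) into MFENCE or an
equivalent locked instruction on x86, which is common but not guaranteed.

```c
#include <stdatomic.h>

/* Hypothetical shared locations standing in for A and B. */
static _Atomic int loc_a;
static _Atomic int loc_b;

struct ab_snapshot {
    int a;
    int b;
};

/* READ A; full fence; READ B, in program order. */
static struct ab_snapshot read_a_fence_read_b(void)
{
    struct ab_snapshot s;

    /* READ A */
    s.a = atomic_load_explicit(&loc_a, memory_order_relaxed);

    /* Full fence (assumed to become MFENCE or a locked op on x86). */
    atomic_thread_fence(memory_order_seq_cst);

    /* READ B: may be speculated early, but not ahead of READ A. */
    s.b = atomic_load_explicit(&loc_b, memory_order_relaxed);

    return s;
}
```

The relaxed loads leave all ordering to the fence, so the sketch maps
one-to-one onto the three-line sequence.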
> Other than that, the Intel manual does indicate that speculative reads
> will not pass locked bus cycle instructions (the AMD manual says nothing
> about that, as far as I can see).
See AMD Volume 1, Section 3.9.2.
> So, presumably, doing a dummy locked bus
> cycle operation on e.g. the top of the stack, such as Linux does, would
> be sufficient to ensure read ordering. Would you concur with that
> assessment?
Yes
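A rough sketch of what such a dummy locked operation could look like in
C with GCC-style inline assembly. The helper locked_stack_barrier and the
publish/consume pair are hypothetical names, not code from any kernel.
The asm adds zero to the word at the top of the stack, so it changes no
data. On non-x86 targets the sketch falls back to a C11 fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Full barrier via a dummy locked read-modify-write on the stack top. */
static inline void locked_stack_barrier(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,(%%rsp)" : : : "memory", "cc");
#elif defined(__GNUC__) && defined(__i386__)
    __asm__ __volatile__("lock; addl $0,(%%esp)" : : : "memory", "cc");
#else
    atomic_thread_fence(memory_order_seq_cst); /* portable fallback */
#endif
}

static _Atomic int data_a;  /* payload (A) */
static _Atomic int index_b; /* index updated after the payload (B) */

/* Writer: store A first, then B. */
static void publish(int value)
{
    atomic_store_explicit(&data_a, value, memory_order_relaxed);
    atomic_store_explicit(&index_b, 1, memory_order_release);
}

/* Reader: returns 1 and fills *out once B is seen as updated. */
static int consume(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&index_b, memory_order_relaxed) == 0)
        return 0;

    /* Keep the read of A from being satisfied before the read of B. */
    locked_stack_barrier();

    *out = atomic_load_explicit(&data_a, memory_order_relaxed);
    return 1;
}
```

The barrier sits between the read of B and the read of A, which is the
spot where the stale value of A would otherwise slip through.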
> What's really horrible here is that the 'old' value of the data being
> used is modified at location A something like 30 instructions prior to
> the instruction that updates the index (B). I think this is a
> situation that can only occur in an HT configuration, and then only if
> the speculative read issued by the HT cpu is being held across
> 30 instructions executed by the primary cpu before the HT cpu issues the
> read of B.
>
>     cpu #0                  cpu #1 (HT cpu on same die as cpu #0)
>
>                             speculatively read A
>     write A                 (stalled)
>     [30 instructions]       (stalled x 30)
>     write B                 (stalled)
>                             read B
>                             see that B has been updated
>                             read A (get old value for A instead of new)
>
> Is that even possible ? Not only the 30 instruction latency, but also
> the fact that even with the shared pipeline you have a speculative read
> on the HT cpu surviving 30 instructions running on cpu #0 (but only one
> or two on the HT cpu)... even though they share the same pipeline.
Take a look at store buffers.
Reads have a higher priority than writes on some CPUs, and data may even
sit in a store buffer indefinitely (where it cannot be observed by other
CPUs).
Reading some of the Intel and AMD errata gives you a good picture.
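The classic case where a store buffer becomes visible to software is a
store followed by a load of a different location. Below is a minimal C11
sketch, with the hypothetical names flag0, flag1, cpu0_enter and
cpu1_enter. Each function is meant to run on its own CPU. Without the
fence, both may return 0 because each store can still be sitting in its
CPU's store buffer when the other CPU loads. With a full fence between
store and load, at least one side should see the other's flag.

```c
#include <stdatomic.h>

static _Atomic int flag0; /* written by CPU 0, read by CPU 1 */
static _Atomic int flag1; /* written by CPU 1, read by CPU 0 */

/* CPU 0: announce itself, fence, then look at the other side. */
static int cpu0_enter(void)
{
    atomic_store_explicit(&flag0, 1, memory_order_relaxed);

    /* Make the store globally visible before the load executes. */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&flag1, memory_order_relaxed);
}

/* CPU 1: mirror image of cpu0_enter(). */
static int cpu1_enter(void)
{
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&flag0, memory_order_relaxed);
}
```

Calling the two functions back to back on one CPU only exercises the
code path. The interesting outcome (both returning 0) needs them to run
concurrently with the fences removed.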
Stephan