Possible instruction pipelining problem between HT's on the same die ? (fwd)
Keir Fraser
Keir.Fraser at cl.cam.ac.uk
Sat Jun 4 08:21:33 GMT 2005
Hi,
I did a fair amount of lock-free programming during my PhD and for Xen,
so I may be able to shed some light on this situation. OTOH I may also
be confused: the x86 memory model is poorly specified and the reference
manuals are often badly written and misleading. I'll address the points
and questions out of order....
> But I'm beginning to think that it isn't working as advertised. I've
> read the manuals over and over again and they seem to only guarantee
> write ordering between physical cpus, not between logical HT cpus, and
> even then it appears that a cpu can do a speculative read and
> thus get an old value for A even after getting a new value for B.
The ordering guarantees between HTs are identical to those between
physical cpus. I'm referring to Section 7.6.19 of IARM (Intel IA-32
Reference Manual) Vol 3. It's slightly confusing that it says "can
further be defined as 'write-ordered with store buffer forwarding'" but
this forwarding only occurs separately *within* each logical cpu (the
store buffer is statically partitioned between the two HTs), and this
phrase is identical to the one describing physical cpu behaviour in
Section 7.2.2 (i.e. the later section merely repeats it).
Reads can be speculatively executed out-of-order, but this property
isn't unique to HTs. This race could in theory happen across physical
cpus.
> Now I was depending on the presumed write ordering, so if a foreign
> cpu sees that B is updated it can assume that A has also been
> updated.
You *can* depend on write ordering. But this ordering is no help if
CPU#1 has already executed, and is retiring, the read from A by the
time it executes the read from B. It's CPU#1 that is screwing up, not
CPU#0.
> I looked at the various SFENCE/LFENCE/MFENCE instructions and they
> do not seem to guarantee ordering for speculative accesses at all.
> They all say that they do not protect against speculative reads.
> Bus-locked instructions don't seem to avoid speculative reads either.
I think the reference manual is being almost wilfully misleading by
referring to the speculative prefetch mechanism and its total
independence from the fence instructions: "data could be speculatively
loaded into the cache just before, during, or after the execution of an
MFENCE instruction". It is important to realise that speculative
execution of a memory-reading instruction is quite different from
speculative prefetch into a cache. The latter should not matter to the
programmer: the cache coherency protocol hides it. Consider the code
example in the original email:
> cpu #0        write A
>               write B
>
> (HT)cpu #1    read B
>               if (B)
>                   read A   <---- gets OLD data in A, not new data
If CPU#1 prefetches A into its cache before it reads B, it may indeed
see the old value of A; *but* when CPU#0 writes A it will invalidate
that cacheline in all remote caches; *furthermore* CPU#0 cannot commit
its update of B until after it has committed its update of A (x86
guarantees write order). So, if CPU#1 reads the new value of B, then
any stale value of A in its cache has been invalidated by that point.
All you need to ensure is that CPU#1 hasn't speculatively executed the
read from A: precisely the purpose of MFENCE and LFENCE.
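To make the pattern concrete, here is a rough sketch of the same A/B handshake with a fence on the reading side. It is only an illustration under assumptions: it uses portable C11 fences (atomic_thread_fence) and POSIX threads rather than raw LFENCE/MFENCE, so the compiler emits whatever the target actually requires, and the names cpu0_writer, cpu1_reader and run_once and the value 42 are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int A;          /* payload: plain variable, written before the flag */
static atomic_int B;   /* flag: set only after A has been written */

/* "cpu #0": write A, then write B. */
static void *cpu0_writer(void *arg)
{
    (void)arg;
    A = 42;                                      /* write A */
    atomic_thread_fence(memory_order_release);   /* keep the A store ahead of the B store */
    atomic_store_explicit(&B, 1, memory_order_relaxed);  /* write B */
    return NULL;
}

/* "cpu #1": read B, fence, then read A. */
static void *cpu1_reader(void *arg)
{
    int *out = arg;
    /* Spin until the new value of B is visible. */
    while (atomic_load_explicit(&B, memory_order_relaxed) == 0)
        ;
    /* The read of A must not be performed ahead of the read of B. */
    atomic_thread_fence(memory_order_acquire);
    *out = A;                                    /* read A */
    return NULL;
}

/* Run one writer/reader pair and return the value of A the reader saw. */
int run_once(void)
{
    pthread_t w, r;
    int seen = -1;
    int rc;

    A = 0;
    atomic_store(&B, 0);
    rc = pthread_create(&r, NULL, cpu1_reader, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, cpu0_writer, NULL);
    assert(rc == 0);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With the fences in place, a reader that has seen B set is guaranteed to see the new A, so run_once() should return 42. Build with something like "cc -std=c11 -pthread".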
This is more complicated if both CPUs are sharing their memory
hierarchy. However, either cache lines are tagged with an HT identifier
and so the cache logically operates as two separate variable-sized
caches (in which case normal cache coherency rules apply as described
above), or there is true cacheline sharing (in which case there is no
stale data to worry about, as CPU#0 will directly update the cache data
that CPU#1 will read from). Either way, there's no weakening of the
memory model.
-- Keir