atomicity of unlocked reads

Wed Sep 17 14:35:05 PDT 2003

On Wed, Sep 17, 2003 at 03:34:53PM -0400, John Baldwin wrote:
> 
> No, that is not what an acquire load is for.  Memory barriers only
> affect the order of memory operations on the current processor, they
> have no bearing on other processors and don't provide any direct
> synchronization with other processors.  Instead, if I have an acquire
> barrier, then the processor is not allowed to re-order any later reads
> or writes before the marked read.  Note that earlier reads or writes
> can be re-ordered after the marked read.  A release barrier is the
> opposite in that prior reads/writes must be completed prior to the
> marked write, but later reads/writes may be re-ordered before the
> marked write.
> 

Umm, this is not the way that the computer architecture community
defines acquire and release accesses.  They do, in fact, play a role
in defining the (partial) order in which memory accesses are seen by
the different processors within a system.  Specifically, I would refer
you to Condition 3.1 in Gharachorloo et al. ("Memory Consistency and
Event Ordering in Scalable Shared-Memory Multiprocessors" at
http://citeseer.nj.nec.com/gharachorloo90memory.html).  Note
particularly, the phrasing "... perform with respect to any other
processor."  This was the paper that introduced the Release
Consistency model, and the notion of acquire and release accesses.

The rest of what you say about the effects of acquire and release
accesses on re-ordering within a processor is basically correct. 
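
As a minimal sketch of that per-processor ordering (and of why it still
matters to a second processor), here is the usual flag-and-payload
pattern.  It assumes C11 <stdatomic.h> and POSIX threads, neither of
which is what the kernel code under discussion uses; the names payload,
ready, producer, consumer, and run_message_passing are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_int ready;   /* the "marked" location; starts at 0 */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;          /* earlier write */
    /* Release store: the earlier write may not be re-ordered after it. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;
    /* Acquire load: the later read may not be re-ordered before it. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                  /* spin until the flag write is visible */
    *seen = payload;       /* later read; observes 42 once ready == 1 */
    return NULL;
}

/* Hypothetical driver: runs both threads and returns what the consumer read. */
int run_message_passing(void)
{
    pthread_t p, c;
    int seen = -1;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);
    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

With both accesses to ready relaxed instead, the C11 model would no
longer guarantee that the consumer sees the write to payload.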

> Bruce explicitly said that if he reads a stale value, that is ok,
> so the membars don't do anything for him. If he is worried about
> stale data, then he needs a lock, not just atomic operations.

Not necessarily.  My previous message addressed this point.

> ...  The
> way that a lock works is that when we try to acquire a lock, we
> use an acquire barrier.  This means that later reads/writes in the
> instruction stream won't be re-ordered before the lock acquire.
> When we release the lock we use a release barrier to ensure that
> any modifications made while holding the lock will be visible
> before the write to release the lock is visible.  Thus, you can
> have CPU A acquire lock L, make a few writes, and then release
> lock L.  If CPU B tries to acquire lock L after A has released
> it but before the write releasing the lock is visible to B, B will
> end up spinning (see the MTX_CONTESTED flag in the mutex code)
> until that write is visible (unless another thread has already
> blocked on this lock, in which case B will just block right away)
> until the write releasing L is visible to B.  B can then acquire
> lock L.  Since it had to wait for L's release write to be visible,
> this means that all the writes A performed are now visible to B,
> and thus B will not read stale data.
> 
> Thus, memory barriers don't actually enforce any synchronization,
> they just give you a tool that can be used in conjunction with a
> memory location to construct a lock primitive that enforces
> synchronization.

I agree with this statement.  Atomicity, synchronization, and memory
ordering are three distinct concepts. 
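
The lock construction quoted above can be sketched as follows.  This is
only an illustration, assuming C11 <stdatomic.h> and POSIX threads; it
is a bare test-and-set spin lock, not the FreeBSD mutex code, and it has
no equivalent of MTX_CONTESTED or of blocking.  The names spinlock,
spin_lock, spin_unlock, worker, and run_locked_counter are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct spinlock {
    atomic_flag held;
};

static void spin_lock(struct spinlock *l)
{
    /* Atomic test-and-set with acquire ordering: accesses inside the
     * critical section may not be re-ordered before the acquire. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* spin until the releasing write is visible */
}

static void spin_unlock(struct spinlock *l)
{
    /* Release ordering: writes made while holding the lock are
     * visible before the write that releases it. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static struct spinlock L = { ATOMIC_FLAG_INIT };
static long counter;       /* ordinary data protected by L */

#define ITERS 100000

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        spin_lock(&L);
        counter++;         /* never sees a stale value under the lock */
        spin_unlock(&L);
    }
    return NULL;
}

/* Hypothetical driver: two threads stand in for CPU A and CPU B. */
long run_locked_counter(void)
{
    pthread_t a, b;
    int rc;

    counter = 0;
    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

All three concepts show up separately here: the test-and-set supplies
atomicity, the acquire and release orderings supply memory ordering, and
the spin loop built from the two supplies the synchronization.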

A few years back a former colleague of mine, Sarita Adve, and Kourosh
Gharachorloo wrote a survey paper for IEEE Computer on this topic.  See
http://citeseer.nj.nec.com/adve95shared.html.  (And, yes, I'm the Cox
who appears in the "Related documents from co-citation" section
partway down that web page.  I used to work in a related area.)

Alan