atomic_load_acq @ i386/amd64

Tue Jan 7 05:50:35 UTC 2014

On Sun, Jan 05, 2014 at 03:29:10AM +0400, Oleg Bulyzhin wrote:
> On Sat, Jan 04, 2014 at 07:29:23PM +0200, Konstantin Belousov wrote:
> > On Sat, Jan 04, 2014 at 12:51:59AM +0400, Oleg Bulyzhin wrote:
> > > 
> > > Hello.
> > > 
> > > I've got a question: why is atomic_load_acq_* implemented on i386/amd64
> > > with a locked cmpxchg instruction? The comment about this
> > > (in /sys/(amd64|i386)/include/atomic.h) looks wrong to me. I believe
> > > acquire/release semantics do not require a StoreLoad barrier, so a simple
> > > aligned load should be enough (because acquire/release semantics do not
> > > guarantee sequential consistency).
> > 
> > You did not explicitly write which statement in the comment is false, in
> > your opinion.
> 
> > 
> > FreeBSD assumes a property of the _acq/_rel operations which is sometimes
> > called 'total lock ordering'. It is indeed a sort of sequential consistency,
> > but only for atomic+membar ops. Were atomic_load_acq() implemented as a plain
> > load, it could pass stores, in particular stores from the _rel op, which
> > would break the guarantee.
> > 
> > For x86, there are indeed two possible schemes for implementing a critical
> > section. One is lock cmpxchg for get() and a plain store for release(),
> > which is what we use. The other is a plain load for get() and xchg for
> > release().  The load_acq() must then be adapted so as not to break the
> > acq/rel consistency, and since we use a plain store for release(), load_acq
> > must use a serializing instruction.
> 
> Perhaps I was not clear enough; I'm talking about this one:
> "However, loads may pass stores, so for atomic_load_acq we have to
>  ensure a Store/Load barrier to do the load in SMP kernels."
> 
> As far as I know, acquire/release semantics guarantee the following:
> if we have this code
> <prev_code>
> _acq
> <some code>
> _rel
> <post_code>
> 
> then the following statements are true:
> 1) <some code> cannot leave (due to reordering) acq/rel block
> 2) <prev_code> may leak past _acq 
> 3) <post_code> may leak before _rel
> So neither _acq nor _rel requires a full membar. I.e.
> op_acq is:
> <op>
> <one way membar, down->up reordering is prohibited>
> op_rel is:
> <one way membar, up->down reordering is prohibited>
> <op>
> 
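The one-way-barrier model quoted above can be written down with C11 atomics. This is only a sketch for illustration: the sketch_* names are hypothetical, the code is not the FreeBSD atomic(9) implementation, and a C11 fence is somewhat stronger than a barrier attached to a single operation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: <op> followed by a one-way barrier. */
static inline int
sketch_load_acq(_Atomic int *p)
{
	int v = atomic_load_explicit(p, memory_order_relaxed);	/* <op> */

	/* Later loads and stores may not be reordered before the load. */
	atomic_thread_fence(memory_order_acquire);
	return (v);
}

/* Hypothetical helper: a one-way barrier followed by <op>. */
static inline void
sketch_store_rel(_Atomic int *p, int v)
{
	/* Earlier loads and stores may not be reordered after the store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(p, v, memory_order_relaxed);	/* <op> */
}

/* Single-threaded usage: checks only that the value round-trips. */
static int
sketch_roundtrip(int v)
{
	_Atomic int x = 0;
	int r;

	sketch_store_rel(&x, v);
	r = sketch_load_acq(&x);
	assert(r == v);
	return (r);
}
```

Nothing in this model orders an earlier store_rel against a later load_acq on a different location, which is the StoreLoad case the question is about.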
> Intel documentation says that the only thing (for simple loads/stores) that
> can be reordered is this: "Reads may be reordered with older writes to
> different locations but not with older writes to the same location."
> 
> So, if an older store passes our load_acq(), it would not break the
> requirements. And I do not understand how the load op from load_acq() can pass
> the store op from store_rel(); the Intel doc says: "Writes are not reordered
> with older reads".
Please re-read what I wrote above about 'total lock ordering'.
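
The two x86 schemes mentioned above can be sketched with C11 atomics. This is an illustration under stated assumptions, not the code in atomic.h: the a_* and b_* names are hypothetical, and the instruction noted in each comment is what x86 compilers typically emit, not something the C standard guarantees.

```c
#include <assert.h>
#include <stdatomic.h>

/* Scheme A: locked read-modify-write for the load, plain store. */
static int
a_load_acq(_Atomic int *p)
{
	int expected = 0;

	/*
	 * Compare-exchange of 0 with 0: never changes the value, and
	 * leaves the current value in 'expected'. Typically lock cmpxchg.
	 */
	atomic_compare_exchange_strong_explicit(p, &expected, 0,
	    memory_order_seq_cst, memory_order_seq_cst);
	return (expected);
}

static void
a_store_rel(_Atomic int *p, int v)
{
	/* Typically a plain mov. */
	atomic_store_explicit(p, v, memory_order_release);
}

/* Scheme B: plain load, exchange for the store. */
static int
b_load_acq(_Atomic int *p)
{
	/* Typically a plain mov. */
	return (atomic_load_explicit(p, memory_order_acquire));
}

static void
b_store_rel(_Atomic int *p, int v)
{
	/* Typically xchg, which is implicitly locked; old value is dropped. */
	(void)atomic_exchange_explicit(p, v, memory_order_seq_cst);
}
```

In each scheme exactly one side of the pair carries the locked instruction, so a store_rel followed by a load_acq always has a full barrier between them. Mixing the plain store of scheme A with the plain load of scheme B would leave none.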

> 
> Well, while writing this email I realized what is disturbing me: it's the
> atomic(9) "Multiple Processors" section. It claims atomics are not atomic in
> the common MP case and says atomics are atomic on i386. That looks strange to
> me:
> 1) I guess it's not "atomic" even for i386/MP without proper membar pairing.
> 2) if we have acq/rel modifiers for atomics, why can't we guarantee
>    "atomicity" for any MP arch?
> 
> P.S. Please correct me if I'm wrong in my statements; I'm spending my New Year
> holidays on ignorance elimination. ;)

I do not know what you mean by 'not atomic'.