atomic ops

Tue Oct 28 14:33:09 UTC 2014

On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew at fubar.geek.nz> wrote:
> On Tue, 28 Oct 2014 14:18:41 +0100
> Attilio Rao <attilio at freebsd.org> wrote:
>
>> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik at gmail.com>
>> wrote:
>> > As was mentioned sometime ago, our situation related to atomic ops
>> > is not ideal.
>> >
>> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
>> > full memory barriers, which is stronger than needed.
>> >
>> > Moreover, load is implemented as lock cmpxchg on var address, so it
>> > is additionally slower, especially when cpus compete.
>>
>> I already explained this once privately: full memory barriers are not
>> stronger than needed.
>> FreeBSD has different semantics than Linux. We historically enforce a
>> full barrier on _acq() and _rel() rather than just a read and write
>> barrier, hence we need a different implementation than Linux.
>> There is code that relies on this property, like the locking
>> primitives (release a mutex, for instance).
>
> On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> there are only full barriers. On both 32- and 64-bit ARMv8, ARM has added
> support for load-acquire and store-release atomic instructions. For the
> use in atomic instructions we can assume these only operate on the
> address passed to them.
>
> It is unlikely we will use them in the 32-bit port; however, I would like
> to know the expected semantics of these atomic functions to make sure
> we get them correct in the arm64 port. I have been advised by one of
> the ARM Linux kernel maintainers on the problems they have found using
> these instructions but have yet to determine what our atomic functions
> guarantee.

For FreeBSD the "reference doc" is atomic(9).
It clearly states:

The second variant of each operation includes a read memory barrier.
This barrier ensures that the effects of this operation are completed
before the effects of any later data accesses.  As a result, the operation
is said to have acquire semantics as it acquires a pseudo-lock
requiring further operations to wait until it has completed.  To denote
this, the suffix ``_acq'' is inserted into the function name immediately
prior to the ``_<type>'' suffix.  For example, to subtract two integers
ensuring that any later writes will happen after the subtraction is
performed, use atomic_subtract_acq_int().

The third variant of each operation includes a write memory barrier.
This ensures that all effects of all previous data accesses are completed
before this operation takes place. As a result, the operation is said to
have release semantics as it releases any pending data accesses to be
completed before its operation is performed.  To denote this, the suffix
``_rel'' is inserted into the function name immediately prior to the
``_<type>'' suffix.  For example, to add two long integers ensuring that
all previous writes will happen first, use atomic_add_rel_long().

The bottom line is that a read memory barrier ensures that the effects
of the operation you are performing (the load, in the case of
atomic_load_acq_int()) are completed before any later data accesses.
"Data accesses" covers *all* operations, including reads and writes.
This is very different from what Linux assumes for its rmb() barrier,
for example, which only orders loads. So for FreeBSD there is no
_acq -> rmb() analogy and no _rel -> wmb() analogy.

This must be kept firmly in mind when trying to optimize the atomic_*()
operations.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein