sx locks and memory barriers

Sat Oct 3 20:45:37 UTC 2009

2009/9/29 John Baldwin <jhb at freebsd.org>:
> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
>> 2009/9/29 Max Laier <max at love2party.net>:
>> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
>> >> 2009/9/25 Fabio Checconi <fabio at freebsd.org>:
>> >> > Hi all,
>> >> >  looking at sys/sx.h I have some troubles understanding this comment:
>> >> >
>> >> >  * A note about memory barriers.  Exclusive locks need to use the same
>> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>> >> >  * and _rel when releasing an exclusive lock.  On the other side,
>> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
>> >> >  * but, since they don't update any locked data, no memory barrier is
>> >> >  * needed when releasing a shared lock.
>> >> >
>> >> > In particular, I'm not understanding what prevents the following sequence
>> >> > from happening:
>> >> >
>> >> > CPU A                                   CPU B
>> >> >
>> >> > sx_slock(&data->lock);
>> >> >
>> >> > sx_sunlock(&data->lock);
>> >> >
>> >> > /* reordered after the unlock
>> >> >   by the cpu */
>> >> > if (data->buffer)
>> >> >                                        sx_xlock(&data->lock);
>> >> >                                        free(data->buffer);
>> >> >                                        data->buffer = NULL;
>> >> >                                        sx_xunlock(&data->lock);
>> >> >
>> >> >        a = *data->buffer;
>> >> >
>> >> > IOW, even if readers do not modify the data protected by the lock,
>> >> > without a release barrier a memory access may leak past the unlock (as
>> >> > the cpu won't notice any dependency between the unlock and the fetch,
>> >> > feeling free to reorder them), thus potentially racing with an exclusive
>> >> > writer accessing the data.
>> >> >
>> >> > On architectures where atomic ops serialize memory accesses this would
>> >> > never happen, otherwise the sequence above seems possible; am I missing
>> >> > something?
>> >>
>> >> I think your concerns are right, possibly we need this patch:
>> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>> >>
>> >> However speaking with John we agreed possibly there is a more serious
>> >>  breakage. Possibly, memory barriers would also require to ensure the
>> >>  compiler to not reorder the operation, while right now, in FreeBSD, they
>> >>  just take care of the reordering from the architecture perspective.
>> >> The only way I'm aware of GCC offers that is to clobber memory.
>> >> I will provide a patch that address this soon, hoping that GCC will be
>> >> smart enough to not overhead too much the memory clobbering but just
>> >> try to understand what's our purpose and servers it (I will try to
>> >> compare code generated before and after the patch at least for tier-1
>> >> architectures).
>> >
>> > Does GCC really reorder accesses to volatile objects? The C Standard seems to
>> > object:
>> >
>> > 5.1.2.3 - 2
>> > Accessing a volatile object, modifying an object, modifying a file, or calling
>> > a function that does any of those operations are all side effects,11) which
>> > are changes in the state of the execution environment. Evaluation of an
>> > expression may produce side effects. At certain specified points in the
>> > execution sequence called sequence points, all side effects of previous
>> > evaluations shall be complete and no side effects of subsequent evaluations
>> > shall have taken place. (A summary of the sequence points is given in annex
>> > C.)
>>
>> Very interesting.
>> I was thinking about the other operating systems which basically do
>> 'memory clobbering' for ensuring a compiler barrier, but actually they
>> often forsee such a barrier without the conjuction of a memory
>> operand.
>>
>> I think I will need to speak a bit with a GCC engineer in order to see
>> what do they implement in regard of volatile operands.
>
> GCC can be quite aggressive with reordering even in the face of volatile.  I
> was recently doing a hack to export some data from the kernel to userland
> that used a spin loop to grab a snapshot of the contents of a structure
> similar to the method used in the kernel with the timehands structures.  It
> used a volatile structure exposed from the kernel that looked something
> like:
>
> struct foo {
>        volatile int gen;
>        /* other stuff */
> };
>
> volatile struct foo *p;
>
> do {
>        x = p->gen;
>        /* read other stuff */
>        y = p->gen;
> } while (x != y && x != 0);
>
> GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
> I eventually had to add explicit "memory" clobbers to force GCC to not
> move the reads of 'gen' around but do them "around" all the other
> operations, so that the working code is:
>
> do {
>        x = p->gen;
>        asm volatile("" ::: "memory");
>        /* read other stuff */
>        asm volatile("" ::: "memory");
>        y = p->gen;
> } while (x != y && x != 0);

The situation was not so desperate as I first thought.
Actually, only ia32 and amd64 seems affected by the missing of memory
clobbering because it is already done for all the other platform when
using all the memory barriers.
On ia32 and amd64 too, the impact is more limited than what I
expected. atomic_cmpset_* already clobbers the memory and only
atomic_store_rel_* is not adeguately protected among the atomics used
in our locking primitives, thus I would really expect a limited
performance penalty if any.
What was not really protected where the functions defined through
ATOMIC_ASM() and that was the larger part to fix.

I spoke briefly about the compiler support with Christian Lattner
(@Apple, LLVM leader) and he mentioned he was aware of the aggressive
behaviour of GCC pointing me in the direction that what the C Standard
really mandates is that read/write operations with non-volatile
operands can be reordered across a volatile operand but that the
read/write of volatile operands are strong ordered in regard of
eachother. This however means that we have to fix the 'memory clobber'
for GCC-simil compilers and also offer a support for the other that
let them specify a memory barrier.
I then wrote this patch:
http://www.freebsd.org/~attilio/atomic.h.diff

That should address all the concern raised and also forsee the
possibility for other compiler to specify memory barriers semantics
differently from normal atomic.

rdivacky@ kindly already tested the patch on LLVM/CLANG reporting no problems.

I still didn't compare pickly the produced binaries, but I see a
little increase in the binary sizes probabilly caming from the .text
growing.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein