sx locks and memory barriers

Tue Sep 29 21:32:34 UTC 2009

On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
> 2009/9/29 Max Laier <max at love2party.net>:
> > On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
> >> 2009/9/25 Fabio Checconi <fabio at freebsd.org>:
> >> > Hi all,
> >> >  looking at sys/sx.h I have some troubles understanding this comment:
> >> >
> >> >  * A note about memory barriers.  Exclusive locks need to use the same
> >> >  * memory barriers as mutexes: _acq when acquiring an exclusive lock
> >> >  * and _rel when releasing an exclusive lock.  On the other side,
> >> >  * shared lock needs to use an _acq barrier when acquiring the lock
> >> >  * but, since they don't update any locked data, no memory barrier is
> >> >  * needed when releasing a shared lock.
> >> >
> >> > In particular, I'm not understanding what prevents the following sequence
> >> > from happening:
> >> >
> >> > CPU A                                   CPU B
> >> >
> >> > sx_slock(&data->lock);
> >> >
> >> > sx_sunlock(&data->lock);
> >> >
> >> > /* reordered after the unlock
> >> >   by the cpu */
> >> > if (data->buffer)
> >> >                                        sx_xlock(&data->lock);
> >> >                                        free(data->buffer);
> >> >                                        data->buffer = NULL;
> >> >                                        sx_xunlock(&data->lock);
> >> >
> >> >        a = *data->buffer;
> >> >
> >> > IOW, even if readers do not modify the data protected by the lock,
> >> > without a release barrier a memory access may leak past the unlock (as
> >> > the cpu won't notice any dependency between the unlock and the fetch,
> >> > feeling free to reorder them), thus potentially racing with an exclusive
> >> > writer accessing the data.
> >> >
> >> > On architectures where atomic ops serialize memory accesses this would
> >> > never happen, otherwise the sequence above seems possible; am I missing
> >> > something?
> >>
> >> I think your concerns are right, possibly we need this patch:
> >> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
> >>
> >> However speaking with John we agreed possibly there is a more serious
> >>  breakage. Possibly, memory barriers would also require to ensure the
> >>  compiler to not reorder the operation, while right now, in FreeBSD, they
> >>  just take care of the reordering from the architecture perspective.
> >> The only way I'm aware of GCC offers that is to clobber memory.
> >> I will provide a patch that address this soon, hoping that GCC will be
> >> smart enough to not overhead too much the memory clobbering but just
> >> try to understand what's our purpose and servers it (I will try to
> >> compare code generated before and after the patch at least for tier-1
> >> architectures).
> >
> > Does GCC really reorder accesses to volatile objects? The C Standard seems to
> > object:
> >
> > 5.1.2.3 - 2
> > Accessing a volatile object, modifying an object, modifying a file, or calling
> > a function that does any of those operations are all side effects,11) which
> > are changes in the state of the execution environment. Evaluation of an
> > expression may produce side effects. At certain specified points in the
> > execution sequence called sequence points, all side effects of previous
> > evaluations shall be complete and no side effects of subsequent evaluations
> > shall have taken place. (A summary of the sequence points is given in annex
> > C.)
> 
> Very interesting.
> I was thinking about the other operating systems which basically do
> 'memory clobbering' for ensuring a compiler barrier, but actually they
> often forsee such a barrier without the conjuction of a memory
> operand.
> 
> I think I will need to speak a bit with a GCC engineer in order to see
> what do they implement in regard of volatile operands.

GCC can be quite aggressive with reordering even in the face of volatile.  I
was recently doing a hack to export some data from the kernel to userland
that used a spin loop to grab a snapshot of the contents of a structure
similar to the method used in the kernel with the timehands structures.  It
used a volatile structure exposed from the kernel that looked something
like:

struct foo {
	volatile int gen;
	/* other stuff */
};

volatile struct foo *p;

do {
	x = p->gen;
	/* read other stuff */
	y = p->gen;
} while (x != y && x != 0);

GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
I eventually had to add explicit "memory" clobbers to force GCC to not
move the reads of 'gen' around but do them "around" all the other
operations, so that the working code is:

do {
	x = p->gen;
	asm volatile("" ::: "memory");
	/* read other stuff */
	asm volatile("" ::: "memory");
	y = p->gen;
} while (x != y && x != 0);

-- 
John Baldwin