what goes wrong with barrier free atomic_load/store?

Stephan Uphoff ups at tree.com
Wed Apr 20 15:13:16 PDT 2005


On Wed, 2005-04-20 at 16:39, John Giacomoni wrote:
> in reading /src/sys/i386/include/atomic.h
> 
> I found this comment and I'm having trouble understanding what the
> problem being referred to below is.
> 
> /*
>  * We assume that a = b will do atomic loads and stores.  However, on a
>  * PentiumPro or higher, reads may pass writes, so for that case we have
>  * to use a serializing instruction (i.e. with LOCK) to do the load in
>  * SMP kernels.  For UP kernels, however, the cache of the single
>  * processor is always consistent, so we don't need any memory barriers.
>  */
> 
> can someone give me an example of a situation where one needs to use
> memory barriers to ensure "correctness" when doing writes as above?

volatile int status = NOT_READY;
volatile int data = -1;

Thread 1: (CPU 0)
----------
data = 123;
status = READY;

Thread 2: (CPU 1)
---------
if (status == READY) {
	my_data = data;	
}

Read reordering by the CPUs may cause the following interleaving:

Thread 2:   out_of_order_read = data;      /* early/speculative read */
Thread 1:   data = 123;
Thread 1:   status = READY;
Thread 2:   if (status == READY) {
Thread 2:       my_data = out_of_order_read; /* XXX unexpected stale value */
Thread 2:   }

Basically, volatile alone does not give you the ordering guarantees you
might expect.
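
The atomic_load_acq_*()/atomic_store_rel_*() wrappers in the
<machine/atomic.h> quoted above are meant to express exactly this
ordering.  A rough, untested sketch of the same example using them
(assuming status and data are changed to u_int, and that NOT_READY and
READY are plain integer constants):

#include <machine/atomic.h>

volatile u_int status = NOT_READY;
volatile u_int data = 0;

Thread 1: (CPU 0)
----------
data = 123;
atomic_store_rel_int(&status, READY);	/* data write ordered before this */

Thread 2: (CPU 1)
---------
if (atomic_load_acq_int(&status) == READY) {
	my_data = data;			/* data read ordered after the load */
}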

> the examples I can come up with seem to boil down to requiring locks
> or accepting stale values, given that without a synchronization
> mechanism one shouldn't expect two processes to act in any specific
> order.

The problem is that writes from another CPU (or DMA device) can be
observed out of order.

> In my case I can accept reading a stale value so I'm not understanding
> the purpose of only having atomic_load/atomic_store wrappers with
> memory barriers.
> 
> I saw a brief discussion where someone proposed barrier free load/store
> but don't think I saw any resolution.

Do you mean load/store fences?

A load fence could solve the problem above by preventing the
out-of-order read of data by thread 2.
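
For illustration only (not a patch), the fence goes between the two
reads in thread 2.  On a CPU with SSE2 that could be an lfence; a
LOCK-prefixed instruction, as the existing atomic_load_acq_* wrappers
use, serializes the load as well:

Thread 2: (CPU 1)
---------
if (status == READY) {
	/* load fence: the read of data may not pass the read of status */
	__asm __volatile("lfence" : : : "memory");
	my_data = data;
}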

I actually found a race condition close to the one mentioned above in
the kernel yesterday. So we may need to add fences real soon or rewrite
the code to use a spin mutex.
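
The spin mutex variant would look roughly like this (untested sketch;
the mutex name and initialization placement are only illustrative):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx data_mtx;

/* once, at initialization time */
mtx_init(&data_mtx, "data lock", NULL, MTX_SPIN);

Thread 1: (CPU 0)
----------
mtx_lock_spin(&data_mtx);
data = 123;
status = READY;
mtx_unlock_spin(&data_mtx);

Thread 2: (CPU 1)
---------
mtx_lock_spin(&data_mtx);
if (status == READY)
	my_data = data;
mtx_unlock_spin(&data_mtx);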

Stephan


