About the memory barrier in BSD libc

Thu Apr 26 22:43:50 UTC 2012

On Wed, Apr 25, 2012 at 10:51:08PM +0000, Ricardo Nabinger Sanchez wrote:
> On Mon, 23 Apr 2012 12:41:20 +0400, Slawa Olhovchenkov wrote:

> > /usr/include/machine/atomic.h:

> > #define mb()    __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > #define wmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > #define rmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")

> Somewhat late on this topic, but I'd like to understand why issue a write 
> on %esp, which would invalidate (%esp) on other cores --- thus forcing a 
> miss on them?

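The stack is usually private to a thread and is written frequently
anyway, so the cache line holding (%esp) is normally already owned by
the local core and other cores rarely hold a copy to invalidate.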

> Instead, why not issue "mfence" (mb), "sfence" (wmb), and "lfence" (rmb)?

Apart from the fact that those are SSE (SFENCE) or SSE2 (LFENCE, MFENCE)
instructions and so are not available on every i386 CPU, uncontended
locked instructions may be faster. For example, MFENCE is a serializing
instruction on AMD family 10h processors, and the AMD optimization
manual recommends using an uncontended locked instruction instead
(though preferably one that performs a useful store).
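
To make the comparison concrete, here is a minimal sketch of the two
full-barrier variants side by side, used for the one ordering x86 does
not give for free: a store followed by a load from a different location.
The names mb_locked, mb_mfence, try_enter_a, try_enter_b and leave_a are
made up for this illustration and are not from the FreeBSD headers;
GCC/Clang inline asm is assumed, and non-x86 builds fall back to a
compiler-provided fence.

```c
#include <assert.h>

/* Hypothetical macro names; GCC/Clang extended asm syntax assumed. */
#if defined(__x86_64__)
/* Locked no-op add on the top of the (thread-private) stack. */
#define mb_locked() __asm__ __volatile__("lock; addl $0,(%%rsp)" : : : "memory", "cc")
#define mb_mfence() __asm__ __volatile__("mfence" : : : "memory")
#elif defined(__i386__)
/* Same form as the quoted atomic.h macro. */
#define mb_locked() __asm__ __volatile__("lock; addl $0,(%%esp)" : : : "memory", "cc")
/* MFENCE only exists on CPUs with SSE2. */
#define mb_mfence() __asm__ __volatile__("mfence" : : : "memory")
#else
/* Not x86: let the compiler pick a full fence. */
#define mb_locked() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define mb_mfence() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

static volatile int want_a, want_b;

/*
 * Dekker-style entry: announce intent, then check the other side.
 * Without a full barrier the load of want_b could be satisfied before
 * the store to want_a becomes visible to other cores.
 */
int
try_enter_a(void)
{
	want_a = 1;
	mb_locked();		/* order the store before the load */
	return (want_b == 0);
}

/* Same protocol for the other side, using MFENCE as the barrier. */
int
try_enter_b(void)
{
	want_b = 1;
	mb_mfence();
	return (want_a == 0);
}

/* Release side: a release fence (compiler-only on x86) is enough. */
void
leave_a(void)
{
	assert(want_a);		/* caller must have announced intent */
	__atomic_thread_fence(__ATOMIC_RELEASE);
	want_a = 0;
}
```

A caller whose try_enter_*() returns 0 is expected to clear its own flag
and retry; that part is left out to keep the sketch short. Which of the
two barriers is cheaper depends on the CPU, so measuring on the target
hardware is the only reliable guide.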

SFENCE is only useful with non-temporal stores such as MOVNTPS or with
stores to write-combining memory, because on x86 regular stores are not
reordered with respect to one another.
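
As an illustration of the one case where SFENCE matters, the sketch
below writes a buffer with a non-temporal store (MOVNTPS via the
_mm_stream_ps intrinsic) and issues SFENCE before setting a flag that
another thread might poll. The names buf, ready and fill_and_publish are
hypothetical; the code assumes GCC or Clang, and falls back to plain
stores plus a release fence when SSE is not enabled at compile time.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>	/* _mm_set1_ps, _mm_stream_ps, _mm_sfence */
#endif

/* MOVNTPS requires a 16-byte aligned destination. */
static float buf[4] __attribute__((aligned(16)));
static volatile int ready;

/* Fill buf with v, then publish it by setting ready. */
int
fill_and_publish(float v)
{
	assert(((uintptr_t)buf & 15) == 0);
#if defined(__SSE__)
	/* Non-temporal store: weakly ordered, may pass later stores. */
	_mm_stream_ps(buf, _mm_set1_ps(v));
	/* Make the streamed data visible before the flag store below. */
	_mm_sfence();
#else
	/* No SSE: ordinary stores, ordered by a release fence instead. */
	for (int i = 0; i < 4; i++)
		buf[i] = v;
	__atomic_thread_fence(__ATOMIC_RELEASE);
#endif
	ready = 1;
	return (ready);
}
```

With ordinary stores in place of _mm_stream_ps, the SFENCE could be
dropped on x86 and only a compiler barrier would be needed to keep the
flag store after the data stores.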

-- 
Jilles Tjoelker