About the memory barrier in BSD libc

Martin Simmons martin at lispworks.com
Tue Apr 24 13:54:51 UTC 2012


>>>>> On Mon, 23 Apr 2012 16:03:43 +0300, Konstantin Belousov said:
> 
> On Mon, Apr 23, 2012 at 08:33:05PM +0800, Fengwei yin wrote:
> > On Mon, Apr 23, 2012 at 8:07 PM, Konstantin Belousov
> > <kostikbel at gmail.com> wrote:
> > > On Mon, Apr 23, 2012 at 07:44:34PM +0800, Fengwei yin wrote:
> > >> On Mon, Apr 23, 2012 at 7:38 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > >> > On Mon, Apr 23, 2012 at 07:26:54PM +0800, Fengwei yin wrote:
> > >> >
> > >> >> On Mon, Apr 23, 2012 at 5:40 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > >> >> > On Mon, Apr 23, 2012 at 05:32:24PM +0800, Fengwei yin wrote:
> > >> >> >
> > >> >> >> On Mon, Apr 23, 2012 at 4:41 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > >> >> >> > On Mon, Apr 23, 2012 at 02:56:03PM +0800, Fengwei yin wrote:
> > >> >> >> >
> > >> >> >> >> Hi list,
> > >> >> >> >> If this is not correct question on the list, please let me know and
> > >> >> >> >> sorry for noise.
> > >> >> >> >>
> > >> >> >> >> I have a question regarding the BSD libc for SMP arch. I didn't see
> > >> >> >> >> memory barrier used in libc.
> > >> >> >> >> How can we make sure it's safe on SMP arch?
> > >> >> >> >
> > >> >> >> > /usr/include/machine/atomic.h:
> > >> >> >> >
> > >> >> >> > #define mb()    __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > >> >> >> > #define wmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > >> >> >> > #define rmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > >> >> >> >
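
For illustration, a minimal sketch of how these barriers are typically used to
publish data from one CPU to another; the shared_data/shared_ready names are
made up, not taken from libc:

#include <machine/atomic.h>

static int shared_data;
static volatile int shared_ready;

/* Producer: write the payload, then publish the flag. */
void produce(void)
{
  shared_data = 42;
  wmb();                  /* keep the data store before the flag store */
  shared_ready = 1;
}

/* Consumer: wait for the flag, then read the payload. */
int consume(void)
{
  while (!shared_ready)
    ;
  rmb();                  /* keep the flag load before the data load */
  return shared_data;
}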
> > >> >> >>
> > >> >> >> Thanks for the information. But it looks like nobody uses it in libc.
> > >> >> >
> > >> >> > I think nobody in libc needs a memory barrier: libc doesn't work with
> > >> >> > peripherals, and different macros are used for atomic operations.
> > >> >>
> > >> >> If we check the usage of __sinit(), it is a typical singleton pattern,
> > >> >> which needs a memory barrier to make sure there is no potential SMP issue.
> > >> >>
> > >> >> Or did I miss something here?
> > >> >
> > >> > Which FreeBSD-supported architecture has cache incoherency?
> > >>
> > >> I suppose it's not related to cache incoherency (I could be wrong).
> > >> It's related to reordering of instructions by the CPU.
> > >>
> > >> Here is a link explaining why a memory barrier is needed for a singleton:
> > >> http://www.oaklib.org/docs/oak/singleton.html
> > >>
> > >> x86 has a strict memory model and may not suffer from this kind of issue,
> > >> but ARM needs to take care of it IMHO.
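
For illustration, a minimal sketch of the singleton hazard being described
here; the names are hypothetical and this is not the actual __sinit() code:

#include <pthread.h>
#include <stdlib.h>

struct state { int initialized; };

static struct state *instance;
static pthread_mutex_t instance_lock = PTHREAD_MUTEX_INITIALIZER;

struct state *get_instance(void)
{
  if (instance == NULL) {               /* unsynchronized fast path */
    pthread_mutex_lock(&instance_lock);
    if (instance == NULL) {
      struct state *p = malloc(sizeof(*p));
      p->initialized = 1;               /* without a write barrier, this store ... */
      instance = p;                     /* ... may become visible after this one */
    }
    pthread_mutex_unlock(&instance_lock);
  }
  /* A reader taking the fast path on another CPU can then see a non-NULL
     instance whose fields are not yet visible to it. */
  return instance;
}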
> > >
> > > Please note that __sinit is idempotent, so double-initialization is not
> > > an issue there. The only possible problematic case would be another thread
> > > executing exit and not noticing the non-NULL value of __cleanup while the
> > > current thread has just set it.
> > >
> > > I am not sure how real this race is. Each call to __sinit() is immediately
> > > followed by a lock acquire, typically FLOCKFILE(), which enforces full
> > > barrier semantics due to the pthread_mutex_lock call. exit() performs a
> > > __cxa_finalize() call before checking the __cleanup value, and
> > > __cxa_finalize() itself locks atexit_mutex. So the race window is tiny and
> > > probably only possible for somewhat buggy applications which call exit()
> > > while there are stdio operations in progress.
> > >
> > > Also note that some functions assign to __cleanup unconditionally.
> > >
> > > Do you see any real issue due to non-synchronized access to __cleanup ?
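
A heavily simplified sketch of the interleaving described above; cleanup_hook
stands in for __cleanup and the helper is hypothetical, not the real libc code:

static void flush_all_buffers(void) { /* stands in for _cleanup() */ }

static void (*cleanup_hook)(void);    /* stands in for __cleanup */

/* Thread A: the first stdio operation installs the handler. */
void thread_a(void)
{
  cleanup_hook = flush_all_buffers;
  /* ... the stdio operation itself follows ... */
}

/* Thread B: exits concurrently; it may still read the old NULL value and
   skip the flush.  This is the tiny window discussed above. */
void thread_b(void)
{
  if (cleanup_hook != NULL)
    cleanup_hook();
  /* the real exit() would then terminate the process */
}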
> > 
> > No, I didn't see a real issue. I am just reviewing the code.
> > 
> > If you don't think __sinit has an issue, let's check some other code:
> >      line 68 in libc/stdio/fclose.c
> >      line 133 in libc/stdio/findfp.c (function __sfp())
> > 
> > fclose() frees the fp slot by assigning 0 to fp->_flags. But if the
> > instructions could be reordered, another CPU could see fp->_flags
> > assigned to 0 before the cleanup from lines 57 to 67.
> > 
> > That is, if another CPU is at line 133 of __sfp(), it could see
> > fp->_flags become 0 before it is aware that the cleanup (lines 57 to 67
> > in libc/stdio/fclose.c) has happened.
> > 
> > Note: the mutex of FUNLOCKFILE(fp) at line 69 of libc/stdio/fclose.c
> > can only make sure that line 70 happens after line 68. It cannot
> > prevent the CPU from reordering lines 57 through 68.
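
A simplified sketch of the concern; the struct and field names are stand-ins
for the real FILE, not the actual fclose()/__sfp() code:

#include <stdlib.h>

struct file_slot {
  short flags;              /* 0 means the slot is free, as in __sfp() */
  char *buf;
};

/* Roughly what the closer does. */
void release(struct file_slot *fp)
{
  free(fp->buf);
  fp->buf = NULL;           /* this cleanup store ... */
  fp->flags = 0;            /* ... could become visible after this one on a
                               weakly ordered CPU, so a scanner looking for
                               flags == 0 might claim a slot whose stale buf
                               pointer is still visible */
}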
> 
> Yes, FUNLOCKFILE() there would have no effect on the potential CPU reordering
> of the writes.  But does the order of these writes matter at all ?
> 
> Please note that __sfp() reinitializes all fields written by fclose().
> There could only be a problem if the CPU executing fclose() is allowed to
> reorder operations so that the external effect of the _flags = 0 assignment
> can be observed before the other operations from fclose().
> 
> This is definitely impossible on Intel, and I indeed do not know enough about
> other architectures to reject such a possibility. The _flags member is a
> short, so atomics cannot be used there. The easier solution, if this is
> indeed an issue, is to lock thread_lock around the _flags = 0 assignment
> in fclose().
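
A minimal sketch of that easier solution, reusing the file_slot stand-in from
the sketch above; slot_lock stands in for the thread lock that the slot
scanner would also take:

#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

void release_locked(struct file_slot *fp)
{
  free(fp->buf);
  fp->buf = NULL;
  pthread_mutex_lock(&slot_lock);
  fp->flags = 0;            /* the unlock below has release semantics, so a
                               scanner holding the same lock cannot observe
                               flags == 0 before the cleanup stores */
  pthread_mutex_unlock(&slot_lock);
}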

This can be a problem, even on Intel, because the compiler can reorder the
stores.  E.g. if I compile the following with gcc -O4 on amd64:

struct foo { int x, y; };

int bar(void);
int baz(void);

void foo(struct foo *p)
{
  int x = bar();
  p->y = baz();   /* gcc emits this store second ... */
  p->x = x;       /* ... and this one first, as the assembly below shows */
}

then I get the following assembly language, which sets p->x before p->y:

	movq	%rdi, %rbx
	call	bar
	movl	%eax, %ebp
	xorl	%eax, %eax
	call	baz
	movl	%ebp, (%rbx)
	movl	%eax, 4(%rbx)
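
A compiler barrier between the two stores keeps them in program order; on
amd64 that is enough, because the CPU does not reorder stores with other
stores, while a weakly ordered CPU would additionally need something like
wmb().  A sketch, reusing struct foo, bar() and baz() from the example above
(not a proposed patch):

void foo_ordered(struct foo *p)
{
  int x = bar();
  p->y = baz();
  __asm __volatile("" : : : "memory");  /* compiler barrier: the p->y store
                                           cannot be moved past this point */
  p->x = x;
}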

__Martin

