About the memory barrier in BSD libc

Konstantin Belousov kostikbel at gmail.com
Tue Apr 24 14:03:58 UTC 2012


On Tue, Apr 24, 2012 at 02:43:40PM +0100, Martin Simmons wrote:
> >>>>> On Mon, 23 Apr 2012 16:03:43 +0300, Konstantin Belousov said:
> > 
> > On Mon, Apr 23, 2012 at 08:33:05PM +0800, Fengwei yin wrote:
> > > On Mon, Apr 23, 2012 at 8:07 PM, Konstantin Belousov
> > > <kostikbel at gmail.com> wrote:
> > > > On Mon, Apr 23, 2012 at 07:44:34PM +0800, Fengwei yin wrote:
> > > >> On Mon, Apr 23, 2012 at 7:38 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > > >> > On Mon, Apr 23, 2012 at 07:26:54PM +0800, Fengwei yin wrote:
> > > >> >
> > > >> >> On Mon, Apr 23, 2012 at 5:40 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > > >> >> > On Mon, Apr 23, 2012 at 05:32:24PM +0800, Fengwei yin wrote:
> > > >> >> >
> > > >> >> >> On Mon, Apr 23, 2012 at 4:41 PM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
> > > >> >> >> > On Mon, Apr 23, 2012 at 02:56:03PM +0800, Fengwei yin wrote:
> > > >> >> >> >
> > > >> >> >> >> Hi list,
> > > >> >> >> >> If this is not the correct question for the list, please let me know,
> > > >> >> >> >> and sorry for the noise.
> > > >> >> >> >>
> > > >> >> >> >> I have a question regarding BSD libc on SMP architectures. I didn't see
> > > >> >> >> >> any memory barriers used in libc.
> > > >> >> >> >> How can we make sure it's safe on SMP architectures?
> > > >> >> >> >
> > > >> >> >> > /usr/include/machine/atomic.h:
> > > >> >> >> >
> > > >> >> >> > #define mb()    __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > > >> >> >> > #define wmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > > >> >> >> > #define rmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
> > > >> >> >> >
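
For reference, here is roughly how those barrier macros would be used to
order a publish/consume pair.  This is only an illustrative sketch (the
struct, field and function names below are made up, nothing from libc),
assuming machine/atomic.h exposes mb()/wmb()/rmb() to userland as quoted
above:

#include <machine/atomic.h>
#include <stddef.h>

struct msg {
	int	payload;
};

static struct msg *shared_msg;

/* Writer: initialize the object, then publish the pointer. */
static void
publish(struct msg *m)
{

	m->payload = 42;
	wmb();		/* order the payload store before the pointer store */
	shared_msg = m;
}

/* Reader: load the pointer, then read the fields behind it. */
static struct msg *
consume(void)
{
	struct msg *m;

	m = shared_msg;
	if (m != NULL)
		rmb();	/* order the pointer load before the payload load */
	return (m);
}
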
> > > >> >> >>
> > > >> >> >> Thanks for the information. But it looks like nobody uses it in libc.
> > > >> >> >
> > > >> >> > I think nobody in libc needs a memory barrier: libc doesn't work with
> > > >> >> > peripherals, and different macros are used for atomic operations.
> > > >> >>
> > > >> >> If we check the usage of __sinit(), it is a typical singleton pattern, which
> > > >> >> needs a memory barrier to avoid potential SMP issues.
> > > >> >>
> > > >> >> Or did I miss something here?
> > > >> >
> > > >> > Which architecture with incoherent caches does FreeBSD support?
> > > >>
> > > >> I suppose it's not related to cache incoherency (I could be wrong).
> > > >> It's related to the reordering of instructions by the CPU.
> > > >>
> > > >> Here is a link explaining why a singleton needs a memory barrier:
> > > >> http://www.oaklib.org/docs/oak/singleton.html
> > > >>
> > > >> x86 has a strict memory model and may not suffer from this kind of issue,
> > > >> but ARM needs to take care of it IMHO.
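
To make the singleton concern concrete, the classic lazily-initialized
pattern looks roughly like the sketch below.  This is not the actual
__sinit() code, just an illustration with made-up names, showing where the
missing write barrier matters on a weakly ordered CPU such as ARM:

#include <pthread.h>
#include <stdlib.h>

struct state {
	int	field;
};

static struct state *instance;
static pthread_mutex_t init_mtx = PTHREAD_MUTEX_INITIALIZER;

struct state *
get_instance(void)
{
	struct state *s;

	if (instance == NULL) {			/* unlocked fast path */
		pthread_mutex_lock(&init_mtx);
		if (instance == NULL) {
			s = malloc(sizeof(*s));
			s->field = 1;		/* construct the object ... */
			/*
			 * ... and only then publish it.  Without a write
			 * barrier here, a weakly ordered CPU may make the
			 * store to "instance" visible before the store to
			 * "s->field", so a thread on the unlocked fast path
			 * can see a non-NULL but not-yet-constructed object.
			 */
			instance = s;
		}
		pthread_mutex_unlock(&init_mtx);
	}
	return (instance);
}
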
> > > >
> > > > Please note that __sinit() is idempotent, so double-initialization is not
> > > > an issue there. The only possible problematic case would be another thread
> > > > executing exit() and not noticing the non-NULL value of __cleanup that the
> > > > current thread has just set.
> > > >
> > > > I am not sure how real this race is. Each call to __sinit() is immediately
> > > > followed by a lock acquisition, typically FLOCKFILE(), which enforces full
> > > > barrier semantics due to the pthread_mutex_lock() call. exit() performs a
> > > > __cxa_finalize() call before checking the __cleanup value, and
> > > > __cxa_finalize() itself locks atexit_mutex. So the race is tiny and probably
> > > > possible only for somewhat buggy applications which call exit() while stdio
> > > > operations are still in progress.
> > > >
> > > > Also note that some functions assign to __cleanup unconditionally.
> > > >
> > > > Do you see any real issue due to unsynchronized access to __cleanup?
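
For readers following along, the initialization in question has roughly the
shape below.  This is a paraphrase for illustration, not a verbatim copy of
stdio/findfp.c, and the declarations are simplified:

/* Paraphrased sketch of the idempotent stdio initialization. */
extern void _cleanup(void);	/* stdio's flush-everything routine */
void (*__cleanup)(void);	/* checked by exit() */
int __sdidinit;

void
__sinit(void)
{

	/*
	 * Both stores are idempotent: running this twice just writes the
	 * same values again.  The only visibility question is whether a
	 * thread already inside exit() observes the non-NULL __cleanup
	 * set here in time.
	 */
	__cleanup = _cleanup;
	__sdidinit = 1;
}
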
> > > 
> > > No, I didn't see a real issue. I am just reviewing the code.
> > > 
> > > If you don't think __sinit has an issue, let's check some other code:
> > >      line 68 in libc/stdio/fclose.c
> > >      line 133 in libc/stdio/findfp.c (function __sfp())
> > >
> > > fclose() frees an fp slot by assigning 0 to fp->_flags. But if the
> > > instructions can be reordered, another CPU could see fp->_flags
> > > assigned to 0 before the cleanup in lines 57 to 67.
> > >
> > > Say another CPU is at line 133 of __sfp(): it could see fp->_flags
> > > become 0 before it is aware that the cleanup (lines 57 to 67 in
> > > libc/stdio/fclose.c) has happened.
> > >
> > > Note: the mutex in FUNLOCKFILE(fp) at line 69 of libc/stdio/fclose.c
> > > only makes sure that line 70 happens after line 68. It cannot prevent
> > > the CPU from reordering lines 57 through 68.
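
The pattern being described boils down to something like the reduced,
self-contained analogue below.  It is not the actual stdio code: "slot"
stands in for FILE, "flags" for _flags, and the function names are made up.

#include <stddef.h>

#define	NSLOTS	20

struct slot {
	short	flags;		/* 0 means "free for reuse" */
	int	fd;
	char	*buf;
};

static struct slot pool[NSLOTS];

/* Analogue of the tail of fclose(): tear the slot down, then release it. */
static void
release_slot(struct slot *s)
{

	s->fd = -1;
	s->buf = NULL;
	/*
	 * If this store becomes visible to another CPU before the two
	 * stores above, that CPU can claim a half-torn-down slot.
	 */
	s->flags = 0;
}

/* Analogue of __sfp(): claim the first slot whose flags are 0. */
static struct slot *
find_slot(void)
{
	int i;

	for (i = 0; i < NSLOTS; i++) {
		if (pool[i].flags == 0) {
			pool[i].flags = 1;	/* claim it */
			return (&pool[i]);
		}
	}
	return (NULL);
}
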
> > 
> > Yes, FUNLOCKFILE() there would have no effect on the potential CPU reordering
> > of the writes.  But does the order of these writes matter at all?
> > 
> > Please note that __sfp() reinitializes all fields written by fclose().
> > There could only be a problem if the CPU executing fclose() is allowed to
> > reorder operations so that the external effect of the _flags = 0 assignment
> > can be observed before the other operations from fclose().
> > 
> > This is definitely impossible on Intel, and I do not know enough about
> > other architectures to reject such a possibility. The _flags member is a
> > short, so atomics cannot be used there. The easier solution, if this is
> > indeed an issue, is to lock thread_lock around the _flags = 0 assignment
> > in fclose().
> 
> This can be a problem, even on Intel, because the compiler can reorder the
> stores.  E.g. if I compile the following with gcc -O4 on amd64:
> 
> struct foo { int x, y; };
> 
> int bar(void);
> int baz(void);
> 
> void foo(struct foo *p)
> {
>   int x = bar();
>   p->y = baz();
>   p->x = x;
> }
> 
> then I get the following assembly language, which sets p->x before p->y:
> 
> 	movq	%rdi, %rbx
> 	call	bar
> 	movl	%eax, %ebp
> 	xorl	%eax, %eax
> 	call	baz
> 	movl	%ebp, (%rbx)
> 	movl	%eax, 4(%rbx)
> 
> __Martin
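
For the compiler half of the problem, a plain compiler barrier between the
two stores is enough to stop that particular reordering.  An untested sketch
(the function name is made up; the barrier emits no instruction and does not
order the stores as seen by another CPU, which on x86 the hardware already
guarantees for ordinary stores):

struct foo { int x, y; };

int bar(void);
int baz(void);

void
store_in_order(struct foo *p)
{
	int x = bar();

	p->y = baz();
	/*
	 * Compiler-only barrier: forbids the compiler from moving the
	 * store to p->x above the store to p->y.
	 */
	__asm __volatile("" : : : "memory");
	p->x = x;
}
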
Ok, as I already said, I think that the reordering is safe there.

Anyway, the change below should remove all concerns.

diff --git a/lib/libc/stdio/fclose.c b/lib/libc/stdio/fclose.c
index f0629e8..383040e 100644
--- a/lib/libc/stdio/fclose.c
+++ b/lib/libc/stdio/fclose.c
@@ -41,9 +41,12 @@ __FBSDID("$FreeBSD$");
 #include <stdio.h>
 #include <stdlib.h>
 #include "un-namespace.h"
+#include <spinlock.h>
 #include "libc_private.h"
 #include "local.h"
 
+extern spinlock_t __stdio_thread_lock;
+
 int
 fclose(FILE *fp)
 {
@@ -65,7 +68,11 @@ fclose(FILE *fp)
 		FREELB(fp);
 	fp->_file = -1;
 	fp->_r = fp->_w = 0;	/* Mess up if reaccessed. */
+	if (__isthreaded)
+		_SPINLOCK(&__stdio_thread_lock);
 	fp->_flags = 0;		/* Release this FILE for reuse. */
+	if (__isthreaded)
+		_SPINUNLOCK(&__stdio_thread_lock);
 	FUNLOCKFILE(fp);
 	return (r);
 }
diff --git a/lib/libc/stdio/findfp.c b/lib/libc/stdio/findfp.c
index 89c0536..bcd6f62 100644
--- a/lib/libc/stdio/findfp.c
+++ b/lib/libc/stdio/findfp.c
@@ -82,9 +82,9 @@ static struct glue *lastglue = &uglue;
 
 static struct glue *	moreglue(int);
 
-static spinlock_t thread_lock = _SPINLOCK_INITIALIZER;
-#define THREAD_LOCK()	if (__isthreaded) _SPINLOCK(&thread_lock)
-#define THREAD_UNLOCK()	if (__isthreaded) _SPINUNLOCK(&thread_lock)
+spinlock_t __stdio_thread_lock = _SPINLOCK_INITIALIZER;
+#define THREAD_LOCK()	if (__isthreaded) _SPINLOCK(&__stdio_thread_lock)
+#define THREAD_UNLOCK()	if (__isthreaded) _SPINUNLOCK(&__stdio_thread_lock)
 
 #if NOT_YET
 #define	SET_GLUE_PTR(ptr, val)	atomic_set_rel_ptr(&(ptr), (uintptr_t)(val))