svn commit: r230201 - head/lib/libc/gen

John Baldwin jhb at
Fri Jan 20 14:52:50 UTC 2012

On Thursday, January 19, 2012 7:36:33 pm David Xu wrote:
> On 2012/1/19 23:23, John Baldwin wrote:
> > On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
> >> rdtsc() may not work on SMP, so I have updated it to use clock_gettime
> >> to get total time.
> >>
> >> Still, lfence is a lot faster than atomic lock.
> >
> >
> > This is the patch I've had for quite a while.  Can you retest with this?  You'll
> > probably have to install the updated header in /usr/include as well.
> >
> The lines in atomic_load_acq() seem not what I want:
> +	v = *p;						\
> +	__asm __volatile("lfence" ::: "memory");	\
> I think they should be swapped ?

No, the point is that any subsequent loads cannot pass the '*p' load.  If you
swap the order, then the compiler (and CPU) are free to reorder the '*p' load
to occur after some other load that is later in program order.

> +	__asm __volatile("lfence" ::: "memory");	\
> +	v = *p;						\
> What I need in the semaphore code is that a read cannot pass a write in
> this special case.

Hmm, it seems you need the equivalent of an 'mfence'.  Note that your first
change in the diff should not have made a difference on x86:
atomic_add_rel_int() already has the equivalent of an 'mfence' (on x86 the 
non-load/store atomic ops all end up as full fences, since that is what 'lock' 
provides; the architecture doesn't let us do more fine-grained barriers).

It may be that you still have a race and that the barrier just changed the 
timing enough to fix your test case.  Specifically, note that an 'rmb'
(or 'lfence') does not force other CPUs to flush any pending writes, or
wait for other CPUs to flush pending writes.  Even with the lfence, you can
still read a "stale" value of _has_waiters.  This is why in-kernel locking
primitives encode this state in the lock cookie via contested flags and
use cmpset's to set them (and retry the loop if the cmpset fails).

John Baldwin

More information about the svn-src-all mailing list