svn commit: r230201 - head/lib/libc/gen

Bruce Evans brde at optusnet.com.au
Thu Jan 19 16:55:48 UTC 2012


On Thu, 19 Jan 2012, John Baldwin wrote:

> On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
>> rdtsc() may not work on SMP, so I have updated it to use clock_gettime
>> to get total time.
>> http://people.freebsd.org/~davidxu/bench/semaphore2/
>>
>> Still, lfence is a lot faster than atomic lock.

I hope it does non-microbenchmarks.  IIRC, jhb found that it was
actually slower in some cases.  I only did micro-benchmarks on Athlon64.

> http://www.freebsd.org/~jhb/patches/amd64_fence.patch
>
> This is the patch I've had for quite a while.  Can you retest with this?  You'll
> probably have to install the updated header in /usr/include as well.

% --- //depot/projects/smpng/sys/amd64/include/atomic.h	2011-01-05 17:06:25.000000000 0000
% +++ //depot/user/jhb/ktrace/amd64/include/atomic.h	2011-01-05 22:08:56.000000000 0000
% ...
% @@ -213,13 +213,12 @@
%  #if defined(_KERNEL) && !defined(SMP)
% 
%  /*
% - * We assume that a = b will do atomic loads and stores.  However, on a
% - * PentiumPro or higher, reads may pass writes, so for that case we have
% - * to use a serializing instruction (i.e. with LOCK) to do the load in
% - * SMP kernels.  For UP kernels, however, the cache of the single processor
% + * We assume that a = b will do atomic loads and stores.  However, reads
% + * may pass writes, so we have to use fences in SMP kernels to preserve
% + * ordering.  For UP kernels, however, the cache of the single processor
%   * is always consistent, so we only need to take care of compiler.
%   */
% -#define	ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define	ATOMIC_STORE_LOAD(TYPE)				\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\

It also has some simplifications from removing the use of different
operators for load and store.  These simplifications don't seem to be
quite generic, since "lock; cmpxchg*" seems to be 2 cycles faster
than "xchg*" on Athlon64.  The ATOMIC_STORE_LOAD() macro is
obfuscatory.  It would be better to have separate macros for load and
store, like we do for set/clear/add/subtract.  (The latter can be
obfuscated even better by using 4 parameters for the ops, and better
yet by making the type a parameter.)
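
To make the suggested split concrete, here is a rough sketch of
separate per-operation macros for the SMP case (the names ATOMIC_LOAD
and ATOMIC_STORE and the bodies are hypothetical illustrations, not
jhb's patch and not committed code), instantiated per type like the
set/clear/add/subtract ones:

#define	ATOMIC_LOAD(TYPE)				\
static __inline u_##TYPE				\
atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
{							\
	u_##TYPE v;					\
							\
	v = *p;						\
	__asm __volatile("lfence" : : : "memory");	\
	return (v);					\
}

#define	ATOMIC_STORE(TYPE)				\
static __inline void					\
atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)\
{							\
	/* Plain x86 stores already have release	\
	 * semantics, so a compiler barrier suffices. */\
	__asm __volatile("" : : : "memory");		\
	*p = v;						\
}

ATOMIC_LOAD(int);
ATOMIC_STORE(int);
/* ...and likewise for char, short and long. */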

% @@ -240,32 +239,22 @@
% 
%  #else /* !(_KERNEL && !SMP) */
% 
% -#define	ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define	ATOMIC_STORE_LOAD(TYPE)				\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\
% -	u_##TYPE res;					\
% +	u_##TYPE v;					\
%  							\
% -	__asm __volatile(MPLOCKED LOP			\
% -	: "=a" (res),			/* 0 */		\
% -	  "=m" (*p)			/* 1 */		\
% -	: "m" (*p)			/* 2 */		\
% -	: "memory", "cc");				\
% -							\
% -	return (res);					\
% +	v = *p;						\
% +	__asm __volatile("lfence" ::: "memory");	\

Style bug (missing spaces around binary operator `:') which becomes
a syntax error for C++.  Other places in this file use ` : : : '.
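
For reference, the style-conforming spelling (as used elsewhere in
this file), keeping the macro's continuation backslash, would be:

	__asm __volatile("lfence" : : : "memory");	\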

lfence() should be in cpufunc.h if it can be done separately
like this.  However, I think it can't be done separately --
it needs to be done in the same asm as the load/store, since
separate C statements may be reordered (see the sketch after
the list below).  This is the same
problem that forces us to write __asm volatile("sti; hlt");
instead of sti(); hlt(); in too many places for idling in
machdep.c.  BTW, sti() and hlt() are bogusly not in cpufunc.h
either:
- I misnamed sti() as enable_intr(), since enable_intr() was
   supposed to be MI and I didn't understand that an MI interface
   shouldn't be defined directly in cpufunc.h.
- someone misnamed hlt() as halt().
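
A rough sketch of what doing the load and the fence in a single asm
statement might look like (purely illustrative, in the usual
machine/atomic.h context where u_int is available; not proposed code):

static __inline u_int
atomic_load_acq_int(volatile u_int *p)
{
	u_int v;

	/*
	 * Do the load and the fence in one asm statement so that
	 * the compiler cannot separate or reorder them.
	 */
	__asm __volatile("movl %1,%0; lfence"
	    : "=r" (v)
	    : "m" (*p)
	    : "memory");
	return (v);
}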

Bruce

