svn commit: r230201 - head/lib/libc/gen
Bruce Evans
brde at optusnet.com.au
Thu Jan 19 16:55:48 UTC 2012
On Thu, 19 Jan 2012, John Baldwin wrote:
> On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
>> rdtsc() may not work on SMP, so I have updated it to use clock_gettime
>> to get total time.
>> http://people.freebsd.org/~davidxu/bench/semaphore2/
>>
>> Still, lfence is a lot faster than an atomic lock.
I hope that includes non-microbenchmarks. IIRC, jhb found that lfence
was actually slower in some cases. I only did microbenchmarks, on
Athlon64.
> http://www.freebsd.org/~jhb/patches/amd64_fence.patch
>
> This is the patch I've had for quite a while. Can you retest with this? You'll
> probably have to install the updated header in /usr/include as well.
% --- //depot/projects/smpng/sys/amd64/include/atomic.h	2011-01-05 17:06:25.000000000 +0000
% +++ //depot/user/jhb/ktrace/amd64/include/atomic.h	2011-01-05 22:08:56.000000000 +0000
% ...
% @@ -213,13 +213,12 @@
% #if defined(_KERNEL) && !defined(SMP)
%
% /*
% - * We assume that a = b will do atomic loads and stores. However, on a
% - * PentiumPro or higher, reads may pass writes, so for that case we have
% - * to use a serializing instruction (i.e. with LOCK) to do the load in
% - * SMP kernels. For UP kernels, however, the cache of the single processor
% + * We assume that a = b will do atomic loads and stores. However, reads
% + * may pass writes, so we have to use fences in SMP kernels to preserve
% + * ordering. For UP kernels, however, the cache of the single processor
% * is always consistent, so we only need to take care of compiler.
% */
% -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP) \
% +#define ATOMIC_STORE_LOAD(TYPE) \
% static __inline u_##TYPE \
% atomic_load_acq_##TYPE(volatile u_##TYPE *p) \
% { \
It also has some simplifications from removing the use of different
operators for load and store. These simplifications don't seem quite
generic, since "lock; cmpxchg*" seems to be 2 cycles faster than
"xchg*" on Athlon64. The ATOMIC_STORE_LOAD() macro is obfuscatory.
It would be better to have separate macros for load and store, like
we have for set/clear/add/subtract. (The latter can be obfuscated
even better by using 4 parameters for the ops, and better yet by
making the type a parameter.)
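
Untested sketch of what separate macros might look like (the
ATOMIC_LOAD()/ATOMIC_STORE() names and bodies are mine, for
illustration only, not from jhb's patch):

/*
 * One macro per operation family, instead of a single
 * ATOMIC_STORE_LOAD() macro that generates both.
 */
#define	ATOMIC_LOAD(TYPE)						\
static __inline u_##TYPE						\
atomic_load_acq_##TYPE(volatile u_##TYPE *p)			\
{									\
	u_##TYPE v;							\
									\
	v = *p;								\
	__asm __volatile("lfence" : : : "memory");		\
	return (v);							\
}

/*
 * A plain store already has release semantics on x86, so a
 * compiler barrier is assumed to suffice on the store side.
 */
#define	ATOMIC_STORE(TYPE)						\
static __inline void						\
atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)	\
{									\
	__asm __volatile("" : : : "memory");			\
	*p = v;								\
}

ATOMIC_LOAD(int);
ATOMIC_STORE(int);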
% @@ -240,32 +239,22 @@
%
% #else /* !(_KERNEL && !SMP) */
%
% -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP) \
% +#define ATOMIC_STORE_LOAD(TYPE) \
% static __inline u_##TYPE \
% atomic_load_acq_##TYPE(volatile u_##TYPE *p) \
% { \
% - u_##TYPE res; \
% + u_##TYPE v; \
% \
% - __asm __volatile(MPLOCKED LOP \
% - : "=a" (res), /* 0 */ \
% - "=m" (*p) /* 1 */ \
% - : "m" (*p) /* 2 */ \
% - : "memory", "cc"); \
% - \
% - return (res); \
% + v = *p; \
% + __asm __volatile("lfence" ::: "memory"); \
Style bug (missing spaces around binary operator `:') which becomes
a syntax error for C++. Other places in this file use ` : : : '.
lfence() should be in cpufunc.h if it can be done separately
like this. However, I think it can't be done separately --
the fence needs to be in the same asm statement as the load/store,
since the compiler may reorder separate C statements (see the
sketch after the list below). This is the same problem that
forces us to write __asm volatile("sti; hlt"); instead of
sti(); hlt(); in too many places for idling in machdep.c.
BTW, sti() and hlt() are bogusly not in cpufunc.h either:
- I misnamed sti() as enable_intr(), since enable_intr() was
  supposed to be MI and I didn't understand that an MI
  interface should not live directly in cpufunc.h.
- someone misnamed hlt() as halt().
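
For reference, an untested sketch of doing the load and the fence
in a single asm (the operand constraints are my guesses, not from
any patch):

static __inline u_long
atomic_load_acq_long(volatile u_long *p)
{
	u_long v;

	/*
	 * The load and the lfence are in one asm statement, so
	 * the compiler cannot schedule other memory accesses
	 * between them, as it may for two separate C statements.
	 */
	__asm __volatile("movq %1,%0; lfence"
	    : "=r" (v)
	    : "m" (*p)
	    : "memory");
	return (v);
}

This is the same trick as __asm volatile("sti; hlt") for idling.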
Bruce