svn commit: r230201 - head/lib/libc/gen

Thu Jan 19 05:57:56 UTC 2012

On 2012/1/19 11:24, David Xu wrote:
> On 2012/1/18 23:09, John Baldwin wrote:
>> On Tuesday, January 17, 2012 9:09:25 pm David Xu wrote:
>>> On 2012/1/17 22:57, John Baldwin wrote:
>>>> On Monday, January 16, 2012 1:15:14 am David Xu wrote:
>>>>> Author: davidxu
>>>>> Date: Mon Jan 16 06:15:14 2012
>>>>> New Revision: 230201
>>>>> URL: http://svn.freebsd.org/changeset/base/230201
>>>>>
>>>>> Log:
>>>>>     Insert read memory barriers.
>>>> I think using atomic_load_acq() on sem->nwaiters would be clearer, as it
>>>> would indicate which variable you need to ensure is read after other
>>>> operations.  In general I think raw rmb/wmb usage should be avoided when
>>>> possible, as it does not describe the programmer's intent as well.
>>>>
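The difference in intent can be sketched with C11 <stdatomic.h>. This is an assumption: it stands in for FreeBSD's atomic(9) and rmb(), and struct hyp_sem and both functions are hypothetical, not libc's semaphore code. A relaxed load followed by an acquire fence gives that load acquire ordering, but the fence applies to every earlier load, so nothing in the code says which variable matters. The acquire-load form attaches the ordering to nwaiters itself.

```c
#include <stdatomic.h>

/* Hypothetical semaphore layout; field names are illustrative only. */
struct hyp_sem {
    atomic_uint count;
    atomic_uint nwaiters;
};

/* Fence style (in the spirit of rmb()): the standalone acquire fence
 * orders *every* atomic load before it against the accesses after it,
 * so a reader cannot tell that nwaiters is the load being protected. */
unsigned read_waiters_fence(struct hyp_sem *s)
{
    unsigned n = atomic_load_explicit(&s->nwaiters, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return n;
}

/* Acquire-load style (in the spirit of atomic_load_acq()): later loads
 * and stores cannot be reordered before this particular load, and the
 * variable it applies to is named explicitly. */
unsigned read_waiters_acq(struct hyp_sem *s)
{
    return atomic_load_explicit(&s->nwaiters, memory_order_acquire);
}
```

Both functions return the same value; the difference is only in how clearly the source documents which load the ordering is for.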
>>> Yes, I had considered using atomic_load_acq(), but at the time I thought
>>> it emitted a bus lock, right? So I just picked rmb(), which only affects
>>> the current CPU. Maybe atomic_load_acq() does the same thing as rmb()?
>>> It is still unclear to me.
>> atomic_load_acq() is the same as rmb().  Right now it uses a locked
>> instruction on amd64, but it could easily switch to lfence/sfence instead.
>> I had patches to do that, but I think bde@ had done some benchmarks that
>> showed the change made no difference.
>>
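To make the two amd64 strategies concrete, here is a sketch using GCC/Clang inline assembly. load_acq_locked() follows the `lock cmpxchg` idiom and load_acq_lfence() uses a plain load followed by lfence. Both names are hypothetical and this is not FreeBSD's <machine/atomic.h>. On non-x86-64 targets the sketch assumes the compiler's __atomic_load_n() builtin as a fallback.

```c
/* Requires GCC or Clang (inline asm and __atomic builtins). */

/* Load-acquire via a locked instruction.  cmpxchg compares %eax with *p:
 * if equal it writes the same value back, otherwise it loads *p into
 * %eax.  Either way %eax ends up holding *p.  The lock prefix makes it a
 * full barrier, but it is a read-modify-write, so the CPU needs the cache
 * line in exclusive state even though the value never changes. */
unsigned load_acq_locked(volatile unsigned *p)
{
#if defined(__x86_64__)
    unsigned res = 0;
    __asm__ __volatile__("lock; cmpxchgl %0, %1"
                         : "+a"(res), "+m"(*p)
                         :
                         : "memory", "cc");
    return res;
#else
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#endif
}

/* Load-acquire via a plain load followed by lfence.  Nothing is written,
 * so several CPUs can keep shared copies of the line while reading it. */
unsigned load_acq_lfence(volatile unsigned *p)
{
#if defined(__x86_64__)
    unsigned res = *p;
    __asm__ __volatile__("lfence" ::: "memory");
    return res;
#else
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#endif
}
```

Both return the current value of *p. The locked variant's write to the cache line is what makes it costly when many CPUs read the same variable.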
> I wish there were a version of atomic_load_acq() that uses lfence. I have
> always thought bus locking is expensive on a multi-core machine. Working
> on large machines here, we found that even the current rwlock in libthr
> does not scale well when most threads are readers; we had to implement a
> CSNZI-like rwlock to avoid contention between CPUs.
> http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf
>
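The scaling problem described above comes from every reader updating the same lock word, so its cache line bounces between CPUs. The sketch below is not C-SNZI from the linked paper. It is a much simpler sharded reader count, with hypothetical names and an assumed 64-byte cache line, that shows the basic idea: readers touch separate cache lines, and writers pay more.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define HYP_NSLOTS 8 /* hypothetical number of reader slots */

/* Each slot is aligned to its own 64-byte line (assumed line size). */
struct hyp_slot {
    _Alignas(64) atomic_int readers;
};

struct hyp_rwlock {
    struct hyp_slot slot[HYP_NSLOTS];
    atomic_bool writer;
};

void hyp_rwlock_init(struct hyp_rwlock *l)
{
    for (int i = 0; i < HYP_NSLOTS; i++)
        atomic_init(&l->slot[i].readers, 0);
    atomic_init(&l->writer, false);
}

/* `id` would normally come from the CPU or thread; the caller supplies it.
 * All operations are seq_cst, so a reader's increment and a writer's flag
 * store cannot both miss each other (Dekker-style). */
void hyp_rdlock(struct hyp_rwlock *l, unsigned id)
{
    atomic_int *r = &l->slot[id % HYP_NSLOTS].readers;

    for (;;) {
        atomic_fetch_add(r, 1);          /* announce this reader */
        if (!atomic_load(&l->writer))    /* no writer: read lock held */
            return;
        atomic_fetch_sub(r, 1);          /* writer active: back off */
        while (atomic_load(&l->writer))
            sched_yield();
    }
}

void hyp_rdunlock(struct hyp_rwlock *l, unsigned id)
{
    atomic_fetch_sub(&l->slot[id % HYP_NSLOTS].readers, 1);
}

void hyp_wrlock(struct hyp_rwlock *l)
{
    bool expected = false;

    /* Only one writer at a time may set the flag. */
    while (!atomic_compare_exchange_weak(&l->writer, &expected, true)) {
        expected = false;
        sched_yield();
    }
    /* Wait for every slot to drain; this scan is the writer's extra cost. */
    for (int i = 0; i < HYP_NSLOTS; i++)
        while (atomic_load(&l->slot[i].readers) != 0)
            sched_yield();
}

void hyp_wrunlock(struct hyp_rwlock *l)
{
    atomic_store(&l->writer, false);
}
```

The trade-off shows up in hyp_wrlock(), which must scan every slot, and the sketch makes no fairness guarantees. As we understand the linked paper, SNZI-style indicators instead organize the counts so that asking "are there any readers?" reads a single root word.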
> I have just run a benchmark on my notebook, which has a 4-thread (SMT)
> Sandy Bridge CPU, an i3-2310M.
> http://people.freebsd.org/~davidxu/bench/semaphore/ 
>
> The load_acq version, which uses a locked instruction, is much slower
> than the lfence version:
> http://people.freebsd.org/~davidxu/bench/semaphore/ministat.txt 
>
> Benchmark program:
> http://people.freebsd.org/~davidxu/bench/semaphore/sem_test.c 
>
rdtsc() may not be reliable on SMP, so I have updated the benchmark to use
clock_gettime() to measure the total time.
http://people.freebsd.org/~davidxu/bench/semaphore2/ 
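A minimal timing helper in the spirit of the updated benchmark, assuming POSIX clock_gettime() with CLOCK_MONOTONIC. hyp_time_loop() and hyp_plain_read() are hypothetical names and are not taken from sem_test.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Run `fn` `iters` times and return the elapsed time in nanoseconds.
 * CLOCK_MONOTONIC does not jump when the system time is set, and the
 * kernel keeps it consistent across CPUs, whereas raw rdtsc values read
 * on different CPUs can disagree on some machines if the thread migrates. */
long long hyp_time_loop(void (*fn)(void), long iters)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
        + (end.tv_nsec - start.tv_nsec);
}

/* Example workload: a plain read of a shared variable. */
static volatile unsigned hyp_shared = 1;

void hyp_plain_read(void)
{
    (void)hyp_shared;
}
```

To compare the two loads, one would time one function doing the locked load and another doing the plain load plus lfence with the same iteration count, then divide by iters to get nanoseconds per operation.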

Still, lfence is a lot faster than the locked atomic instruction.