svn commit: r230201 - head/lib/libc/gen

Fri Jan 20 11:42:03 UTC 2012

On 20 Jan 2012, at 00:46, David Xu wrote:

> It depends on the hardware. What is a small conflict on a dual-core
> machine can become a large conflict on a large machine with lots of
> CPUs, because more CPUs may be running the same code at once, which
> becomes a bottleneck. On a machine with 1024 cores, a lot of code
> needs to be redesigned.

You'll also find that the relative cost of atomic instructions varies a lot between CPU models.  Between Core 2 and Sandy Bridge Core i7, the relative cost of an atomic add (which acts as a full barrier on x86) dropped by about two thirds.  The cache coherency logic has been significantly improved on the newer chips.

For portable code, it's worth remembering that ARMv8 (which doesn't entirely exist yet) contains a set of barriers that closely match the semantics of the C[++]11 memory orderings.  ARM did this not for performance (directly), but for power efficiency, so using the least restrictive memory ordering that is still correct will eventually result in code for mobile devices that uses less battery power, if it's in a hot path.
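
To make "least restrictive" concrete, here is a minimal sketch using C11 <stdatomic.h>.  All of the names (hits, payload, ready, and the functions) are made up for illustration and are not taken from libc.  It assumes a statistics counter that needs atomicity but no ordering, and a flag that hands a plain int from one thread to another.  The self-check at the end is single-threaded and only shows the call pattern; it does not exercise the ordering itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in libc. */
static atomic_ulong hits;	/* statistics counter */
static int payload;		/* ordinary data handed to a consumer */
static atomic_bool ready;	/* flag that publishes payload */

/* Counting events needs atomicity but no ordering: relaxed is enough. */
static void
count_hit(void)
{
	atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

static unsigned long
read_hits(void)
{
	return (atomic_load_explicit(&hits, memory_order_relaxed));
}

/*
 * Producer: the release store orders the earlier plain write to payload
 * before the flag becomes visible.
 */
static void
publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, true, memory_order_release);
}

/*
 * Consumer: the acquire load pairs with the release store above, so
 * payload is safe to read once ready is observed as true.
 * Returns false if nothing has been published yet.
 */
static bool
try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return (false);
	*out = payload;
	return (true);
}

/* Single-threaded check of the call pattern. */
static void
single_threaded_check(void)
{
	int v = 0;

	assert(!try_consume(&v));	/* nothing published yet */
	count_hit();
	count_hit();
	assert(read_hits() == 2);
	publish(42);
	assert(try_consume(&v) && v == 42);
}
```

Writing all of these as plain atomic_fetch_add() / atomic_store() / atomic_load() would also be correct, but those default to memory_order_seq_cst, which asks for more ordering than either use needs.  On x86 the difference is mostly invisible; on an architecture whose barriers map closely onto the C11 orderings, the compiler can choose something cheaper for the weaker requests.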

David