[PATCH] microoptimize locking primitives by introducing randomized delay between atomic ops
Ian Lepore
ian at freebsd.org
Sun Jul 10 14:33:17 UTC 2016
On Sun, 2016-07-10 at 13:13 +0200, Mateusz Guzik wrote:
> If the lock is contended, primitives like __mtx_lock_sleep will spin
> checking if the owner is running or the lock was freed. The problem
> is
> that once it is discovered that the lock is free, multiple CPUs are
> likely to try to do the atomic op which will make it more costly for
> everyone and throughput suffers.
>
> The standard thing to do is to have some sort of a randomized delay
> so
> that this kind of behaviour is reduced.
>
> As such, below is a trivial hack which takes cpu_ticks() into account
> and performs % 2048, which in my testing gives reasonbly good
> results.
>
> Please note there is definitely way more room for improvement in
> general.
>
> In terms of results, there was no statistically significant change in
> -j 40 buildworld nor buildkernel.
>
> However, a 40-way find on a ports tree placed on tmpfs yielded the
> following:
>
> x vanilla
> + patched
> +--------------------------------------------------------------------
> --------------------+
> > ++++ + x
> > x x x |
> > + ++++ +++ + + + ++ + + x x
> > x xxxxxxxx x x x|
> > |_____M____A__________|
> > |________AM______| |
> +--------------------------------------------------------------------
> --------------------+
> N Min Max Median Avg
> Stddev
> x 20 12.431 15.952 14.897 14.7444
> 0.74241657
> + 20 8.103 11.863 9.0135 9.44565
> 1.0059484
> Difference at 95.0% confidence
> -5.29875 +/- 0.565836
> -35.9374% +/- 3.83764%
> (Student's t, pooled s = 0.884057)
>
> The patch:
[...]
What about platforms that don't have a useful implementation of
cpu_ticks()?
What about platforms that don't suffer the large expense for atomic ops
that x86 apparently does?
-- Ian
More information about the freebsd-current
mailing list