[PATCH] randomized delay in locking primitives, take 2

Sun Jul 31 20:36:18 UTC 2016

On Sun, Jul 31, 2016 at 07:03:08AM -0700, Adrian Chadd wrote:
> Hi,
> 
> Did you test on any 1, 2, 4, 8 cpu machines? just to see if there are
> any performance degradations on lower count CPUs?
> 

I did not test on machines which physically have that few cpus, but I
did test the impact on a microbenchmark with 2 and 4 threads on the
80-way machine. There was no difference.

For this iteration of the patch, given limited time, I tried to be very
conservative so as not to introduce additional latency. In fact I would
argue the patch is undertuned (as in, it can do better in certain
workloads).

That said, I think it is safe to use.

> Also, yeah, the MOD operator in each loop could get spendy on older
> CPUs (eg my MIPS CPUs, older ARM stuff, etc.) Is it possible to
> achieve much the same autotuning with pow2 operations instead of
> divide/mod?
> 

The % operation acts as a randomizer. It is optional and I'm happy to
ifdef it based on the architecture. It does seem to be useful at least
on amd64.
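
To illustrate the tradeoff being discussed, here is a rough sketch and
not code from the patch. The names delay_mod and delay_mask are made up
for this example, and rnd stands for whatever cheap varying value is
available (a cycle counter, say). If the upper bound is restricted to a
power of two, the modulo can be replaced with a mask, which avoids a
divide on CPUs where that is expensive:

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from the patch. Both return a spin
 * count in the range [1, max].
 */

/* General form: any nonzero max, but costs a divide. */
static inline uint32_t
delay_mod(uint32_t rnd, uint32_t max)
{

	return ((rnd % max) + 1);
}

/* Cheaper form: max_pow2 must be a power of two, so a mask suffices. */
static inline uint32_t
delay_mask(uint32_t rnd, uint32_t max_pow2)
{

	return ((rnd & (max_pow2 - 1)) + 1);
}
```

For a power-of-two bound both forms return the same value, so the
masked variant would only constrain how the upper bound is chosen.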

As a side note, exponential backoff is not used, in order to keep
things smaller (see above). It is definitely subject to change later.

-- 
Mateusz Guzik <mjguzik gmail.com>