b_freelist TAILQ/SLIST

Sat Jun 29 14:49:08 UTC 2013

On Sat, Jun 29, 2013 at 10:06:11AM +0300, Alexander Motin wrote:
> I understand that lock attempt will steal cache line from lock owner. 
> What I don't very understand is why avoiding it helps performance in 
> this case. Indeed, having mutex on own cache line will not let other 
> cores to steal also bswlist, but it also means that bswlist should be 
> prefetched separately (and profiling shows resource stalls there). Or in 
> this case separate speculative prefetch will be better then forced one 
> which could be stolen? Is there cases when it is not, or the only reason 
> to not pad all global mutexes is only saving memory?

I can speculate that this is a case where speculative execution helps.
If the mutex and the list head are on different cache lines, the CPU
can speculatively read the head and then prove that performing the
read before the lock acquisition does not break the ordering rules
(because the lock protects the head, another core cannot modify
the head once the lock acquisition has succeeded).

I think it is for a very similar reason that locked instructions used as
barriers are faster than the explicit barriers: the CPU can still execute
speculatively past the lock prefix if the ordering is provably consistent.
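
To make the comparison concrete, here is a minimal C11 sketch of the two
ways to order a store before a later load. The names flag_a, flag_b,
publish_locked and publish_fenced are hypothetical. How each variant is
lowered is an assumption about typical x86 code generation (a locked
instruction such as xchg for the first, an explicit fence such as mfence
or a locked dummy operation for the second); it depends on the compiler.

```c
#include <stdatomic.h>

/* Hypothetical flags shared between two cores. */
static atomic_int flag_a, flag_b;

/*
 * Variant 1: a sequentially consistent read-modify-write.  On x86 this
 * is assumed to become a locked instruction, which also orders the
 * later load without a separate fence instruction.
 */
static int
publish_locked(void)
{
	atomic_exchange_explicit(&flag_a, 1, memory_order_seq_cst);
	return atomic_load_explicit(&flag_b, memory_order_seq_cst);
}

/*
 * Variant 2: a plain store followed by an explicit full barrier.
 * The fence orders the store before the load.
 */
static int
publish_fenced(void)
{
	atomic_store_explicit(&flag_a, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&flag_b, memory_order_relaxed);
}
```

Both functions set flag_a and then return the current value of flag_b;
they differ only in which instruction provides the store-load ordering.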

Please see section 8.4.5 of the Intel IA-32 architecture optimization
manual for the recommendations (but not much explanation).

Yes, I think putting all locks on dedicated cache lines is a waste;
only hot locks need this.
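
For illustration, a userland sketch of giving one hot lock its own cache
line, separate from the list head it protects. This is not the kernel
code under discussion: struct hot_freelist, fl_put, fl_get and the
64-byte CACHE_LINE value are all assumptions made for the example, and
a pthread mutex stands in for a kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size; check the target CPU */

struct buf {
	struct buf *next;
};

/*
 * Hypothetical hot freelist: the lock and the head each start on their
 * own cache line, so a core spinning on the lock does not steal the
 * line that holds the head.
 */
struct hot_freelist {
	alignas(CACHE_LINE) pthread_mutex_t lock;
	alignas(CACHE_LINE) struct buf *head;
};

static_assert(offsetof(struct hot_freelist, head) % CACHE_LINE == 0,
    "head must start on its own cache line");

static struct hot_freelist fl = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.head = NULL,
};

/* Push a buffer onto the list head under the lock. */
static void
fl_put(struct buf *bp)
{
	pthread_mutex_lock(&fl.lock);
	bp->next = fl.head;
	fl.head = bp;
	pthread_mutex_unlock(&fl.lock);
}

/* Pop the most recently freed buffer, or NULL if the list is empty. */
static struct buf *
fl_get(void)
{
	struct buf *bp;

	pthread_mutex_lock(&fl.lock);
	bp = fl.head;
	if (bp != NULL)
		fl.head = bp->next;
	pthread_mutex_unlock(&fl.lock);
	return (bp);
}
```

The cost is visible in sizeof(struct hot_freelist): at least two full
lines for one lock and one pointer, which is why doing this for every
global mutex would be wasteful.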