svn commit: r242014 - head/sys/kern

Andre Oppermann andre at freebsd.org
Wed Oct 24 20:43:08 UTC 2012


On 24.10.2012 21:30, Alexander Motin wrote:
> On 24.10.2012 22:16, Andre Oppermann wrote:
>> On 24.10.2012 20:56, Jim Harris wrote:
>>> On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd <adrian at freebsd.org>
>>> wrote:
>>>> On 24 October 2012 11:36, Jim Harris <jimharris at freebsd.org> wrote:
>>>>
>>>>>    Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.
>>>>
>>>> Ok, but..
>>>>
>>>>
>>>>>          struct mtx      tdq_lock;               /* run queue lock. */
>>>>> +       char            pad[64 - sizeof(struct mtx)];
>>>>
>>>> .. don't we have an existing compile time macro for the cache line
>>>> size, which can be used here?
>>>
>>> Yes, but I didn't use it for a couple of reasons:
>>>
>>> 1) struct tdq itself is currently using __aligned(64), so I wanted to
>>> keep it consistent.
>>> 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
>>> NetBurst-based processors having 128-byte cache sectors a while back.
>>> I had planned to start a separate thread on arch@ about this today on
>>> whether this was still appropriate.
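For reference, the padded layout in the patch reads roughly like the sketch
below; the surrounding fields are abbreviated, and 64 mirrors the existing
__aligned(64) on struct tdq rather than CACHE_LINE_SIZE:

    /*
     * Sketch only: field set and ordering abbreviated.  The pad keeps the
     * frequently written scheduler fields off the lock's cache line.
     */
    struct tdq {
            struct mtx      tdq_lock;       /* run queue lock. */
            char            pad[64 - sizeof(struct mtx)];
            volatile int    tdq_load;       /* aggregate load; read by remote CPUs */
            volatile int    tdq_cpu_idle;   /* idle flag polled by remote CPUs */
            /* ... remaining scheduler fields ... */
    } __aligned(64);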
>>
>> See also the discussion on svn-src-all regarding global struct mtx
>> alignment.
>>
>> Thank you for proving my point. ;)
>>
>> Let's go back and see how we can do this the sanest way.  These are
>> the options I see at the moment:
>>
>>   1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
>>   2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
>>      the future possibly change to a different compiler dependent
>>      align attribute
>>   3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
>>      automatically gets aligned in all cases, even when dynamically
>>      allocated.
>>
>> Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
>> of #3 is that there possibly isn't any case where you'd actually
>> want the mutex to share a cache line with anything else, even a data
>> structure.
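A rough sketch of what #2 and #3 could look like; MTX_ALIGN is the
hypothetical macro named above, and the struct mtx fields are simplified:

    /* Option 2: a hypothetical SMP/UP-aware alignment macro. */
    #ifdef SMP
    #define MTX_ALIGN       __aligned(CACHE_LINE_SIZE)
    #else
    #define MTX_ALIGN       /* no padding needed on UP */
    #endif

    struct tdq {
            struct mtx      tdq_lock MTX_ALIGN;     /* lock gets its own line */
            /* ... */
    };

    /* Option 3: put the alignment on struct mtx itself, so every structure
     * embedding a mutex keeps it on a separate cache line automatically. */
    struct mtx {
            struct lock_object      lock_object;    /* common lock state */
            volatile uintptr_t      mtx_lock;       /* owner and flags */
    } __aligned(CACHE_LINE_SIZE);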
>
> I'm sorry, could you point me at some theory? I can agree that cache line sharing can be a
> problem in the case of spin locks -- the waiting thread will constantly try to access a cache line
> modified by the other CPU, which I guess will cause cache line write-backs. But why is it so bad to
> share the lock with its data in the case of non-spin locks? Won't the benefit of effectively
> prefetching the right data while grabbing the lock compensate for the penalties from relatively
> rare collisions?

Cliff Click describes it in detail:
  http://www.azulsystems.com/blog/cliff/2009-04-14-odds-ends

For a classic mutex it likely doesn't make much difference since the
cache line is exclusive anyway while the lock is held.  On LL/SC systems
there may be cache line dirtying on a failed locking attempt.

For spin mutexes it hurts badly as you noted.

It hurts especially for RW mutexes, because even a read lock dirties the cache
line for all other CPUs.  Here the RW mutex should be on its own cache
line in all cases.
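
The read-lock case can be illustrated with a minimal userland sketch (C11
atomics, not the kernel rwlock code): even a shared acquire has to store to
the lock word, so every reader takes the line exclusive and invalidates it
on the other CPUs.

    #include <stdatomic.h>
    #include <stdbool.h>

    struct rw_sketch {
            _Atomic unsigned long   word;   /* bit 0: writer held; rest: reader count */
    };

    static bool
    read_lock_try(struct rw_sketch *rw)
    {
            unsigned long v = atomic_load_explicit(&rw->word, memory_order_relaxed);

            if (v & 1)              /* a writer holds the lock */
                    return (false);
            /*
             * The CAS writes the lock's cache line even though the caller
             * only wants to read the data it protects.
             */
            return (atomic_compare_exchange_weak_explicit(&rw->word, &v, v + 2,
                memory_order_acquire, memory_order_relaxed));
    }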

-- 
Andre


