svn commit: r242014 - head/sys/kern

Attilio Rao attilio at freebsd.org
Wed Oct 24 20:02:44 UTC 2012


On Wed, Oct 24, 2012 at 8:30 PM, Alexander Motin <mav at freebsd.org> wrote:
> On 24.10.2012 22:16, Andre Oppermann wrote:
>>
>> On 24.10.2012 20:56, Jim Harris wrote:
>>>
>>> On Wed, Oct 24, 2012 at 11:41 AM, Adrian Chadd <adrian at freebsd.org>
>>> wrote:
>>>>
>>>> On 24 October 2012 11:36, Jim Harris <jimharris at freebsd.org> wrote:
>>>>
>>>>>    Pad tdq_lock to avoid false sharing with tdq_load and tdq_cpu_idle.
>>>>
>>>>
>>>> Ok, but..
>>>>
>>>>
>>>>>          struct mtx      tdq_lock;               /* run queue lock. */
>>>>> +       char            pad[64 - sizeof(struct mtx)];
>>>>
>>>>
>>>> .. don't we have an existing compile time macro for the cache line
>>>> size, which can be used here?
>>>
>>>
>>> Yes, but I didn't use it for a couple of reasons:
>>>
>>> 1) struct tdq itself is currently using __aligned(64), so I wanted to
>>> keep it consistent.
>>> 2) CACHE_LINE_SIZE is currently defined as 128 on x86, due to
>>> NetBurst-based processors having 128-byte cache sectors a while back.
>>> I had planned to start a separate thread on arch@ about this today on
>>> whether this was still appropriate.
>>
>>
>> See also the discussion on svn-src-all regarding global struct mtx
>> alignment.
>>
>> Thank you for proving my point. ;)
>>
>> Let's go back and see how we can do this the sanest way.  These are
>> the options I see at the moment:
>>
>>   1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
>>   2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
>>      the future possibly change to a different compiler-dependent
>>      align attribute
>>   3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
>>      automatically gets aligned in all cases, even when dynamically
>>      allocated.
>>
>> Personally I'm undecided between #2 and #3.  #1 is ugly.  In favor
>> of #3 is that there possibly isn't any case where you'd actually
>> want the mutex to share a cache line with anything else, even a data
>> structure.
>
>
> I'm sorry, could you point me to some theory? I think I can agree that
> cache line sharing can be a problem in the case of spin locks -- the
> waiting thread will constantly try to access a line modified by another
> CPU, which I guess will cause cache line write-backs to RAM. But why is
> it so bad to share a lock with its respective data in the case of
> non-spin locks? Won't the benefit of free regular prefetching of the
> right data while grabbing the lock compensate for the penalties from
> relatively rare collisions?

Yes, but be aware that adaptive spinning imposes the same type of
cache sharing issues as spinlocks do.

I just found this 4-year-old patch that implements back-off
algorithms for spin mutexes and adaptive spinning. I think it would be
interesting if someone could find the time to benchmark and tune it. I
can clean it up and make it a commit candidate if there is interest:
http://lists.freebsd.org/pipermail/freebsd-smp/2008-June/001561.html

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
