svn commit: r241889 - in user/andre/tcp_workqueue/sys: arm/arm cddl/compat/opensolaris/kern cddl/contrib/opensolaris/uts/common/dtrace cddl/contrib/opensolaris/uts/common/fs/zfs ddb dev/acpica dev/...

Andre Oppermann andre at freebsd.org
Thu Oct 25 21:03:30 UTC 2012


On 25.10.2012 18:21, Attilio Rao wrote:
> On Thu, Oct 25, 2012 at 5:15 PM, Andre Oppermann <andre at freebsd.org> wrote:
>> On 25.10.2012 05:14, Attilio Rao wrote:
>>>
>>> On Wed, Oct 24, 2012 at 9:13 PM, Attilio Rao <attilio at freebsd.org> wrote:
>>> I think I've had a better idea for this.
>>> In our locking scheme we already rely on the fact that lock_object
>>> will always be present in all our locking primitives and that it will
>>> be the first member of every locking primitive. This is an assumption
>>> we must live with in order to correctly implement lock classes. I
>>> think we can use the same concept in order to use the same KPI for the
>>> 2 different structures (struct mtx and struct mtx_unshare) and keep
>>> the compile-time ability to find stupid bugs.
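
(As a rough sketch of that idea -- hypothetical names, not the actual
in-tree KPI -- the shared first member lets a single entry point serve
both flavours while still rejecting anything without an embedded
lock_object at compile time:)

#include <stdint.h>

#define	CACHE_LINE_SIZE	64	/* stand-in for the kernel's machine-dependent value */

struct lock_object {
	const char	*lo_name;
	unsigned int	 lo_flags;
};

struct mtx {
	struct lock_object	lock_object;	/* must stay first */
	volatile uintptr_t	mtx_lock;
};

/* Hypothetical cache-line padded flavour for global locks. */
struct mtx_unshare {
	struct lock_object	lock_object;	/* must stay first */
	volatile uintptr_t	mtx_lock;
} __attribute__((__aligned__(CACHE_LINE_SIZE)));

void	_lock_acquire(struct lock_object *lo, const char *file, int line);

/*
 * One macro serves both structures; passing a pointer to a type without
 * a lock_object member fails at compile time.
 */
#define	LOCK_ACQUIRE(m)	_lock_acquire(&(m)->lock_object, __FILE__, __LINE__)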
>>
>>
>> I think we're completely overdoing it.  On amd64, cache line aligning
>> and padding all mutex, sx and rw_lock structures to 64B adds a tiny
>> 16K to a 19MB GENERIC kernel.  That is roughly a 0.08% increase in
>> size.  Of course dynamically allocated memory that includes a mutex
>> grows a tiny bit as well.
>>
>> Hence I propose to unconditionally slap __aligned(CACHE_LINE_SIZE) into
>> all locking structures and be done with it.  As an added benefit we
>> don't have to worry about individual micro-optimizations on a case by
>> case basis.
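
(Concretely, the proposal amounts to roughly the following -- field lists
abridged, with __aligned() and CACHE_LINE_SIZE as the kernel headers
provide them:)

/*
 * Unconditional padding: every instance of the lock types, whether a
 * global or embedded in another structure, starts on its own cache line
 * and occupies a whole multiple of CACHE_LINE_SIZE.
 */
struct mtx {
	struct lock_object	lock_object;
	volatile uintptr_t	mtx_lock;
} __aligned(CACHE_LINE_SIZE);

struct rwlock {
	struct lock_object	lock_object;
	volatile uintptr_t	rw_lock;
} __aligned(CACHE_LINE_SIZE);

struct sx {
	struct lock_object	lock_object;
	volatile uintptr_t	sx_lock;
} __aligned(CACHE_LINE_SIZE);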
>
> Did you see my (and also Jeff's) objection to your proposal about this?
> Are you deliberately ignoring it?

Well, I'm allowed to have a different opinion, am I not?  I'm not ignoring
your objection in the sense that I'm not trying to commit any of this to
HEAD while it is disputed.

Mind you this whole conversation was started because I was trying to solve
a problem with unfortunate cache line sharing for global mutexes in the
kernel .bss section on my *personal* svn branch.

From the discussion, which I actively solicited, we've figured out
quite a few things that work, don't work, or are not really what we want
to have.  I've learned a lot about how alignment and padding are handled
within gcc+ld and will probably change my opinion once (again) when I
have fully digested it.

I very much appreciate your and everybody else's input and effort in
trying to find a useful and practical solution!

> I think that mutexes which are part of structures usually don't want
> that, and it is really overkill.  The problem is not the amount of
> wasted memory (which is also important, anyway) but the fact that
> not-really-contended sleep mutexes will interfere with the caching of
> normal structure members.
>
> I think it is a very bad idea.
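
(To make the concern concrete, take a hypothetical consumer structure:
with the compact struct mtx the lock and the fields it protects typically
share one cache line, so an uncontended acquire pulls in the data the
following accesses need anyway; an unconditionally padded mutex pushes
those fields onto a different line and grows every instance:)

/* Hypothetical example structure; the field names are made up. */
struct conn {
	struct mtx	c_lock;		/* roughly 32 bytes unpadded, 64+ if padded */
	uint32_t	c_refcount;	/* protected by c_lock */
	uint32_t	c_flags;	/* protected by c_lock */
	struct conn	*c_next;
};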

OTOH there are clear situations where a mutex or rw_lock should end
up on its own cache line, as proven by Jim Harris.  Also, from visual
inspection [1] and knowledge of the code paths I can say that quite a
few global mutexes (even non-spin ones) suffer from false sharing and
do cause a lot of cache line bouncing.  [1] nm -n and readelf.
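
(One possible middle ground, in the spirit of Jim Harris' opt-in padding
and sketched here with a hypothetical wrapper name: keep struct mtx
compact for embedding and pad only the global locks known to bounce:)

/*
 * Opt-in padded wrapper (hypothetical name): only globals known to
 * bounce are declared with it; mutexes embedded in other structures
 * keep the compact struct mtx layout.
 */
struct mtx_aligned {
	struct mtx	ma_mtx;
} __aligned(CACHE_LINE_SIZE);

static struct mtx		foo_mtx;	/* compact; may share a .bss cache line */
static struct mtx_aligned	bar_mtx;	/* hot: gets its own cache line */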

Let's put this discussion to rest for a few days to let it filter through
the great minds, and then try again to align the different requirements.

-- 
Andre


