svn commit: r241889 - in user/andre/tcp_workqueue/sys: arm/arm cddl/compat/opensolaris/kern cddl/contrib/opensolaris/uts/common/dtrace cddl/contrib/opensolaris/uts/common/fs/zfs ddb dev/acpica dev/...

mdf at FreeBSD.org
Thu Oct 25 19:05:43 UTC 2012


On Thu, Oct 25, 2012 at 9:15 AM, Andre Oppermann <andre at freebsd.org> wrote:
> I think we're completely overdoing it.

I agree, but in the opposite direction.  This is a solution looking
for a problem.

> On amd64 the size difference of cache-line aligning and padding all
> mutex, sx and rw_lock structures to 64B adds a tiny 16K to a GENERIC
> kernel of 19MB.  That is a 0.009% increase in size.  Of course,
> dynamically allocated memory that includes a mutex grows a tiny bit
> as well.
>
> Hence I propose to unconditionally slap __aligned(CACHE_LINE_SIZE) into
> all locking structures and be done with it.  As an added benefit, we
> won't have to worry about individual micro-optimizations on a
> case-by-case basis.
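
For concreteness, as I read it the proposal amounts to roughly the
following (a sketch against the current struct mtx definition, not
taken from an actual patch), and the same for struct sx and struct
rwlock:

struct mtx {
	struct lock_object	lock_object;	/* Common lock properties. */
	volatile uintptr_t	mtx_lock;	/* Owner and flags. */
} __aligned(CACHE_LINE_SIZE);	/* pad/align every mtx to one line */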

What problem are you trying to solve?  I understand all about cache
sharing, but if you force struct mtx onto its own cache line, I can no
longer put the data accessed under the lock in the same cache line.
You've de-optimized both the code and the memory layout.  And, as alc@
said, it ignores the mtx embedded in many dynamically allocated
structures.
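
Concretely, the layout I mean is something like this (names are made
up for illustration, not an existing structure):

struct foo_softc {
	struct mtx	sc_mtx;		/* protects the fields below */
	uint64_t	sc_count;	/* shares sc_mtx's cache line */
	uint32_t	sc_flags;	/* shares sc_mtx's cache line */
};

Here the lock and the hot data it protects deliberately share one
line, so a lock/update/unlock cycle touches a single cache line.  Pad
struct mtx to CACHE_LINE_SIZE unconditionally and that layout becomes
impossible.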

If certain, specific global mutexes will benefit, then they can be
explicitly allocated as __aligned and explicitly padded to a cache
line.  Only mutexes specifically identified as making a measurable
performance difference should be touched; there is no need for a
general solution.
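
Something along these lines (hypothetical lock name; the usual
sys/param.h, sys/kernel.h, sys/lock.h and sys/mutex.h includes are
assumed, and struct mtx is assumed to fit within one cache line)
keeps the padding visible at the declaration of the one lock that
needs it, without touching struct mtx itself:

static struct {
	struct mtx	mtx;
	/* Explicit pad so nothing else lands in the same line. */
	char		pad[CACHE_LINE_SIZE - sizeof(struct mtx)];
} hot_global __aligned(CACHE_LINE_SIZE);

MTX_SYSINIT(hot_global, &hot_global.mtx, "hot global mutex", MTX_DEF);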

Thanks,
matthew

