Anomalous performance increase from mutex profiling
Kris Kennaway
kris at obsecurity.org
Mon Apr 17 05:16:12 UTC 2006
I have been measuring performance with the supersmack benchmark on a
12-cpu sparc64 E4500. This is in the context of a patch I made to do
exponential backoff during mutex spins, which has a large positive
effect (~17%) in this case.
However, even with CVS sources I found something very odd: when
MUTEX_PROFILING is compiled in but not enabled (which usually still
makes things slower), I see a clear 38% +/- 3% performance *increase*
on this benchmark.
The effects of MUTEX_PROFILING are (as far as I can see):
- Increases size of struct mtx
- Enables MUTEX_NOINLINE
- Enables unconditional atomic_add_int at kern_mutex.c:484
(this may be unintentional; I think it should be conditional on
mutex_prof_enable = 1)
- Adds extra reads/writes of the struct mtx in the mutex
lock/unlock ops
- Adds a chunk of code to _mtx_unlock_flags
But:
- Removing the atomic_add_int from kern_mutex.c:484 seems to
make it about 5% *slower*. I didn't measure this carefully
yet though.
When MUTEX_PROFILING is disabled:
- MUTEX_NOINLINE doesn't help performance noticeably
- Unconditionally changing size of struct mtx doesn't help
- Compiling -O2 instead of -O doesn't help
- Gutting the bulk of the mutex profiling code and leaving only the
extra struct mtx accesses did not help
- Here is the readelf output of the function table in kern_mutex.o in
the non-mutex profiling case:
Num: Value Size Type Bind Vis Ndx Name
9: 0000000000000900 404 FUNC LOCAL DEFAULT 1 db_show_mtx
18: 0000000000000000 68 FUNC GLOBAL DEFAULT 1 _mtx_lock_flags
19: 0000000000000220 420 FUNC GLOBAL DEFAULT 1 _mtx_lock_sleep
20: 0000000000000060 64 FUNC GLOBAL DEFAULT 1 _mtx_unlock_flags
21: 0000000000000500 508 FUNC GLOBAL DEFAULT 1 _mtx_unlock_sleep
22: 00000000000000a0 112 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin_flags
24: 00000000000003e0 236 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin
25: 0000000000000120 92 FUNC GLOBAL DEFAULT 1 _mtx_unlock_spin_flags
27: 0000000000000180 132 FUNC GLOBAL DEFAULT 1 _mtx_trylock
30: 00000000000004e0 8 FUNC GLOBAL DEFAULT 1 do_nothing
42: 0000000000000700 36 FUNC GLOBAL DEFAULT 1 mtx_sysinit
43: 0000000000000740 140 FUNC GLOBAL DEFAULT 1 mtx_init
45: 00000000000007e0 96 FUNC GLOBAL DEFAULT 1 mtx_destroy
48: 0000000000000840 188 FUNC GLOBAL DEFAULT 1 mutex_init
and in the mutex profiling case:
9: 0000000000000ec0 404 FUNC LOCAL DEFAULT 1 db_show_mtx
44: 0000000000000000 40 FUNC LOCAL DEFAULT 1 nanoseconds
46: 0000000000000040 584 FUNC LOCAL DEFAULT 1 dump_mutex_prof_stats
51: 00000000000002a0 192 FUNC LOCAL DEFAULT 1 reset_mutex_prof_stats
68: 0000000000000ce0 36 FUNC GLOBAL DEFAULT 1 mtx_sysinit
69: 0000000000000dc0 96 FUNC GLOBAL DEFAULT 1 mtx_destroy
71: 0000000000000740 92 FUNC GLOBAL DEFAULT 1 _mtx_unlock_spin_flags
75: 00000000000006c0 112 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin_flags
81: 0000000000000360 132 FUNC GLOBAL DEFAULT 1 _mtx_lock_flags
82: 0000000000000840 484 FUNC GLOBAL DEFAULT 1 _mtx_lock_sleep
83: 0000000000000400 684 FUNC GLOBAL DEFAULT 1 _mtx_unlock_flags
86: 0000000000000b60 360 FUNC GLOBAL DEFAULT 1 _mtx_unlock_sleep
88: 0000000000000a40 236 FUNC GLOBAL DEFAULT 1 _mtx_lock_spin
90: 00000000000007a0 132 FUNC GLOBAL DEFAULT 1 _mtx_trylock
93: 0000000000000b40 8 FUNC GLOBAL DEFAULT 1 do_nothing
105: 0000000000000d20 160 FUNC GLOBAL DEFAULT 1 mtx_init
109: 0000000000000e20 160 FUNC GLOBAL DEFAULT 1 mutex_init
Most mutex operations grow a bit with MUTEX_PROFILING (except
_mtx_unlock_sleep which shrinks because of the MUTEX_NOINLINE, see
above), but the biggest growth is from the extra blob of unused (in
this case) code in _mtx_unlock_flags(). This function is 0x100
aligned in the mutex profiling case.
* I tried forcing 0x100 alignment on kern_mutex.o in the !mutex
profiling case, which did not help (also _mtx_unlock_flags() should
not be taking much CPU time; the bulk of it should be in the mutex
spins since there are some very heavily contended mutexes)
* Kernel profiling is apparently not supported on sparc, so I can't
easily work out where it is spending time.
* Our best guess is that mutex profiling is doing something that
reduces contention on this very heavily contended mutex (unp), but I'd
like to know what is happening precisely so I can maybe make use of
it.
Can anyone think of what may be happening that I've missed?
Kris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-smp/attachments/20060417/b84623fa/attachment-0001.pgp
More information about the freebsd-smp
mailing list