cvs commit: src/sys/alpha/alpha pmap.c src/sys/amd64/amd64pmap.c src/sys/i386/i386 pmap.c src/sys/vm vm_page.c

Sat Jul 31 04:33:29 PDT 2004

On Thu, 29 Jul 2004, Robert Watson wrote:

> On Thu, 29 Jul 2004, Pawel Jakub Dawidek wrote:
>
> > Hmm, why? Code seems to be very simple, I would even risk: simpler than
> > when lock is obtained at first time (but still one atomic operation...).
> > I'm not saying here that you don't know what you're talking about, I'm
> > just looking for an explanation.
>
> An uncontended mutex acquire is one atomic cmpset and one dereference to
> curthread in an inline/macro.
>
> A recursed mutex operation performs the atomic operation, but finds the
> lock is already owned, so falls back into _mtx_lock_sleep(), a function.
> That function then re-dereferences curthread (optimization opportunity),
> calls mtx_owned(), which re-dereferences curthread again (optimization
> opportunity), increment the recursion counter as a regular integer
> operation, and use an atomic operation to set the MTX_RECURSED flag.

Er, there are no extra atomic operations for recursive mutexes.  E.g.,
there isn't one to set the MTX_RECURSED flag, since that flag is only
set during initialization to indicate that recursion is permitted;
recursion consists mainly of unlocked (but implicitly atomic) increments
and decrements of the recursion counter.

Recursive mutexes aren't really expensive.  We just don't optimize for
them.  They basically just take one extra comparison operation and an
increment or decrement, as in the spin mutex macros.  Normal use of
recursive mutexes produces lots of self-contention on the lock, so
some of the less optimized code paths are taken more often than for
non-recursive mutexes.  Not optimizing for recursive mutexes is correct
since recursive mutexes shouldn't exist.

> I.e., don't recurse except in unusual cases.  It looks like we can
> optimize out two uses of curthread though, which might lighten the cost of
> recursion.  While most of our locking doesn't do recursion, Giant can and
> does recurse and it might help to do that optimization.
>
> I found that eliminating an extra curthread dereference from
> critical_enter()/critical_exit() made a noticeable difference for UP
> kernel-intensive benchmarks, FWIW.

My version has spin mutexes uninlined with critical_*() inlined instead.
This reduces curthread references as a side effect.  The corresponding
uninlining for sleep mutexes doesn't work as well since it gives extra
function calls.

curthread should be declared __pure or something so that gcc can
automatically avoid reloading it.  I think it is sufficiently invariant
for this in most or all places.  It is more invariant than most global
variables.

Bruce