msleep() on recursively locked mutexes
Matthew Dillon
dillon at apollo.backplane.com
Fri Apr 27 17:49:42 UTC 2007
The real culprit here is passing held mutexes to unrelated procedures
in the first place, so that those procedures can release and reacquire
the mutex if they have to block.
That's just bad coding in my view. The unrelated procedure has no
clue as to what the mutex is or why it is being held and really has no
business messing with it.
What I did was implement spinlocks with VERY restricted capabilities,
far more restricted than the capabilities of your mutexes. Our
spinlocks are meant only to be used to lock up tiny pieces of code
(like for ref counting or structural or flag-changing operations).
Plus the kernel automatically acts as if it were in a critical section
if it takes an interrupt while the current thread is holding a spinlock.
That way mainline code can just use a spinlock to deal with small bits
of interlocked information without it costing much in the way of
overhead.
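To make the idea concrete, here is a minimal user-space sketch of a restricted spinlock guarding only a reference count and a flag word. It is not the DragonFly implementation. The names (`struct spinlock`, `spin_lock`, `obj_hold`, `obj_drop`, `OBJ_DYING`) are hypothetical, the code uses C11 `<stdatomic.h>`, and it cannot model the kernel behaviour of acting as if in a critical section when an interrupt arrives while the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical minimal spinlock: no recursion, no ownership tracking,
 * and callers must never block while holding it. This user-space sketch
 * does not reproduce the kernel's "act as if in a critical section when
 * interrupted" behaviour; that part is left out entirely.
 */
struct spinlock {
    atomic_flag locked;
};

static void spin_init(struct spinlock *sl)
{
    atomic_flag_clear(&sl->locked);
}

static void spin_lock(struct spinlock *sl)
{
    /* Spin until we are the one that flips the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&sl->locked,
                                             memory_order_acquire))
        ;
}

static void spin_unlock(struct spinlock *sl)
{
    atomic_flag_clear_explicit(&sl->locked, memory_order_release);
}

#define OBJ_DYING 0x1u

/* The spinlock guards only the reference count and the flag word. */
struct object {
    struct spinlock lock;
    int refs;
    unsigned flags;
};

static void obj_init(struct object *o)
{
    spin_init(&o->lock);
    o->refs = 1;        /* the creator holds the first reference */
    o->flags = 0;
}

/* Take a reference unless the object is already being torn down. */
static bool obj_hold(struct object *o)
{
    bool ok;

    spin_lock(&o->lock);
    ok = (o->flags & OBJ_DYING) == 0;
    if (ok)
        o->refs++;
    spin_unlock(&o->lock);
    return ok;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool obj_drop(struct object *o)
{
    bool last;

    spin_lock(&o->lock);
    assert(o->refs > 0);
    last = (--o->refs == 0);
    if (last)
        o->flags |= OBJ_DYING;
    spin_unlock(&o->lock);
    return last;
}
```

Each critical section is a handful of instructions with no calls out of the function, which is the whole point: anything that might block or run for long belongs under a real lock instead.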
I made the decision that ANYTHING more complex than that would have to
use a real lock, like a lockmgr lock or a token, depending on the
characteristics desired. To make real locks even more desirable I also
stripped down the lockmgr() lock implementation, removing numerous
bits that were inherited from very old code methodologies that have no
business being in a modern operating system, like LK_DRAIN. And I
removed the passing of an interlocking spinlock to the lockmgr code,
because that methodology was being massively abused in existing code
(and I do mean massively).
I'm not quite sure what the best way to go is for FreeBSD, because
you guys have made your mutexes just as sophisticated as, or even more
sophisticated than, your normal locks in many respects, and you have like 50 different
types of locks now (I can't keep track of them all).
If I were to offer advice it would be: Just stop trying to mix water
and hot wax. Stop holding mutexes across potentially blocking procedure
calls. Stop passing mutexes into unrelated bits of code in order for
them to be released and reacquired somewhere deep in that code. Just
doing that will probably solve all of the problems being reported.
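A minimal sketch of that restructuring, using POSIX threads in user space as a stand-in for kernel mutexes. All names (`struct table`, `table_init`, `slow_compute`, `table_update`) are hypothetical. Rather than handing the mutex down to a blocking helper that releases and reacquires it internally, the caller copies what it needs under the lock, drops the lock, makes the blocking call, then retakes the lock and uses a generation counter to detect whether the state changed in the meantime.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state guarded by a mutex. */
struct table {
    pthread_mutex_t mtx;
    unsigned gen;       /* bumped on every change so callers can detect races */
    int value;
};

static void table_init(struct table *t, int value)
{
    int rc = pthread_mutex_init(&t->mtx, NULL);
    assert(rc == 0);    /* sketch only: no real error handling */
    (void)rc;
    t->gen = 0;
    t->value = value;
}

/*
 * Stand-in for something that may block (I/O, an allocation that may
 * sleep, ...). It receives plain data, never the caller's mutex.
 */
static int slow_compute(int input)
{
    return input * 2;
}

/*
 * Instead of passing t->mtx down so slow_compute() can drop and retake it,
 * snapshot under the lock, release it, block, then retake and revalidate.
 * Returns false if the table changed while we were blocked.
 */
static bool table_update(struct table *t)
{
    int snapshot, result;
    unsigned gen;

    pthread_mutex_lock(&t->mtx);
    snapshot = t->value;
    gen = t->gen;
    pthread_mutex_unlock(&t->mtx);

    result = slow_compute(snapshot);    /* no locks held while blocking */

    pthread_mutex_lock(&t->mtx);
    if (t->gen != gen) {
        /* Someone else modified the table; let the caller decide to retry. */
        pthread_mutex_unlock(&t->mtx);
        return false;
    }
    t->value = result;
    t->gen++;
    pthread_mutex_unlock(&t->mtx);
    return true;
}
```

Returning false on a generation mismatch leaves the retry decision with the caller, which is the only code that actually knows what the mutex protects.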
-Matt
More information about the freebsd-hackers mailing list