[Bug 198014] Signals can lead to an inconsistency in PI mutex ownership

bugzilla-noreply at freebsd.org
Tue Feb 24 21:20:00 UTC 2015


            Bug ID: 198014
           Summary: Signals can lead to an inconsistency in PI mutex
                    ownership
           Product: Base System
           Version: 11.0-CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: eric at vangyzen.net

Signals can lead to an inconsistency in PI mutex ownership.

I have two test cases to reproduce this.  I hope to provide them soon.  For
now, here is the description.

Consider three threads--Trun, Tsleep, and Tsig--all contending for one pthread
mutex.  Trun owns the mutex and is running in userspace.
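
For anyone building a test case: the do_lock_pi path is only reached when the
mutex uses the priority-inheritance protocol.  A minimal sketch of the setup,
assuming only standard POSIX calls; the helper name init_pi_mutex is
hypothetical and is not taken from the pending test cases.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical helper: initialize *m as a priority-inheritance mutex.
 * Returns 0 on success, or the error number from the failing pthread call.
 */
static int
init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int error;

    error = pthread_mutexattr_init(&attr);
    if (error != 0)
        return (error);

    /* Select priority inheritance instead of the default protocol. */
    error = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (error == 0)
        error = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return (error);
}
```

Trun, Tsleep, and Tsig would then all lock and unlock the one mutex
initialized this way.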

Tsleep wants the mutex and calls pthread_mutex_lock...do_lock_pi.  Near the top
of do_lock_pi, Tsleep allocates a umtx_pi object.  This object will exist as
long as at least one thread is in do_lock_pi for this mutex.  Since Trun owns
the mutex, Tsleep sets UMUTEX_CONTESTED and calls umtxq_sleep_pi.  Therein,
Tsleep adds itself to the queue of waiters (umtxq_insert) and assigns ownership
of the umtx_pi to Trun (umtx_pi_setowner).  It then sleeps in umtxq_sleep.

Tsig wants the mutex and does the same as Tsleep, with a few differences.  Tsig
does not allocate a new umtx_pi; instead, it finds the existing umtx_pi and
increments its reference count.  Tsig becomes the second thread in the queue of
waiters.  Tsig does not set ownership of the umtx_pi, since that's already
done.  Tsig then sleeps in umtxq_sleep.

Trun calls pthread_mutex_unlock...do_unlock_pi.  Therein, umtxq_count_pi
indicates that Tsleep is the first thread on the queue of waiters.  Trun
disowns the umtx_pi, removes Tsleep from the queue of waiters, and makes it
runnable.  However, Tsleep does not run immediately, for whatever reason. 
Perhaps all CPUs are busy.  Perhaps CPU sets, priorities, and scheduling policy
allow Trun to keep running while Tsleep sits on the run queue.

Trun calls pthread_mutex_lock...do_lock_pi again.  It acquires the mutex,
claims ownership of the umtx_pi (umtx_pi_claim), and returns to userland.

A thread sends a signal to Tsig.  It returns from umtxq_sleep, removes itself
from the queue of waiters, and ultimately returns from do_lock_pi.  The queue
of waiters is now empty.

Trun calls pthread_mutex_unlock...do_unlock_pi.  Unlike last time,
umtxq_count_pi says the queue is empty, so Trun does not disown the umtx_pi. 
(Recall that the umtx_pi remains in existence due to the reference by Tsleep.) 
Trun sets the mutex to UMUTEX_UNOWNED and returns.

Now, the mutex and umtx_pi disagree on the ownership of the mutex.  From here,
there are several possible paths to failure.  For completeness, let's follow
through with one.
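
The sequence so far can be replayed in a small single-threaded model.  This is
only a sketch of the bookkeeping described above, not the kernel code: the
struct, the T_* ids, and every model_* function are hypothetical stand-ins,
and locking, priorities, and the contested bit are left out.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

enum { T_NONE, T_RUN, T_SLEEP, T_SIG };    /* hypothetical thread ids */
#define MAXWAIT 4

struct model {
    int mutex_owner;        /* owner stored in the userspace mutex word */
    int pi_owner;           /* owner stored in the umtx_pi */
    int pi_refs;            /* threads currently inside do_lock_pi */
    int queue[MAXWAIT];     /* sleeping waiters, oldest first */
    int nwaiters;
};

/* Remove tid from the queue of waiters, keeping the others in order. */
static void
model_dequeue(struct model *m, int tid)
{
    int i, j;

    for (i = 0; i < m->nwaiters; i++) {
        if (m->queue[i] != tid)
            continue;
        for (j = i + 1; j < m->nwaiters; j++)
            m->queue[j - 1] = m->queue[j];
        m->nwaiters--;
        return;
    }
}

/* A thread finds the mutex owned, takes a reference, and sleeps. */
static void
model_wait(struct model *m, int tid)
{
    assert(m->nwaiters < MAXWAIT);
    m->pi_refs++;
    m->queue[m->nwaiters++] = tid;
    if (m->pi_owner == T_NONE)          /* only the first waiter sets it */
        m->pi_owner = m->mutex_owner;
}

/* A thread takes the free mutex through the kernel and claims the umtx_pi. */
static void
model_relock(struct model *m, int tid)
{
    assert(m->mutex_owner == T_NONE);
    m->mutex_owner = tid;
    m->pi_owner = tid;
}

/* Unlock as described: disown the umtx_pi only if someone is queued. */
static int
model_unlock(struct model *m, int tid)
{
    if (m->nwaiters > 0) {
        if (m->pi_owner != tid)
            return (EPERM);
        m->pi_owner = T_NONE;
        /* The woken thread leaves the queue but keeps its reference. */
        model_dequeue(m, m->queue[0]);
    }
    m->mutex_owner = T_NONE;
    return (0);
}

/* A signal interrupts a sleeping waiter, which then leaves do_lock_pi. */
static void
model_signal(struct model *m, int tid)
{
    model_dequeue(m, tid);
    m->pi_refs--;
}

/* Replay the steps in the order given in the description. */
static void
replay_bug(struct model *m)
{
    memset(m, 0, sizeof(*m));
    m->mutex_owner = T_RUN;             /* Trun owns the mutex */
    model_wait(m, T_SLEEP);
    model_wait(m, T_SIG);
    (void)model_unlock(m, T_RUN);       /* wakes Tsleep, which does not run */
    model_relock(m, T_RUN);
    model_signal(m, T_SIG);             /* queue is now empty */
    (void)model_unlock(m, T_RUN);       /* umtx_pi is not disowned */
}
```

After replay_bug, the model has mutex_owner == T_NONE but pi_owner == T_RUN,
with one reference (Tsleep's) still keeping the umtx_pi alive.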

Any thread--Tany--locks the mutex.  Any other thread--Tother--tries to lock it,
sets the contested bit, adds itself to the queue, and sleeps.  Tany unlocks the
mutex; since it's contested, Tany calls do_unlock_pi.  Since Tother is in the
queue, uq_first is non-NULL.  Recall that Trun still owns the umtx_pi, so
pi->pi_owner != curthread, so do_unlock_pi returns EPERM and leaves the umutex
owned by Tany.  Before calling do_unlock_pi, Tany had already disowned the
pthread_mutex.  The error from _thr_umutex_unlock2 has no effect.

So, nobody owns the pthread_mutex, Tany owns the umutex, and Trun owns the
umtx_pi.  Prior to r277970, this broken ownership could have caused a panic. 
Now, it just causes operations on this mutex to fail, or possibly causes a
deadlock among the contending user threads.

To solve this problem, do_unlock_pi should disown the umtx_pi even if the queue
of waiters is empty.
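
As a rough illustration of the proposed change, here is a toy model of the
unlock path with and without the extra disown step, followed by the
Tany/Tother sequence.  It is a sketch under stated assumptions, not a patch:
the struct, the T_* ids, the always_disown flag, and both functions are
hypothetical, and the real do_unlock_pi does considerably more.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { T_NONE, T_RUN, T_ANY, T_OTHER };    /* hypothetical thread ids */

struct model {
    int mutex_owner;    /* owner stored in the userspace mutex word */
    int pi_owner;       /* owner stored in the umtx_pi */
    int nwaiters;       /* threads on the queue of waiters */
};

/* Simplified stand-in for the unlock path. */
static int
model_unlock(struct model *m, int tid, bool always_disown)
{
    if (m->nwaiters > 0) {
        if (m->pi_owner != tid)
            return (EPERM);         /* mutex word is left untouched */
        m->pi_owner = T_NONE;
        m->nwaiters--;              /* wake the first waiter */
    } else if (always_disown && m->pi_owner == tid) {
        m->pi_owner = T_NONE;       /* proposed: disown with empty queue */
    }
    m->mutex_owner = T_NONE;
    return (0);
}

/* Return the error Tany gets from its unlock after the scenario above. */
static int
model_followup(bool always_disown)
{
    /* Trun has relocked the mutex and Tsig has left the queue. */
    struct model m = { T_RUN, T_RUN, 0 };

    (void)model_unlock(&m, T_RUN, always_disown);

    m.mutex_owner = T_ANY;          /* Tany locks the free mutex */
    m.nwaiters = 1;                 /* Tother queues itself and sleeps */
    if (m.pi_owner == T_NONE)       /* owner is set only if there is none */
        m.pi_owner = T_ANY;

    return (model_unlock(&m, T_ANY, always_disown));
}
```

In this model, model_followup(false) returns EPERM, matching the failure
described above, while model_followup(true) returns 0 because Trun no longer
owns the umtx_pi when Tother arrives.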
