misc/24641: pthread_rwlock_rdlock can deadlock

Mon Jan 5 17:45:37 PST 2004

On Mon, 5 Jan 2004, Earl Chew wrote:

> Daniel Eischen wrote:
> > POSIX says that:
> > 
> >   If the Thread Execution Scheduling option is supported, and
> >   the threads involved in the lock are executing with the scheduling
> >   policies SCHED_FIFO or SCHED_RR, the calling thread shall not
> >   acquire the lock if a writer holds the lock or if writers of
> >   higher or equal priority are blocked on the lock; otherwise,
> >   the calling thread shall acquire the lock.
> 
> Call this P1.
> 
> > Forget about the writer's priority for a moment; we don't
> > currently honor that.  And FYI, SCHED_OTHER in FreeBSD is
> > SCHED_RR.
> > 
> > POSIX also says that:
> > 
> >   A thread may hold multiple concurrent read locks on rwlock (that
> >   is, successfully call the pthread_rwlock_rdlock() function n
> >   times). If so, the application shall ensure that the thread
> >   performs matching unlocks (that is, it calls the
> >   pthread_rwlock_unlock() function n times).
> 
> Call this P2.
> 
> I interpret P2 as requiring rdlock() to be recursive. That is, once
> a thread acquires a rdlock(), it can do so repeatedly, as long as it
> performs the correct number of matching unlocks().
> 
> I interpret P1 as giving wrlock()s preference over rdlocks() for
> a particular lock.
> 
> > It isn't clear to me that this means that a subsequent rdlock
> > request while there are writers blocked on the lock succeeds.
> 
> Hmm... reading P2 again, I can see how you could be right.
> 
> However, it appears counter-intuitive to me that if a thread has already
> acquired a rdlock(), and the rdlock() is recursive, that a subsequent
> call should fail.
> 
> Failure should indicate that the lock could not be acquired. But the
> thread _already_ has the lock! So it's hard to explain why the
> subsequent call should fail (given the requirement that the
> rdlock be recursive).

It shouldn't fail; it should deadlock.

> > It may seem trivial to implement this, but when you consider
> > that there may be a lot of readers then it is not so trivial.
> > In order to implement it, you need to know whether a thread
> > owns the rdlock and have a recurse count for it.
> 
> Hmm... doesn't P2 already require you to support recursion?

The implementation just keeps one count of total readers (i.e.,
recursion by a group of readers); this is stored in the rwlock.
Recursive rdlocks by the same thread count as multiple readers.
There is no per-thread counter.  For recursive _mutexes_ this
is simple because there can be only one owner so the one recurse
count stored in the mutex is sufficient.  But for rwlocks there
can be lots of readers, and each reader may hold multiple rwlocks.
You either need a queue to hold the rwlock status (pointer to
rwlock and recurse count) for each owned rwlock in each thread,
or you need a queue of thread status (pointer to thread and
recurse count) for each owning thread in each rwlock.  And
regardless of where you store the queue, you need to seach it
to see if the thread already owns the rdlock (and then again
on the rdunlock).

Think about it.  What if you have 100 threads rdlock recursing
on 1 rwlock?  How do you tell if threads already own a rwlock?
And how many times have they recursed on the rwlock?  And what
if threads want to own multiple rwlocks?

Our current implementation is rather simple, but it does
avoid us from having to build in limits as to how many
rwlocks can be owned by a thread or how many threads can
own a rwlock.  I'll add this to my TODO list for libkse
but it will come at a bit of a performance & storage
cost and there will have to be some built-in limits.

> > And threads
> > may own multiple read locks.  You would have to track all
> > of them and it would add a bit of overhead having to search
> > the owned read locks (and they don't have to be taken and
> > released in order so you'd have to search the entire queue).
> 
> I don't see how either P1 or P2 requires this behaviour. What am
> I missing?

See above :)