duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c

Stephan Uphoff ups at tree.com
Sat Aug 20 23:02:30 GMT 2005


On Thu, 2005-08-18 at 10:26, John Baldwin wrote:
> On Thursday 18 August 2005 10:18 am, Stephan Uphoff wrote:
> > On Thu, 2005-08-18 at 03:57, Luigi Rizzo wrote:
> > > On Thu, Aug 18, 2005 at 03:32:19AM +0200, Max Laier wrote:
> > > > On Thursday 18 August 2005 02:02, Luigi Rizzo wrote:
> > >
> > > ...
> > >
> > > > > could you guys look at the following code and see if it makes sense,
> > > > > or tell me where i am wrong ?
> > > > >
> > > > > It should solve the starvation and blocking trylock problems,
> > > > > because the active reader does not hold the mutex in the critical
> > >
> > >                        ^^^^^^
> > >
> > > i meant 'writer', sorry... as max said even in the current implementation
> > > the reader does not hold the lock.
> > >
> > > > > int
> > > > > RLOCK(struct rwlock *rwl, int try)
> > > > > {
> > > > >         if (!try)
> > > > >                 mtx_lock(&rwl->m);
> > > > >         else if (!mtx_trylock(&rwl->m))
> > > > >                 return EBUSY;
> > > > >         if (rwl->writers == 0)  /* no writer, pass */
> > > > >                 rwl->readers++;
> > > > >         else {
> > > > >                 rwl->br++;
> > > > >                 cv_wait(&rwl->qr, &rwl->m);
> > > >
> > > >                   ^^^^^^^
> > > >
> > > > That we can't do.  That's exactly the thing the existing sx(9)
> > > > implementation does and where it breaks.  The problem is that cv_wait()
> > > > is an implicit sleep which breaks when we try to RLOCK() with other
> > > > mutex already acquired.
> > >
> > > but that is not a solvable problem given that the *LOCK may be blocking.
> > > And the cv_wait is not an unconditioned sleep, it is one where you
> > > release the lock right before ans wait for an event to wake you up.
> > > In fact i don't understand why you consider spinning and sleeping
> > > on a mutex two different things.
> >
> > The major difference between sleeping (cv_wait,msleep,..) and blocking
> > on a mutex is priority inheritance.
> > If you need to be able to use (non-spin) mutexes while holding a
> > [R|W]LOCK and use a [R|W]LOCK while holding a (non-spin) mutex then you
> > need to implement priority inheritance for [R|W]LOCKs.
> > For the (single) write lock holder tracking priority is easy. (just like
> > a (non-spin) mutex). However priority inheritance to multiple readers is
> > more difficult as one needs to keep track of all holders of the lock.
> > Keeping track of all readers requires pre-allocated memory resources.
> > This memory could come from
> > 	1) A limited global pool
> > 	2) A limited per [R|W]LOCK pool
> > 	3) A limited per thread pool
> > 	4) As a parameter for acquiring a RLOCK
> > None of the choices are really pretty.
> > (1),(2) and (3) can lead to limiting reader parallelism when running out
> > of resources. (4) may be practically for some cases since the memory
> > could be allocated from stack. However since the memory must be valid
> > while holding a read lock choice (4) makes some algorithms (than use for
> > example lock crabbing) a bit harder to implement.
> 
> Solaris handles the read case by only tracking the first thread to get a read 
> lock (referred to as the "owner of record" IIRC) and only propagating 
> priority to that thread and ignoring other readers.  They admit it's not 
> perfect as well.  That's mentioned in the Solaris Internals book.

Hi John,

If I recall correctly Solaris user threads get a better "system
priority" when running/blocking in the kernel. This prevents threads
holding critical kernel resources from being starved by user processes.
I think this would also be a good idea for FreeBSD to prevent unbound
priority inversions caused by vnode locks and other non-tracked
resources. However even with these changes I don't think the Solaris
implementation is the right way to go if we want to freely mix mutexes
and [R|W]locks.


Stephan   




More information about the freebsd-net mailing list