Deadlock in the routing code

Maxime Henrion mux at FreeBSD.org
Wed Dec 19 04:09:49 PST 2007


Maxime Henrion wrote:
> Julian Elischer wrote:
> > Gleb Smirnoff wrote:
> > >On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote:
> > >J>  Maxime Henrion wrote:
> > >J> > Replying to myself on this one, sorry about that.
> > >J> > I said in my previous mail that I didn't yet know which process was
> > >J> > holding the lock of the rtentry that the routed process is dealing
> > >J> > with in rt_setgate(), and I have now verified that it is held by
> > >J> > the swi1: net thread.
> > >J> > So, in a nutshell:
> > >J> > - The routed process does its work on the routing socket, which ends up
> > >J> >   calling rt_setgate().  While in rt_setgate(), it drops the lock on its
> > >J> >   rtentry in order to call rtalloc1().  At this point, the routed
> > >J> >   process holds the gateway route (rtalloc1() returns it locked), and it
> > >J> >   now tries to re-lock the original rtentry.
> > >J> > - At the same time, the swi net thread calls arpresolve(), which ends up
> > >J> >   calling rt_check().  Then rt_check() locks the rtentry and tries to
> > >J> >   lock the gateway route.
> > >J> > A classic case of mutex deadlock caused by the two paths taking the
> > >J> > same locks in opposite order.  Now, it's not obvious to me how to fix
> > >J> > it :-).
> > >J> 
> > >J>  On failure to re-lock, the routed call to rt_setgate should completely
> > >J>  abort and restart from scratch, releasing all locks it has on the way
> > >J>  out.
> > >
> > >Do you suggest mtx_trylock?
> > 
> > I think that would be the cleanest way.
> 
> So, here's what I've got.  I have yet to test it at all, I hope that
> I'll be able to do so today, or tomorrow.  Any input appreciated.
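A minimal user-space sketch of the lock-order inversion described above and of the trylock-and-restart remedy, assuming POSIX threads as a stand-in for the kernel's mtx(9) mutexes (pthread_mutex_trylock() playing the role of mtx_trylock()). The names rt, gw, check_path(), setgate_path() and run_demo() are hypothetical; this is not the patch attached to this message, whose contents are not reproduced in the archive.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins: "rt" is the route being updated, "gw" its
 * gateway route.  These are NOT the FreeBSD struct rtentry. */
struct entry {
    pthread_mutex_t lock;
    long updates;
};

static struct entry rt = { PTHREAD_MUTEX_INITIALIZER, 0 };
static struct entry gw = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Analog of the rt_check() path: lock rt, then block on gw. */
static void *check_path(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&rt.lock);
        pthread_mutex_lock(&gw.lock);
        rt.updates++;
        pthread_mutex_unlock(&gw.lock);
        pthread_mutex_unlock(&rt.lock);
    }
    return NULL;
}

/* Analog of the rt_setgate() path: it ends up holding gw and then needs
 * rt, the opposite order.  Blocking on rt here could deadlock against
 * check_path(), so it only tries the lock and, on failure, releases
 * everything and restarts from scratch. */
static void *setgate_path(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        for (;;) {
            /* Inspect rt, then drop its lock before the lookup. */
            pthread_mutex_lock(&rt.lock);
            pthread_mutex_unlock(&rt.lock);
            /* Obtain the gateway entry locked (like rtalloc1()). */
            pthread_mutex_lock(&gw.lock);
            /* Re-lock rt without blocking. */
            if (pthread_mutex_trylock(&rt.lock) == 0)
                break;                      /* holding gw and rt */
            pthread_mutex_unlock(&gw.lock); /* abort: release all locks */
            sched_yield();                  /* let check_path() finish */
        }
        rt.updates++;
        pthread_mutex_unlock(&rt.lock);
        pthread_mutex_unlock(&gw.lock);
    }
    return NULL;
}

/* Runs both paths concurrently and returns the number of updates made
 * to rt (always 2 * iterations once both threads have finished). */
long run_demo(long iterations)
{
    pthread_t a, b;
    int rc;

    rt.updates = 0;
    rc = pthread_create(&a, NULL, check_path, &iterations);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, setgate_path, &iterations);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    (void)rc;
    return rt.updates;
}
```

Replacing the pthread_mutex_trylock() call in setgate_path() with a blocking pthread_mutex_lock() reintroduces the inversion, and run_demo() can then hang with each thread waiting on the lock the other holds. The retry loop trades that deadlock for occasional wasted work when the two paths collide.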

It appears that this patch fixed the problem.  My gateway server
now has an uptime of nearly two days, whereas previously it would
probably have crashed by now.  I'm attaching the final version of
the patch here, since the previous one had build-time errors.  I'm
going to commit this to HEAD soon unless someone objects.

Cheers,
Maxime
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rt_setgate.patch
Type: text/x-diff
Size: 1138 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20071219/01dd255c/rt_setgate.bin


More information about the freebsd-net mailing list