HEADS UP: UNIX domain socket locking changes merged to CVS HEAD
Robert Watson
rwatson at FreeBSD.org
Thu Mar 1 09:11:16 UTC 2007
On Wed, 28 Feb 2007, Scott Robbins wrote:
> On Wed, Feb 28, 2007 at 07:00:15PM -0500, Randall Stewart wrote:
>> Robert Watson wrote:
>>> On Wed, 28 Feb 2007, Stephane E. Potvin wrote:
>>>>
>>>> Since this commit, I've been observing frequent deadlocks on my laptop,
>>>> mostly when starting-up gnome. It usually takes less than 5 to 10 minutes for
>>>> the deadlock to happen.
>
> I too have been having unexpected lockups--like Randall, I figured it was
> something to do with my machine. Interestingly enough, though X will lock
> up completely (and I can't ssh to the machine, though I can ping it) the
> jail, which runs a small web site, running on an alias ip address continues
> to work--I can still access the web site from outside.
>
> However, I haven't been able to apply Robert's patch yet. As some of you
> have noticed, there's a bunch of tinderbox failures dying in netstat. It's
> happening to me too, so I haven't been able to rebuild.
>
> (this is more of a me too post at this point--I haven't had a chance to do
> any investigation).

Give uipc_usrreq.c:1.199 a try and see if it helps.

On the web server/jail vs X11 thing: yes -- deadlocks involving lock order
reversals typically affect two classes of threads. The first is threads that
are directly involved in the deadlock (the two reverse lock acquisitions), and
the second class is threads that end up waiting on any locks (or other
resources) held by the threads in the deadly embrace.

So X11 and a Gnome process deadlock, and then other processes trying to talk
to X11 or the Gnome process get stuck waiting on them. Any process performing
an operation that requires the global UNIX domain socket lock to be held for
writing also hangs (so processes performing UNIX domain socket connect and
bind). Processes that don't go near X11/Gnome, and possibly UNIX domain
sockets generally, will be fine. However, I would expect new SSH sessions
into the jail to hang as well, since they will try to open new syslog
sessions, which requires a UNIX domain socket connect operation. The
interrupt thread and netisr don't involve UNIX domain sockets at all, and
therefore run without a problem, as does Apache, which has already
established its UNIX domain sockets and has nothing further to say on the
topic.

These symptoms hold for deadlocks, but also for lock leaks, which have a
slightly different cause (a missing unlock) but can lead to the same
cascading failure of dependent processes.

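
A lock leak is easy to show in a few lines. The following is an illustrative
user-space sketch, not code from the tree; leaky_lookup and
lock_is_leaked_after are invented names. The early return on the error path
skips the unlock, so the mutex stays held after the function returns and
every later caller would block on it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical operation with a missing unlock on its error path. */
static int leaky_lookup(pthread_mutex_t *lock, int key)
{
    pthread_mutex_lock(lock);
    if (key < 0)
        return -1;              /* BUG: returns with 'lock' still held */
    pthread_mutex_unlock(lock);
    return 0;
}

/* Returns 1 if the lock is still held after leaky_lookup(key), else 0. */
int lock_is_leaked_after(int key)
{
    pthread_mutex_t lock;
    int rc, leaked;

    pthread_mutex_init(&lock, NULL);
    (void)leaky_lookup(&lock, key);

    /* trylock reports EBUSY instead of blocking on a held mutex. */
    rc = pthread_mutex_trylock(&lock);
    leaked = (rc == EBUSY);

    /* Either way this thread now owns the lock, so release and clean up. */
    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    return leaked;
}
```

Calling lock_is_leaked_after(1) exercises the normal path and
lock_is_leaked_after(-1) the leaking one. In a real system there is no
trylock probe, so the next thread to need the lock simply waits forever, and
everything that depends on that thread queues up behind it.
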
Robert N M Watson
Computer Laboratory
University of Cambridge