Sleeping thread owns a nonsleepable lock panic (& lor)

Kostik Belousov kostikbel at gmail.com
Wed Jul 27 12:08:58 UTC 2011


On Tue, Jul 26, 2011 at 07:12:23PM -0400, Rick Macklem wrote:
> Kostik Belousov wrote:
> > On Tue, Jul 26, 2011 at 01:17:52PM +0200, Herve Boulouis wrote:
> > > On 26/07/2011 12:06, Kostik Belousov wrote:
> > > > On Tue, Jul 26, 2011 at 11:49:13AM +0200, Herve Boulouis wrote:
> > > > > On 25/07/2011 11:59, Kostik Belousov wrote:
> > > > >
> > > > > Ok the patched server crashed this morning strangely : all httpd
> > > > > processes were stuck in nfs or vmopar
> > > > > and were unkillable. Below is the full ps.
> > > >
> > > > Please see the
> > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > > > for information required to debug the deadlocks.
> > >
> > > the box was not strictly deadlocked since I was able to interact with
> > > it but I suppose you want me to
> > > break into the debugger when the symptoms appear again and report the
> > > output of all the commands listed in the handbook
> > > deadlock section?
> > 
> > Exactly.
> > 
> > I think everything was hung that accessed an nfs mount point.
> > From the usermode, procstat -kk could catch some interesting
> > information,
> > but it is redundant if ddb output is captured.
> 
> Would it be worth considering reverting r223054?
> (Note that I don't understand the VM side, so this may be completely
>  wrong:-)
> 
> The sleeps on vmopar could be happening because a dirty page is busy
> and r223054 changes the VM_PAGER_xx value set a couple of ways.
> 1 - When it returns VM_PAGER_ERROR instead of VM_PAGER_AGAIN, the
>     return value of "runlen" from vm_pageout_flush() changes.
> 2 - I'm not sure, but I think the pre-r223054 code marked a partially
>     written page as VM_PAGER_OK instead of VM_PAGER_AGAIN?
>     (I'm wondering about this one, since the problem seems to happen
>      when the file's size has been truncated.)
> 
> Herve Boulouis, if you want to see what r223054 changes, just go to
>   http://svn.freebsd.org/viewvc/stable/8/sys/nfsclient
>   and then click on nfs_bio.c.
>   (The changes are small and could easily be reverted with a manual
>    edit.)
> 
> Since r223054 went into stable/8 on Jun 13, it seems a possible
> explanation? rick

I doubt it. The ps output makes it quite plausible that the
reporter hit a LOR between the vnode lock and the page busy flag. The
correct order is vnode lock -> busy bit. vmopar is a sleep waiting for
a page's busy state to clear.
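
To illustrate the ordering rule, here is a minimal userland sketch with
pthreads. It is not kernel code: toy_vnode, toy_page, page_busy(),
page_unbusy() and run_ordered_workers() are hypothetical names made up
for this example, and a mutex plus condition variable only loosely
stands in for the real vnode lock and page busy flag. Every worker
takes the vnode lock first and busies the page second, so no thread
ever sleeps on the busy flag while another one needs the vnode lock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-ins, not the FreeBSD kernel structures. */
struct toy_page {
	pthread_mutex_t mtx;	/* protects 'busy' */
	pthread_cond_t	cv;	/* waiters sleep here, loosely like "vmopar" */
	bool		busy;
	long		writes;	/* touched only with vnode lock + busy held */
};

struct toy_vnode {
	pthread_mutex_t	lock;	/* stand-in for the vnode lock */
	struct toy_page	page;
};

struct worker_arg {
	struct toy_vnode *vp;
	int		  iterations;
};

/* Sleep until the page is not busy, then mark it busy. */
static void
page_busy(struct toy_page *p)
{
	pthread_mutex_lock(&p->mtx);
	while (p->busy)
		pthread_cond_wait(&p->cv, &p->mtx);
	p->busy = true;
	pthread_mutex_unlock(&p->mtx);
}

/* Clear the busy flag and wake up every waiter. */
static void
page_unbusy(struct toy_page *p)
{
	pthread_mutex_lock(&p->mtx);
	p->busy = false;
	pthread_cond_broadcast(&p->cv);
	pthread_mutex_unlock(&p->mtx);
}

static void *
worker(void *arg)
{
	struct worker_arg *wa = arg;

	for (int i = 0; i < wa->iterations; i++) {
		pthread_mutex_lock(&wa->vp->lock);	/* 1: vnode lock */
		page_busy(&wa->vp->page);		/* 2: busy bit */
		wa->vp->page.writes++;
		page_unbusy(&wa->vp->page);
		pthread_mutex_unlock(&wa->vp->lock);
	}
	return (NULL);
}

/* Run up to 16 workers that all follow the order; return total writes. */
long
run_ordered_workers(int nthreads, int iterations)
{
	struct toy_vnode vn;
	struct worker_arg wa = { &vn, iterations };
	pthread_t tids[16];
	int started = 0;

	if (nthreads > 16)
		nthreads = 16;
	pthread_mutex_init(&vn.lock, NULL);
	pthread_mutex_init(&vn.page.mtx, NULL);
	pthread_cond_init(&vn.page.cv, NULL);
	vn.page.busy = false;
	vn.page.writes = 0;

	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[started], NULL, worker, &wa) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	pthread_cond_destroy(&vn.page.cv);
	pthread_mutex_destroy(&vn.page.mtx);
	pthread_mutex_destroy(&vn.lock);
	return (vn.page.writes);
}
```

Build with cc -pthread and call run_ordered_workers() from a small
main(). A thread that did the reverse (busied the page first and then
waited for the vnode lock) could deadlock against a thread that holds
the vnode lock and sleeps on the busy flag, which is the kind of hang
a LOR here would produce.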

The mentioned revision does not change the lock order.

Anyway, this is only speculation until the requested data is provided.

More information about the freebsd-stable mailing list