Core Dump / panic sleeping thread

Tue Mar 19 18:28:12 UTC 2013

On Tue, Mar 19, 2013 at 07:45:56PM +0200, Andriy Gapon wrote:
> on 19/03/2013 19:35 Jeremy Chadwick said the following:
> > On Tue, Mar 19, 2013 at 06:18:06PM +0100, Michael Landin Hostbaek wrote:
> [snip]
> >> Unread portion of the kernel message buffer:
> >> Sleeping thread (tid 100256, pid 85641) owns a non-sleepable lock
> >> KDB: stack backtrace of thread 100256:
> >> #0 0xffffffff808f2d46 at mi_switch+0x186
> >> #1 0xffffffff8092bb52 at sleepq_wait+0x42
> >> #2 0xffffffff808f34d6 at _sleep+0x376
> >> #3 0xffffffff80b4f3ae at vm_object_page_remove+0x2ce
> >> #4 0xffffffff80b5ac7d at vnode_pager_setsize+0x17d
> >> #5 0xffffffff8082102c at nfscl_loadattrcache+0x2cc
> >> #6 0xffffffff80818d37 at nfs_getattr+0x287
> >> #7 0xffffffff8098f1c0 at vn_stat+0xb0
> >> #8 0xffffffff809869d9 at kern_statat_vnhook+0xf9
> >> #9 0xffffffff80986b55 at kern_statat+0x15
> >> #10 0xffffffff80986c1a at sys_lstat+0x2a
> >> #11 0xffffffff80bd7ae6 at amd64_syscall+0x546
> >> #12 0xffffffff80bc3447 at Xfast_syscall+0xf7
> >> panic: sleeping thread
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> #0 0xffffffff809208a6 at kdb_backtrace+0x66
> >> #1 0xffffffff808ea8be at panic+0x1ce
> >> #2 0xffffffff8092ed22 at propagate_priority+0x1d2
> >> #3 0xffffffff8092fa4e at turnstile_wait+0x1be
> >> #4 0xffffffff808d8d48 at _mtx_lock_sleep+0xd8
> >> #5 0xffffffff80820fa4 at nfscl_loadattrcache+0x244
> >> #6 0xffffffff8081758c at ncl_readrpc+0xac
> >> #7 0xffffffff80824c45 at ncl_getpages+0x485
> >> #8 0xffffffff80b5aa0c at vnode_pager_getpages+0x9c
> >> #9 0xffffffff80b3fc93 at vm_fault_hold+0x673
> >> #10 0xffffffff80b41cc3 at vm_fault+0x73
> >> #11 0xffffffff80bd84b4 at trap_pfault+0x124
> >> #12 0xffffffff80bd8c6c at trap+0x49c
> >> #13 0xffffffff80bc315f at calltrap+0x8
> [snip]
> 
> I think that the regular mutex which is acquired via NFSLOCKNODE() in
> nfscl_loadattrcache() can not be held across vnode_pager_setsize.
> I am not sure though when vap->va_size != np->n_size case is triggered.

When the file is modified on the server outside the control of the
client? E.g., by direct access on the server, or from another
client.

The only possible solution is to move the vnode_pager_setsize() call
outside the scope of n_mtx. This is somewhat problematic because the
nfsiod threads never bother to lock the vnode, so the truncation of the
vm cache becomes racy. Still, this is probably the best cure.
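
To illustrate the shape of that change, here is a rough user-space sketch,
not the actual kernel patch. It uses a pthread mutex in place of n_mtx, and
every name in it (struct node, pager_setsize(), load_attr()) is a
hypothetical stand-in invented for the example. The size comparison and
update happen under the mutex, the decision is saved in locals, and the
call that may sleep is made only after the mutex is dropped.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the NFS node; only what the sketch needs. */
struct node {
    pthread_mutex_t mtx;    /* plays the role of n_mtx */
    uint64_t size;          /* plays the role of n_size */
    uint64_t pager_size;    /* what the simulated pager believes */
};

/*
 * Hypothetical stand-in for vnode_pager_setsize(). The real function
 * may sleep, so it must not be called with a non-sleepable lock held.
 */
static void
pager_setsize(struct node *np, uint64_t nsize)
{
    np->pager_size = nsize;
}

/*
 * Hypothetical stand-in for the attribute cache update. Returns true
 * if the size reported by the server differed from the cached one.
 */
static bool
load_attr(struct node *np, uint64_t server_size)
{
    bool setsize = false;
    uint64_t nsize = 0;

    pthread_mutex_lock(&np->mtx);
    if (server_size != np->size) {
        np->size = server_size;
        nsize = server_size;    /* remember the new size ... */
        setsize = true;         /* ... and that the pager needs it */
    }
    pthread_mutex_unlock(&np->mtx);

    /* The mutex is no longer held here, so sleeping is allowed. */
    if (setsize)
        pager_setsize(np, nsize);
    return (setsize);
}
```

The window between the unlock and the pager_setsize() call is exactly
where the race mentioned above lives: nothing in this sketch stops a
second thread from updating the size in between.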

Another issue I see there is that the vnode_pager_setsize() call is
only performed for VREG nodes. I believe that it is possible to cache
pages for directories as well.

Would you work out the patch?