Core Dump / panic sleeping thread

Tue Mar 19 23:38:16 UTC 2013

Andriy Gapon wrote:
> on 19/03/2013 19:35 Jeremy Chadwick said the following:
> > On Tue, Mar 19, 2013 at 06:18:06PM +0100, Michael Landin Hostbaek
> > wrote:
> [snip]
> >> Unread portion of the kernel message buffer:
> >> Sleeping thread (tid 100256, pid 85641) owns a non-sleepable lock
> >> KDB: stack backtrace of thread 100256:
> >> #0 0xffffffff808f2d46 at mi_switch+0x186
> >> #1 0xffffffff8092bb52 at sleepq_wait+0x42
> >> #2 0xffffffff808f34d6 at _sleep+0x376
> >> #3 0xffffffff80b4f3ae at vm_object_page_remove+0x2ce
> >> #4 0xffffffff80b5ac7d at vnode_pager_setsize+0x17d
> >> #5 0xffffffff8082102c at nfscl_loadattrcache+0x2cc
> >> #6 0xffffffff80818d37 at nfs_getattr+0x287
> >> #7 0xffffffff8098f1c0 at vn_stat+0xb0
> >> #8 0xffffffff809869d9 at kern_statat_vnhook+0xf9
> >> #9 0xffffffff80986b55 at kern_statat+0x15
> >> #10 0xffffffff80986c1a at sys_lstat+0x2a
> >> #11 0xffffffff80bd7ae6 at amd64_syscall+0x546
> >> #12 0xffffffff80bc3447 at Xfast_syscall+0xf7
> >> panic: sleeping thread
> >> cpuid = 0
> >> KDB: stack backtrace:
> >> #0 0xffffffff809208a6 at kdb_backtrace+0x66
> >> #1 0xffffffff808ea8be at panic+0x1ce
> >> #2 0xffffffff8092ed22 at propagate_priority+0x1d2
> >> #3 0xffffffff8092fa4e at turnstile_wait+0x1be
> >> #4 0xffffffff808d8d48 at _mtx_lock_sleep+0xd8
> >> #5 0xffffffff80820fa4 at nfscl_loadattrcache+0x244
> >> #6 0xffffffff8081758c at ncl_readrpc+0xac
> >> #7 0xffffffff80824c45 at ncl_getpages+0x485
> >> #8 0xffffffff80b5aa0c at vnode_pager_getpages+0x9c
> >> #9 0xffffffff80b3fc93 at vm_fault_hold+0x673
> >> #10 0xffffffff80b41cc3 at vm_fault+0x73
> >> #11 0xffffffff80bd84b4 at trap_pfault+0x124
> >> #12 0xffffffff80bd8c6c at trap+0x49c
> >> #13 0xffffffff80bc315f at calltrap+0x8
> [snip]
> 
> I think that the regular mutex which is acquired via NFSLOCKNODE() in
> nfscl_loadattrcache() cannot be held across vnode_pager_setsize().
> I am not sure, though, when the vap->va_size != np->n_size case is
> triggered.
> 
Yep, I'd agree with that. The same bug is in the old NFS client, and
the new NFS client cribbed the code from there.
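The rule can be shown with a minimal single-threaded C model. Every name in it (model_node, node_lock, node_unlock, pager_setsize, loadattr_buggy) is a hypothetical stand-in, not a kernel API: node_lock()/node_unlock() play the role of NFSLOCKNODE()/NFSUNLOCKNODE(), and pager_setsize() plays the role of vnode_pager_setsize(), which may sleep in vm_object_page_remove() as in the first backtrace. In the kernel the violation surfaced only when a second thread blocked on the same mutex and propagate_priority() found its owner asleep (the second backtrace); this sketch simply counts the violation at the point of sleeping.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Single-threaded model of the rule behind "panic: sleeping thread".
 * Every name here is a hypothetical stand-in, not a FreeBSD kernel API:
 *   node_lock()/node_unlock() ~ NFSLOCKNODE()/NFSUNLOCKNODE()
 *   pager_setsize()           ~ vnode_pager_setsize(), which may sleep
 *                               in vm_object_page_remove()
 */
struct model_node {
	long n_size;   /* cached file size, protected by the node mutex */
	int mtx_held;  /* non-zero while the (non-sleepable) mutex is owned */
};

static int sleeps_with_lock; /* times a sleep happened under the mutex */

static void node_lock(struct model_node *np)
{
	assert(!np->mtx_held); /* the mutex is not recursive in this model */
	np->mtx_held = 1;
}

static void node_unlock(struct model_node *np)
{
	assert(np->mtx_held);
	np->mtx_held = 0;
}

/* May block; blocking while a non-sleepable mutex is held is illegal. */
static void pager_setsize(struct model_node *np, long size)
{
	(void)size;
	if (np->mtx_held) {
		sleeps_with_lock++;
		fprintf(stderr, "sleeping thread owns a non-sleepable lock\n");
	}
}

/* Buggy shape: the new size is pushed to the pager with the mutex held. */
static void loadattr_buggy(struct model_node *np, long va_size)
{
	node_lock(np);
	if (va_size != np->n_size) {
		np->n_size = va_size;
		pager_setsize(np, np->n_size); /* may sleep: rule violated */
	}
	node_unlock(np);
}
```

Only the va_size != n_size branch reaches the sleeping call, which matches the condition Andriy points at: the panic needs the server-reported size to differ from the cached one.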

I have attached a simple patch that unlocks the mutex for the
vnode_pager_setsize() call. Maybe you could test it?
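The general shape of "unlock the mutex for the vnode_pager_setsize() call" can be sketched in a hypothetical single-threaded C model. This is not the scrubbed loadattrcache.patch; the names (model_node, node_lock, node_unlock, pager_setsize, loadattr_fixed, nsize, setnsize) are illustrative stand-ins for NFSLOCKNODE()/NFSUNLOCKNODE() and vnode_pager_setsize() inside nfscl_loadattrcache().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical single-threaded model; names are illustrative stand-ins for
 * NFSLOCKNODE()/NFSUNLOCKNODE() and vnode_pager_setsize().
 */
struct model_node {
	long n_size;   /* cached file size, protected by the node mutex */
	int mtx_held;  /* non-zero while the (non-sleepable) mutex is owned */
};

static int sleeps_with_lock;      /* sleeps that happened under the mutex */
static long last_pager_size = -1; /* last size handed to the pager */

static void node_lock(struct model_node *np)
{
	assert(!np->mtx_held);
	np->mtx_held = 1;
}

static void node_unlock(struct model_node *np)
{
	assert(np->mtx_held);
	np->mtx_held = 0;
}

/* May block; must be called without the node mutex held. */
static void pager_setsize(struct model_node *np, long size)
{
	if (np->mtx_held) {
		sleeps_with_lock++;
		fprintf(stderr, "sleeping thread owns a non-sleepable lock\n");
	}
	last_pager_size = size;
}

/*
 * Fixed shape: decide and record the new size under the mutex, drop the
 * mutex only for the call that may sleep, then reacquire it.
 */
static void loadattr_fixed(struct model_node *np, long va_size)
{
	long nsize = 0;
	int setnsize = 0;

	node_lock(np);
	if (va_size != np->n_size) {
		np->n_size = va_size;
		nsize = va_size;  /* snapshot taken while the mutex is held */
		setnsize = 1;
	}
	if (setnsize) {
		node_unlock(np);
		pager_setsize(np, nsize); /* no non-sleepable lock held here */
		node_lock(np);
		/* Other threads may have run meanwhile; recheck state if needed. */
	}
	/* ...remaining cached-attribute updates would go here... */
	node_unlock(np);
}
```

The size is copied into a local while the mutex is held, so the pager gets a consistent value even if another thread changes n_size once the mutex is dropped. An alternative with the same effect would be to defer the call until after the function's final unlock, which avoids reacquiring the mutex at all.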

Thanks for reporting this, rick
ps: Hopefully "patch" can apply this patch (there have been
    recent changes to this file, so the line#s could be off).
    It should be easy to do manually if not. The change is
    in nfscl_loadattrcache() in sys/fs/nfsclient/nfs_clport.c.

> > You're going to need to provide the following details:
> >
> > 1. Contents of /etc/rc.conf
> > 2. Contents of /etc/sysctl.conf (if modified)
> > 3. Contents of /etc/fstab
> > 4. ifconfig -a
> > 5. OS used by the NFS server, and all configuration details
> >    pertaining to that system
> >
> > You may also be asked to upgrade to 9.1-STABLE, as there may be fixes
> > for whatever this is in base/stable/9 that are not in -RELEASE, but
> > this is speculative on my part.
> >
> I do not see a need for any of these.
> 
> --
> Andriy Gapon
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loadattrcache.patch
Type: text/x-patch
Size: 403 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20130319/24bf36a3/attachment.bin>