locked vnode / nfs... requires kill -9 in ddb

John E Hein jhein at timing.com
Thu Oct 19 20:14:04 UTC 2006


Kostik Belousov wrote at 13:04 +0300 on Oct 19, 2006:
 > On Wed, Oct 18, 2006 at 10:01:45AM -0600, John E Hein wrote:
 > > 6.2-PRERELEASE from 20061016 RELENG_6 sources.
 > > Locked vnodes
 > >  
 > > 0xc6b7bdd0: tag nfs, type VDIR
 > >     usecount 2, writecount 0, refcount 8 mountedhere 0
 > >     flags (VV_ROOT)
 > >     v_object 0xc9d84108 ref 0 pages 0
 > >      lock type nfs: EXCL (count 1) by thread 0xc8adac00 (pid 50746) with 5 pending
 > >         fileid 8 fsid 0x300ff06
 > > 
 > > 50746 50000 49999   600  T+                          sh
 > >  .
 > >  .
 > > db> trace 50746
 > > Tracing pid 50746 tid 100231 td 0xc8adac00
 > > sched_switch(c8adac00,0,2) at 0xc05ce0cb = sched_switch+0x173
 > > mi_switch(2,0) at 0xc05c2b0a = mi_switch+0x1ba
 > > thread_suspend_check(1,c079e04c,c8adac00,c9206b80,1,...) at 0xc05c722d = thread_suspend_check+0x191
 > > sleepq_catch_signals(c9206b80) at 0xc05db93f = sleepq_catch_signals+0x103
 > > sleepq_wait_sig(c9206b80) at 0xc05dbd96 = sleepq_wait_sig+0xe
 > > msleep(c9206b80,c08a6a40,153,c0813379,0) at 0xc05c2652 = msleep+0x25a
 > > nfs_reply(c9206b80,0,c8adac00,4,c7ea7100,...) at 0xc06c33ac = nfs_reply+0x244
 > > nfs_request(c6b7bdd0,c6ae2d00,1,c8adac00,c7815280,e8f3488c,e8f34890,e8f34894,c8adac00,e8f348a0) at 0xc06c40a5 = nfs_request+0x3c1
 > > nfs_getattr(e8f348dc) at 0xc06c912b = nfs_getattr+0x11f
 > > VOP_GETATTR_APV(c086c700,e8f348dc) at 0xc07b260c = VOP_GETATTR_APV+0x38
 > > nfsspec_access(e8f34a8c,c6bf7c94,0,e8f349a4,c060ca26,...) at 0xc06cebf1 = nfsspec_access+0x85
 > > nfs_access(e8f34a8c) at 0xc06c8b7a = nfs_access+0x122
 > > VOP_ACCESS_APV(c086c700,e8f34a8c) at 0xc07b25b0 = VOP_ACCESS_APV+0x38
 > > nfs_lookup(e8f34b18) at 0xc06c96ff = nfs_lookup+0xd3
 > > VOP_LOOKUP_APV(c086c700,e8f34b18) at 0xc07b22f7 = VOP_LOOKUP_APV+0x43
 > > lookup(e8f34c00) at 0xc060ee79 = lookup+0x4c1
 > > namei(e8f34c00) at 0xc060e71a = namei+0x39a
 > > kern_stat(c8adac00,806712c,0,e8f34c74) at 0xc061d3cd = kern_stat+0x35
 > > stat(c8adac00,e8f34d04) at 0xc061d37b = stat+0x1b
 > > syscall(3b,3b,3b,1,80670ec,...) at 0xc07a9363 = syscall+0x2bf
 > > Xint0x80_syscall() at 0xc079456f = Xint0x80_syscall+0x1f
 > > --- syscall (188, FreeBSD ELF32, stat), eip = 0x28196477, esp = 0xbfbfdc1c, ebp = 0xbfbfdcb8 ---
 > > db> kill 9 50746
 > > db> c
 > 
 > The nfs_reply is sleeping with the PCATCH set. The question is why SIGTSTP
 > does not cause msleep to return with EINTR.

I think it happened again last night, but I can't tell whether it was
exactly the same issue: by the time I got in, the box was locked up
hard.

This time it happened without any ctrl-z.

An automated script (run from cron) seems to have triggered a
livelock-like problem.  I don't know whether there was a locked
vnode, but a post-mortem suggests the system was behaving
similarly.

An hour later I started getting these messages:

Oct 18 22:07:25 gromit kernel: nfs server pid659@gromit:/h: not responding
Oct 18 22:07:56 gromit kernel: nfs server pid659@gromit:/h: not responding
Oct 18 22:10:00 gromit last message repeated 4 times
Oct 18 22:20:20 gromit last message repeated 20 times
Oct 18 22:30:09 gromit last message repeated 19 times
Oct 18 22:39:58 gromit last message repeated 19 times
Oct 18 22:50:18 gromit last message repeated 20 times

... then silence.

Apparently syslogd was well enough to let some information about the
condition trickle out.

I have built a kernel with WITNESS & INVARIANTS to see if I can get
any more information.
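
For reference, a kernel config fragment enabling these checks on a
6.x kernel would look roughly like the following.  This is a sketch,
not the actual config used here; the KDB and DDB lines are an
assumption, included only because the db> prompt shown above needs
them.

```
# Illustrative fragment only; the real config file is not shown in this message.
options 	KDB			# kernel debugger framework
options 	DDB			# interactive in-kernel debugger (the db> prompt)
options 	INVARIANTS		# extra run-time consistency checks
options 	INVARIANT_SUPPORT	# support code required by INVARIANTS
options 	WITNESS			# lock order verification
```

WITNESS adds noticeable overhead, so it is usually enabled only while
chasing a locking problem like this one.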

