sleeping thread

Robert Watson rwatson at FreeBSD.org
Fri Feb 23 15:05:15 UTC 2007


On Thu, 22 Feb 2007, Pramod Srinivasan wrote:

> I am coming across a weird issue with FreeBSD 6.1, any help appreciated.
>
> The problem is the following:
>
> One thread (1) does a setsockopt, grabs a lock in udp_usrreq, calls copyin 
> which hits a pagefault, this leads to that thread sleeping by calling 
> msleep.

Performing copyin/copyout while holding a mutex, especially one also required 
from interrupt or software interrupt context, is a bug for precisely the 
reason you describe here: interrupt context can get blocked waiting on an 
unbounded operation such as a disk read.  However, I'm slightly confused by 
your stacktrace: in FreeBSD 6.1, there is no udp_ctloutput(), only a 
udp_ctlinput().  This aside, there have been a number of problems relating to 
ip_ctloutput() holding locks over calls to copy in and out socket buffer 
arguments.  I believe these are mostly fixed in 7.x, and actually largely 
fixed in 6.x, although possibly after 6.1.  The basic approach to fixing this 
is to either not acquire the locks until after the copy operation, or release 
them before the operation.  This turns out to be a bit tricky, because certain 
pointers remain stable only while the locks are held, so the pointers may need 
to be re-derived or re-validated with the locks acquired.

Diffing ip_output.c between 6.1 and 6.2 will most likely give you a sense 
of what is required:

http://fxr.watson.org/fxr/diff/netinet/ip_output.c?v=RELENG61;diffval=RELENG62;diffvar=v

In particular, look at the comment at the start of ip_ctloutput().
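
As a rough illustration of the "copy first, then lock and re-derive the
pointer" approach described above, here is a minimal userland sketch. It is
not the actual ip_ctloutput() code: struct pcb, struct pcb_table,
fake_copyin(), pcb_lookup_locked() and set_ttl_option() are all hypothetical
names invented for this example, and a pthread mutex stands in for the
non-sleepable udp mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a protocol control block (not struct inpcb). */
struct pcb {
	int id;
	int in_use;
	int ttl;
};

#define NPCB 4

/* Hypothetical table; 'lock' plays the role of the non-sleepable mutex. */
struct pcb_table {
	pthread_mutex_t lock;
	struct pcb slots[NPCB];
};

/* Stand-in for copyin(9): in the kernel this may page-fault and sleep. */
static int
fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}

/* Call with t->lock held; the result is stable only while it stays held. */
static struct pcb *
pcb_lookup_locked(struct pcb_table *t, int id)
{
	for (int i = 0; i < NPCB; i++)
		if (t->slots[i].in_use && t->slots[i].id == id)
			return (&t->slots[i]);
	return (NULL);
}

int
set_ttl_option(struct pcb_table *t, int id, const void *uval, size_t len)
{
	struct pcb *p;
	int error, ttl;

	assert(t != NULL);
	if (len != sizeof(ttl))
		return (EINVAL);

	/* 1. Copy with no locks held, so sleeping here blocks nobody. */
	error = fake_copyin(uval, &ttl, sizeof(ttl));
	if (error != 0)
		return (error);

	/* 2. Take the lock only now, and derive the pointer under it. */
	pthread_mutex_lock(&t->lock);
	p = pcb_lookup_locked(t, id);
	if (p == NULL) {
		/* The object may have gone away while we were copying. */
		pthread_mutex_unlock(&t->lock);
		return (ENOENT);
	}
	p->ttl = ttl;
	pthread_mutex_unlock(&t->lock);
	return (0);
}
```

The point of the ordering is that the potentially sleeping copy never
overlaps the mutex hold, and the pcb pointer is looked up (and checked for
NULL) only after the lock is acquired rather than carried across the copy.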

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> msleep(f01cd488,c09fe6a0,44,c0956c79,0) at
> bwait(f01cd488,44,c0956c79) at
> vnode_pager_input_smlfs(c10487bc,c2740ae0,0,1,fcd6d918) at
> vnode_pager_generic_getpages(ccd0bcf0,fcd6da50,1000,0,fcd6d978) at
> vop_stdgetpages(fcd6d98c) at
>
> Another thread (netisr) which is processing some udp packet tries to
> grab the same lock but since it's already held by thread 1, tries to
> propagate the priority and panics because there is a check in the code
> in propagate_priority which causes the panic
>
> 		/*
> 		 * If the thread is asleep, then we are probably about
> 		 * to deadlock.  To make debugging this easier, just
> 		 * panic and tell the user which thread misbehaved so
> 		 * they can hopefully get a stack trace from the truly
> 		 * misbehaving thread.
> 		 */
> 		if (TD_IS_SLEEPING(td)) {
> 			printf(
> 		"Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
> 			    td->td_tid, td->td_proc->p_pid);
> #ifdef DDB
> 			db_trace_thread(td, -1);
> #endif
> 			panic("sleeping thread");
> 		}
>
> Below is the output with witness turned on....
>
> Not sure how to go forward with this, any pointers?
>
> Thanks,
> Pramod
>
> lock order reversal: (sleepable after non-sleepable)
> 1st 0xc0a20a8c udp (udp) @ src/sys/netinet/udp_usrreq.c:1523
> 2nd 0xccdbee54 user map (user map) @ src/sys/vm/vm_map.c:3005
> KDB: stack backtrace:
> kdb_backtrace(0,ffffffff,c09c1b40,c09c16e0,c0978c6c) at
> witness_checkorder(ccdbee54,9,c09305b4,bbd) at
> _sx_xlock(ccdbee54,c09305a8,bbd) at
> _vm_map_lock_read(ccdbee10,c09305a8,bbd,1d6d9b4,ccdd76a8) at
> vm_map_lookup(fcd6da40,8097000,1,fcd6da44,fcd6da34) at
> vm_fault(ccdbee10,8097000,1,0,ccdd5000) at
> trap_pfault(fcd6db08,0,8097940) at
> trap(fcd60008,ccdd0028,28,fcd6db94,8097940) at
> calltrap() at
> --- trap 0xc, eip = 0xc08a5e06, esp = 0xfcd6db48, ebp = 0xfcd6db68 ---
> slow_copyin(fcd6dc88,fcd6db94,4,4,fcd6db98) at
> ip_ctloutput(ccdfd4ec,fcd6dc88,0,c054f464,0) at
> udp_ctloutput(ccdfd4ec,fcd6dc88,246,c0977524,ccdf3c2c) at
> sosetopt(ccdfd4ec,fcd6dc88,ccde5090,1,0) at
> kern_setsockopt(ccdd5000,6,0,6d,8097940) at
> setsockopt(ccdd5000,fcd6dd04,5,2,292) at
> syscall(3b,3b,3b,0,7a6c) at
> Xint0x80_syscall() at
> --- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x881d1787, esp =
> 0xbfbfddec, ebp = 0xbfbfde48 ---
> Acquiring lockmgr lock "isofs" with the following non-sleepable locks
> held:
> exclusive sleep mutex udp r = 0 (0xc0a20a8c) locked @
> src/sys/netinet/udp_usrreq.c:1523
> KDB: stack backtrace:
> kdb_backtrace(1,1,1,3041,ccd0bd6c) at
> witness_warn(5,c09b211c,c09416bb,c09406a1) at
> lockmgr(ccd0bd48,3041,ccd0bd6c,ccdd5000,fcd6d918) at
> vop_stdlock(fcd6d938,3041,ccd0bcf0,fcd6d954,c058e20c) at
> VOP_LOCK_APV(c09704e0,fcd6d938) at
> vn_lock(ccd0bcf0,3041,ccdd5000,ccd0bcf0,3041) at
> vget(ccd0bcf0,3041,ccdd5000) at
> vnode_pager_lock(ccdf1840,ccdf1840,ccdf1840,0,c0930058) at
> vm_fault(ccdbee10,8097000,1,0,ccdd5000) at
> trap_pfault(fcd6db08,0,8097940) at
> trap(fcd60008,ccdd0028,28,fcd6db94,8097940) at
> calltrap() at
> --- trap 0xc, eip = 0xc08a5e06, esp = 0xfcd6db48, ebp = 0xfcd6db68 ---
> slow_copyin(fcd6dc88,fcd6db94,4,4,fcd6db98) at
> ip_ctloutput(ccdfd4ec,fcd6dc88,0,c054f464,0) at
> udp_ctloutput(ccdfd4ec,fcd6dc88,246,c0977524,ccdf3c2c) at
> sosetopt(ccdfd4ec,fcd6dc88,ccde5090,1,0) at
> kern_setsockopt(ccdd5000,6,0,6d,8097940) at
> setsockopt(ccdd5000,fcd6dd04,5,2,292) at
> syscall(3b,3b,3b,0,7a6c) at
> Xint0x80_syscall() at
> --- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x881d1787, esp =
> 0xbfbfddec, ebp = 0xbfbfde48 ---
> Sleeping on "vnsrd" with the following non-sleepable locks held:
> exclusive sleep mutex udp r = 0 (0xc0a20a8c) locked @
> src/sys/netinet/udp_usrreq.c:1523
> KDB: stack backtrace:
> kdb_backtrace(1,1,1,ccdd763c,ccdd5000) at
> witness_warn(5,c09fe6a0,c0942b95,c0956c79) at
> msleep(f01cd488,c09fe6a0,44,c0956c79,0) at
> bwait(f01cd488,44,c0956c79) at
> vnode_pager_input_smlfs(c10487bc,c2740ae0,0,1,fcd6d918) at
> vnode_pager_generic_getpages(ccd0bcf0,fcd6da50,1000,0,fcd6d978) at
> vop_stdgetpages(fcd6d98c) at
> VOP_GETPAGES_APV(c09704e0,fcd6d98c) at
> vnode_pager_getpages(c10487bc,fcd6da50,1,0) at
> vm_fault(ccdbee10,8097000,1,0,ccdd5000) at
> trap_pfault(fcd6db08,0,8097940) at
> trap(fcd60008,ccdd0028,28,fcd6db94,8097940) at
> calltrap() at
> --- trap 0xc, eip = 0xc08a5e06, esp = 0xfcd6db48, ebp = 0xfcd6db68 ---
> slow_copyin(fcd6dc88,fcd6db94,4,4,fcd6db98) at
> ip_ctloutput(ccdfd4ec,fcd6dc88,0,c054f464,0) at
> udp_ctloutput(ccdfd4ec,fcd6dc88,246,c0977524,ccdf3c2c) at
> sosetopt(ccdfd4ec,fcd6dc88,ccde5090,1,0) at
> kern_setsockopt(ccdd5000,6,0,6d,8097940) at
> setsockopt(ccdd5000,fcd6dd04,5,2,292) at
> syscall(3b,3b,3b,0,7a6c) at
> Xint0x80_syscall() at
> --- syscall (105, FreeBSD ELF32, setsockopt), eip = 0x881d1787, esp =
> 0xbfbfddec, ebp = 0xbfbfde48 ---
> Sleeping thread (tid 100087, pid 4302) owns a non-sleepable lock
> panic: sleeping thread
> db_log_stack_trace_cmd(c09b2de0) at 0
> panic(c0943cde,c08f8bec,186f7,10ce,c09b2ae0) at 0
> propagate_priority(cc727c00,c09b6bf0,c0a20a8c,cc727c00,c09089b0) at 0
> turnstile_wait(c0a20a8c,ccdd5000,c0a20a8c,2,c08f60ec,225) at 0
> _mtx_lock_sleep(c0a20a8c,cc727c00,0,c09089b0,10c) at 0
> _mtx_lock_flags(c0a20a8c,0,c09089b0,10c,0) at 0
> udp_input(ccd34b00,14,0,4,4) at 0
> ip_input(cca7d180,ccd34b00,1,c08f60ec,c0a1be10) at 0
> netisr_processqueue(c0a1b738) at 0
> swi_net(0) at 0
> ithread_execute_handlers(cc726428,cc724500) at 0
> ithread_loop(cc70e740,f8de1d38,cc70e740,c051a45a,0) at 0
> fork_exit(c051a45a,cc70e740,f8de1d38) at 0
> fork_trampoline() at 0
> --- trap 0x1, eip = 0, esp = 0xf8de1d6c, ebp = 0 ---
> KDB: enter: panic
> [thread pid 14 tid 100002 ]
> Stopped at      kdb_enter+0x37: pushl   $-0x1
> db>
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>

