strace, holding sigacts lock over postsig(), et al.
truckman at FreeBSD.org
Thu Jan 8 04:11:04 PST 2004
On 7 Jan, Robert Watson wrote:
> Got a bug report this evening that the strace package hangs on 5-CURRENT.
> I'm able to confirm this; for those that don't know, strace makes
> extensive use of procfs. On attempting to reproduce it, I first got:
> crash2# strace ls
> Sleeping on "stopevent" with the following non-sleepable locks held:
> exclusive sleep mutex sigacts r = 0 (0xc20e2aa8) locked @
> lock order reversal
> 1st 0xc20e2aa8 sigacts (sigacts) @ kern/subr_trap.c:260
> 2nd 0xc20f1224 process lock (process lock) @ kern/kern_synch.c:309
> Stack backtrace:
> backtrace(c0864c4a,c20f1224,c0860e7b,c0860e7b,c0861ee5) at backtrace+0x17
> witness_lock(c20f1224,8,c0861ee5,135,c20f1224) at witness_lock+0x672
> _mtx_lock_flags(c20f1224,0,c0861edc,135,ffffffff) at _mtx_lock_flags+0xba
> msleep(c20f12e8,c20f1224,5c,c0865441,0) at msleep+0x794
> stopevent(c20f11b8,2,13,823,c0922200) at stopevent+0x85
> issignal(c1f31bd0,2,c08619f7,bd,1) at issignal+0x168
> cursig(c1f31bd0,0,c0864399,104,0) at cursig+0xe8
> ast(c9520d48) at ast+0x4b0
> doreti_ast() at doreti_ast+0x17
> load: 0.21 cmd: strace 583 [iowait] 0.00u 0.91s 0% 724k
> [sent a serial break]
> Cool, eh? Second try:
> crash2# strace ls
> execve(0xbfbfe890, [0xbfbfed54], [/* 0 vars */]PIOCWSTOP: Input/output
> Even better.
> The first obvious observation is that holding mutexes other than the
> process mutex over calls to _STOPEVENT() is a bad idea. It seems like the
> p_sig mutex is used to cover a fair amount of flag handling, signal entry
> changes, etc, etc. I'm not familiar with the semantic requirements here,
> but presumably something needs to change. Is it possible to release the
> locks after grabbing the value of 'action' (or even do a lock-free read),
> and then grab the sigact lock only later during actual delivery, yet
> maintain the right semantics?
In both issignal() and postsig(), I think it would be safe to drop the
p_sig mutex before _STOPEVENT() and reacquire it afterwards. About the
only thing that can happen in the interim is the receipt of another
signal, and I don't think that would be a problem. Dropping the mutex
is how issignal() already handles ptracestop() a bit further down in
the code.
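
The drop-and-reacquire pattern described above can be sketched in userland
C. This is only an illustration, not the kernel code: it assumes a pthread
mutex standing in for the sigacts mutex, and every name beginning with
sketch_ is hypothetical. The stand-in for _STOPEVENT() fakes the arrival of
another signal to show why state read before the stop has to be re-read
once the mutex is held again.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userland stand-ins; none of these are kernel APIs. */
struct sketch_proc {
	pthread_mutex_t	sig_mtx;	/* stands in for the sigacts mutex */
	int		sig_mtx_held;	/* tracked by hand for the assertions */
	int		pending;	/* stands in for pending-signal state */
};

static void
sketch_proc_init(struct sketch_proc *p)
{
	pthread_mutex_init(&p->sig_mtx, NULL);
	p->sig_mtx_held = 0;
	p->pending = 0;
}

static void
sketch_lock(struct sketch_proc *p)
{
	pthread_mutex_lock(&p->sig_mtx);
	p->sig_mtx_held = 1;
}

static void
sketch_unlock(struct sketch_proc *p)
{
	p->sig_mtx_held = 0;
	pthread_mutex_unlock(&p->sig_mtx);
}

/*
 * Stands in for _STOPEVENT(): it may sleep, so the caller must not hold
 * sig_mtx. Bumping 'pending' simulates another signal arriving while the
 * process is stopped.
 */
static void
sketch_stopevent(struct sketch_proc *p)
{
	assert(!p->sig_mtx_held);
	p->pending++;
}

/*
 * Called with sig_mtx held and returns with it held, mirroring the
 * assumed calling convention. Returns the pending count observed after
 * the mutex has been reacquired.
 */
static int
sketch_issignal(struct sketch_proc *p)
{
	assert(p->sig_mtx_held);

	sketch_unlock(p);		/* drop before the call that can sleep */
	sketch_stopevent(p);
	sketch_lock(p);			/* reacquire afterwards */

	return (p->pending);		/* re-read: state may have changed */
}
```

A caller would initialize the structure with sketch_proc_init(), take the
lock, call sketch_issignal(), and then release the lock. The cost of this
approach is that nothing read under the mutex before the stop can be
trusted afterwards without checking it again.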