bin/108390: [libc] [patch] wait4() erroneously waits for all children when SIGCHLD is SIG_IGN [regression]

Jilles Tjoelker jilles at stack.nl
Fri Apr 10 15:30:03 PDT 2009


The following reply was made to PR bin/108390; it has been noted by GNATS.

From: Jilles Tjoelker <jilles at stack.nl>
To: bug-followup at FreeBSD.org
Cc:  
Subject: Re: bin/108390: [libc] [patch] wait4() erroneously waits for all
	children when SIGCHLD is SIG_IGN [regression]
Date: Sat, 11 Apr 2009 00:23:56 +0200

 Maybe this can be reopened as a change-request.
 
 The way wait4() has worked when SA_NOCLDWAIT is set has been a bit
 strange ever since it was introduced in 1997 in
 http://svn.freebsd.org/viewvc/base?view=revision&revision=29340
 The comment there seems to agree with what POSIX says, which is not as
 useful as it could reasonably be:
                 /*
                  * If this was the last child of our parent, notify
                  * parent, so in case he was wait(2)ing, he will
                  * continue.
                  */
                 if (LIST_EMPTY(&pp->p_children))
                         wakeup(pp);
 
 However, if you pass wait4() a pid or pgid that matches no child
 processes, it returns ECHILD immediately, without waiting for any other
 children to terminate (see kern_wait() in kern_exit.c).
 
 Together with signal semantics this leads to strange results. A wait4()
 on a specific process will normally wait for all child processes to
 terminate, but if a signal is caught in the meantime, wait4() is
 restarted and returns ECHILD immediately if all processes matching the
 argument have already terminated. I have tried this by catching SIGALRM
 with an empty handler and calling alarm(2) before test_it("IGNORE
 SIGCHLD"); "P short child finished" happens after 2 seconds, "P waiting
 for long running child" happens after 8 more seconds.
 
 Another way this can lead to strange results is if something else calls
 wakeup() on the parent's proc pointer. It seems this can happen if you
 use SA_NOCLDWAIT in a multithreaded process, and have one thread wait for
 a child process while another does a vfork(). The child process from the
 vfork calls wakeup() on the proc pointer to report that it has execed,
 and this wakes up both the wait and the vfork. (By the way, vfork()
 blocking only the calling thread and not the entire process is rather
 weird considering the original reason for the blocking.) A possibly
 related issue: if the vfork child did not exec and the wakeup is
 suppressed, does this freeze the parent until all other children are
 gone?
 
 Perhaps other code paths can also call wakeup() on the proc pointer?
 
 I think removing the  if (LIST_EMPTY(&pp->p_children))  condition and
 always doing the  wakeup(pp)  will yield more consistent behaviour,
 which seems more useful for applications. wait4() with SA_NOCLDWAIT
 will then wait for all matching child processes to terminate and return
 ECHILD (unless there are still zombies left from a time when
 SA_NOCLDWAIT was not set). The behaviour described in POSIX is available
 by specifying any child process (-1).
 
 gavin reports that the test program works as the submitter wants
 (wait4(short_pid, ...) returns immediately after short_pid terminates)
 on Solaris 10.
 
 While doing this, I also noticed a bug in kern_wait(). Ptrace reparents
 a process to a debugger. When the process exits, the debugger picks it
 up in kern_wait(), which reparents the process back to its original
 parent and signals the original parent as for a normal exit. This code
 does not check for SA_NOCLDWAIT or for SIGCHLD being set to SIG_IGN, and
 so leaves a zombie anyway.
 
 Note that the special handling for SIG_IGN for SIGCHLD in FreeBSD 5 and
 newer works pretty much the same way as SA_NOCLDWAIT, so it is not much
 of a factor in the above discussion.
 
 -- 
 Jilles Tjoelker

