bin/108390: [libc] [patch] wait4() erroneously waits for all
children when SIGCHLD is SIG_IGN [regression]
Jilles Tjoelker
jilles at stack.nl
Fri Apr 10 15:30:03 PDT 2009
The following reply was made to PR bin/108390; it has been noted by GNATS.
From: Jilles Tjoelker <jilles at stack.nl>
To: bug-followup at FreeBSD.org
Cc:
Subject: Re: bin/108390: [libc] [patch] wait4() erroneously waits for all
children when SIGCHLD is SIG_IGN [regression]
Date: Sat, 11 Apr 2009 00:23:56 +0200
Maybe this can be reopened as a change-request.
The way wait4() has worked when SA_NOCLDWAIT is set has been a bit
strange ever since it was introduced in 1997 in
http://svn.freebsd.org/viewvc/base?view=revision&revision=29340
The comment there seems to agree with what POSIX says, which is not as
useful as it could reasonably be:
    /*
     * If this was the last child of our parent, notify
     * parent, so in case he was wait(2)ing, he will
     * continue.
     */
    if (LIST_EMPTY(&pp->p_children))
        wakeup(pp);
However, if you pass wait4() a pid or pgid that matches no child
processes, it returns ECHILD immediately, without waiting for any other
children to terminate (see kern_wait() in kern_exit.c).
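A minimal userland sketch of that immediate ECHILD case, not taken from
the PR's test program. The helper name wait4_on_nonchild() is
hypothetical. It relies on the fact that a process's own pid can never
match one of its children, so wait4() should fail at once even though a
live child exists.

```c
#define _DEFAULT_SOURCE	/* exposes wait4() on glibc; harmless elsewhere */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the errno from wait4() on a pid that is
 * not one of our children, 0 if wait4() unexpectedly succeeded, or -1
 * if fork() failed.
 */
int
wait4_on_nonchild(void)
{
	int status, err;
	pid_t child, r;

	child = fork();
	if (child == -1)
		return (-1);
	if (child == 0) {
		pause();	/* stay alive until the parent kills us */
		_exit(0);
	}

	/* Our own pid matches no child, so this should not block. */
	r = wait4(getpid(), &status, 0, NULL);
	err = (r == -1) ? errno : 0;

	/* Clean up the child that was still running. */
	kill(child, SIGKILL);
	(void)wait4(child, &status, 0, NULL);
	return (err);
}
```

Calling wait4_on_nonchild() from a small main() and printing the result
with strerror() is enough to check a given system.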
Together with signal semantics this leads to strange results. A wait4()
on a specific process will normally wait for all child processes to
terminate, but if a signal is caught in the meantime, wait4() is
restarted and returns ECHILD immediately if all processes matching the
argument have already terminated. I have tried this by catching SIGALRM
with an empty handler and calling alarm(2) before test_it("IGNORE
SIGCHLD"); "P short child finished" happens after 2 seconds, "P waiting
for long running child" happens after 8 more seconds.
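The experiment can be approximated with the sketch below. It is not the
submitter's test program: the helper names and the scaled-down timings
(1 second short child, 4 second long child, alarm after 2 seconds) are
assumptions chosen to keep the run short. The errno should be ECHILD
either way; what differs between implementations is how long the call
blocks.

```c
#define _DEFAULT_SOURCE	/* exposes wait4() on glibc; harmless elsewhere */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void
on_alarm(int sig)
{
	(void)sig;	/* empty handler; it only interrupts wait4() */
}

/* Hypothetical helper: fork a child that sleeps secs seconds and exits. */
static pid_t
spawn_sleeper(unsigned int secs)
{
	pid_t pid = fork();

	if (pid == 0) {
		sleep(secs);
		_exit(0);
	}
	return (pid);
}

/*
 * Hypothetical helper: returns the errno from wait4(short_pid, ...),
 * 0 if a status was returned, or -1 if fork() failed.  If elapsed is
 * not NULL, the time spent in wait4() is stored there in seconds.
 */
int
wait_short_child_sigchld_ignored(double *elapsed)
{
	struct sigaction sa, old_alrm;
	struct timespec t0, t1;
	void (*old_chld)(int);
	pid_t short_pid, long_pid, r;
	int status, err = -1;

	old_chld = signal(SIGCHLD, SIG_IGN);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sa.sa_flags = SA_RESTART;	/* wait4() is restarted after the handler */
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, &old_alrm);

	short_pid = spawn_sleeper(1);
	long_pid = spawn_sleeper(4);
	if (short_pid != -1 && long_pid != -1) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		alarm(2);	/* fires between the two child exits */
		r = wait4(short_pid, &status, 0, NULL);
		err = (r == -1) ? errno : 0;
		clock_gettime(CLOCK_MONOTONIC, &t1);
		if (elapsed != NULL)
			*elapsed = (t1.tv_sec - t0.tv_sec) +
			    (t1.tv_nsec - t0.tv_nsec) / 1e9;
	}

	/* Clean up: cancel the alarm, stop the long child, restore handlers. */
	alarm(0);
	if (long_pid > 0) {
		kill(long_pid, SIGKILL);
		(void)wait4(long_pid, &status, 0, NULL);
	}
	sigaction(SIGALRM, &old_alrm, NULL);
	signal(SIGCHLD, old_chld);
	return (err);
}
```

With the behaviour described above, the elapsed time would be expected
to track the alarm rather than the short child's exit. An implementation
that wakes the parent on every child exit should return as soon as the
short child is gone.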
Another way this can lead to strange results is if something else wakes
up the proc pointer. It seems this can happen if you use SA_NOCLDWAIT in
a multithreaded process, and have one thread wait for a child process
and another do a vfork(). The child process from the vfork will wake up
the proc pointer to notify that it has execed, and this will wake up
both the wait and the vfork. (By the way, vfork() blocking only the
calling thread and not the entire process is rather weird considering
the original reason for the blocking.) Possibly related issue: what if
the vfork child did not exec and the wakeup is suppressed; does this
freeze the parent until all other children are gone?
Perhaps there are other things that can wake up the proc pointer?
I think removing the if (LIST_EMPTY(&pp->p_children)) condition and
always doing the wakeup(pp) will yield a more consistent behaviour,
which seems more useful for applications. wait4() with SA_NOCLDWAIT
will then wait for all matching child processes to terminate and return
ECHILD (unless there are still zombies left from a time when
SA_NOCLDWAIT was not set). The behaviour described in POSIX is available
by specifying any child process (-1).
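For reference, the POSIX-described case (wait for any child with
SA_NOCLDWAIT set) can be exercised from userland roughly as follows.
This is a sketch under assumptions: the helper name and the two
one-second children are made up for illustration, and it assumes the
process has no other children or leftover zombies.

```c
#define _DEFAULT_SOURCE	/* exposes wait4() on glibc; harmless elsewhere */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: sets SA_NOCLDWAIT, forks two short-lived
 * children and waits for "any child" (-1).  Returns the errno from
 * wait4(), 0 if a status was returned, or -1 on setup failure.
 */
int
wait_any_with_nocldwait(void)
{
	struct sigaction sa, old;
	int i, status, err;
	pid_t pid, r;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_DFL;
	sa.sa_flags = SA_NOCLDWAIT;	/* children do not become zombies */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, &old) == -1)
		return (-1);

	for (i = 0; i < 2; i++) {
		pid = fork();
		if (pid == 0) {
			sleep(1);
			_exit(0);
		}
	}

	/* Blocks until every child has terminated, then fails. */
	r = wait4(-1, &status, 0, NULL);
	err = (r == -1) ? errno : 0;

	sigaction(SIGCHLD, &old, NULL);
	return (err);
}
```

Since no child can be reaped here, the call is expected to fail with
ECHILD once both children have exited.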
gavin reports that the test program works as the submitter wants
(wait4(short_pid, ...) returns immediately after short_pid terminates)
on Solaris 10.
While doing this, I also noticed a bug in kern_wait(). Ptrace reparents
a process to a debugger. When the process exits, the debugger will pick
it up in kern_wait() which reparents the process back to its original
parent and signals the original parent as for a normal exit. This code
does not check for SA_NOCLDWAIT or for SIGCHLD being set to SIG_IGN, so
it leaves a zombie anyway.
Note that the special handling for SIG_IGN for SIGCHLD in FreeBSD 5 and
newer works pretty much the same way as SA_NOCLDWAIT, so it is not much
of a factor in the above discussion.
--
Jilles Tjoelker