System() returning ECHILD error on FreeBSD 7.2

Wed Feb 10 17:52:07 UTC 2010

On Wed, Feb 10, 2010 at 9:25 AM, Naveen Gujje <gujjenaveen at gmail.com> wrote:
> Naveen Gujje <gujjenaveen at gmail.com> wrote:
>  >> signal(SIGCHLD, SigChildHandler);
>  >>
>  >> void
>  >> SigChildHandler(int sig)
>  >> {
>  >>   pid_t pid;
>  >>
>  >>   /* get status of all dead procs */
>  >>   do {
>  >>     int procstat;
>  >>     pid = waitpid(-1, &procstat, WNOHANG);
>  >>     if (pid < 0) {
>  >>       if (errno == EINTR)
>  >>         continue;               /* ignore it */
>  >>       else {
>  >>         if (errno != ECHILD)
>  >>           perror("getting waitpid");
>  >>         pid = 0;                /* break out */
>  >>       }
>  >>     }
>  >>     else if (pid != 0)
>  >>       syslog(LOG_INFO, "child process %d completed", (int) pid);
>  >>   } while (pid);
>  >>
>  >>   signal(SIGCHLD, SigChildHandler);
>  >> }
>
>>There are several problems with your signal handler.
>
>>First, the perror() and syslog() functions are not re-entrant,
>>so they should not be used inside signal handlers.  This can
>>lead to undefined behaviour.  Please refer to the sigaction(2)
>>manual page for a list of functions that are considered safe
>>to be used inside signal handlers.
>
>>Second, you are using functions that may change the value of
>>the global errno variable.  Therefore you must save its value
>>at the beginning of the signal handler, and restore it at the
>>end.
>
>>Third (not a problem in this particular case, AFAICT, but
>>still good to know):  Unlike SysV systems, BSD systems do
>>_not_ automatically reset the signal action when the handler
>>is called.  Therefore you do not have to call signal() again
>>in the handler (but it shouldn't hurt either).  Because of
>>the semantic difference of the signal() function on different
>>systems, it is preferable to use sigaction(2) instead in
>>portable code.
>
> Okay, I followed your suggestion and changed my SigChildHandler to
>
> void
> SigChildHandler(int sig)
> {
>  pid_t pid;
>  int status;
>  int saved_errno = errno;
>
>  while (((pid = waitpid( (pid_t) -1, &status, WNOHANG)) > 0) ||
>         ((-1 == pid) && (EINTR == errno)))
>    ;
>
>  errno = saved_errno;
> }
>
> and used sigaction(2) to register this handler. Still, system(3) returns
> -1 with errno set to ECHILD.
>
>  >> And, in some other part of the code, we call system() to add an ethernet
>  >> interface. This system() call is returning -1 with errno set to ECHILD,
>  >> though the passed command is executed successfully.  I have noticed that
>  >> the problem is observed only after we register SigChildHandler. If I have a
>  >> simple statement like system("ls") before and after the call to
>  >> signal(SIGCHLD, SigChildHandler), the call before setting signal handler
>  >> succeeds without errors and the call after setting signal handler returns -1
>  >> with errno set to ECHILD.
>  >>
>  >> Here, I believe that within the system() call, the child exited before the
>  >> parent got a chance to call _wait4 and thus resulted in ECHILD error.
>
>>I don't think that can happen.
>
>  >> But, for the child to exit without notifying the parent, SIGCHLD has to be
>  >> set to SIG_IGN in the parent and this is not the case, because we are
>  >> already setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before
>  >> calling system() then I don't see this problem.
>  >>
>  >> I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
>  >> making the difference.
>
>>The system() function temporarily blocks SIGCHLD (i.e. it
>>adds the signal to the process' signal mask).  However,
>>blocking is different from ignoring:  The signal is held
>>as long as it is blocked, and as soon as it is removed
>>from the mask, it is delivered, i.e. your signal handler
>>is called right before the system() function returns.
>
> Yes, I agree with you. Here, I believe, the point of blocking SIGCHLD
> is to give preference to wait4() of system() over any other waitXXX() in
> the parent process. But I still can't see why wait4() returns -1.
>
>>And since you don't save the errno value, your signal
>>handler overwrites the value returned from the system()
>>function.  So you get ECHILD.
>
> I had a debug print just after wait4() in system() and before we unblock
> SIGCHLD. And it's clear that wait4() is returning -1 with errno as ECHILD.

    Isn't this section of the system(3) library call essentially already
doing what you want, such that you'll never be able to get the process
status yourself when you call waitpid(2)?

       do {
           pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
       } while (pid == -1 && errno == EINTR);
       break;

    You typically get the status via wait*(2) when you use fork(2) and
exec*(2) yourself, or from the return value of system(3), not by
combining system(3) with wait*(2)...
Thanks,
-Garrett