[Bug 233646] Flakey test case: bin.sh.builtins.functional_test.kill1

Thu Dec 27 22:45:20 UTC 2018

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233646

Jilles Tjoelker <jilles at FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |Open

--- Comment #3 from Jilles Tjoelker <jilles at FreeBSD.org> ---
In the text below, wait(2) means any wait system call; sh(1) uses wait3(),
which appears as wait4() in ktrace.

The test case is meant to test that a terminated, wait(2)ed for but not
wait(1)ed for job can be passed to kill(1) without error (the command will do
nothing). The part with the second background job, p2 and wait is intended to
wait for the first background job to terminate and be wait(2)ed for, without
taking excessive time or wait(1)ing for it (which would make the %1
specification invalid). If the first background job is slow to terminate, the
kill command will actually signal it, but this is harmless. If the first
background job has terminated but the kernel has not yet returned it via
wait(2), the kill command will signal a zombie, which per POSIX succeeds
without doing anything.

I noticed that the problem is quickly reproduced on head using a loop like
  while sh builtins/kill1.0; do :; done
using head's sh as well as stable/11's sh, while it can run for quite a while
on stable/11 using stable/11's sh as well as head's sh built against stable/11.

Reproducing with ktrace -i seems hard, but reproducing with plain ktrace works.
The ktrace extract below seems to indicate that the kernel is at fault,
returning an [ESRCH] error for a kill() on a zombie:

 19837 sh       CALL  fork
 19837 sh       RET   fork 19838/0x4d7e
 19837 sh       CALL  wait4(0xffffffff,0x7fffffffe91c,0x1<WNOHANG>,0)
 19837 sh       RET   wait4 0
 19837 sh       CALL  fork
 19837 sh       RET   fork 19839/0x4d7f
 19837 sh       CALL  sigprocmask(SIG_BLOCK,0x7fffffffe820,0x7fffffffe810)
 19837 sh       RET   sigprocmask 0
 19837 sh       CALL  sigaction(SIGCHLD,0x7fffffffe850,0x7fffffffe830)
 19837 sh       RET   sigaction 0
 19837 sh       CALL  wait4(0xffffffff,0x7fffffffe80c,0x1<WNOHANG>,0)
 19837 sh       RET   wait4 19839/0x4d7f
 19837 sh       CALL  sigaction(SIGCHLD,0x7fffffffe830,0)
 19837 sh       RET   sigaction 0
 19837 sh       CALL  sigprocmask(SIG_SETMASK,0x7fffffffe810,0)
 19837 sh       RET   sigprocmask 0
 19837 sh       CALL  kill(0x4d7e,SIGTERM)
 19837 sh       RET   kill -1 errno 3 No such process

Process ID 19838 (0x4d7e) has not been returned by a wait4() call, so it must
either be still running or a zombie. In either case, a kill() on it must
succeed.

It appears that there is no test that specifically verifies that killing a
zombie process succeeds.
