process will not die.

Dan Nelson dnelson at allantgroup.com
Thu Sep 30 18:29:39 PDT 2004


In the last episode (Sep 30), Jason Barnes said:
> 	While running an mpirun job on my dual-processor SMP system
> (FreeBSD 4-STABLE from August 28), my program (initiated with the
> command line 'mpirun -np 2 ../sphagr') periodically dies, leaving a
> process that I can't kill -9.  Here's the top:
> 
> 	here's ps -auxw | grep sph:
> 
> jbarnes   549  0.0  8.7 410076 90744  p2  R     3:39PM   3:01.97 sphagr -p4pg /usr/home/
> jbarnes   550  0.0  0.0     0    0  p2  Z     3:39PM   0:00.00  (sphagr)
> 
> 	The 550 process I kill -9ed, but it's still there, and now when I
> try to kill it it says 'no such process'.

Processes in the Z (zombie) state have already exited, which is why
kill -9 has no effect on them, but their parent process has not yet
retrieved their exit status with one of the wait*() functions.  The
entry in the process table will stay until that happens.  You can run
"ps axlp 550" and look at the PPID column to determine the parent's
pid.  The parent code needs to either wait() for the child status, or
if it doesn't need to know when the child exits, ignore SIGCHLD or set
the SA_NOCLDWAIT flag with sigaction().
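
As a rough illustration of those two options, here is a minimal sketch
in C.  The function names reap_child() and disable_zombies() are made
up for this example and are not taken from mpirun or sphagr; the only
real interfaces used are waitpid() and sigaction().  It assumes a
POSIX system where SA_NOCLDWAIT is available.

```c
#define _XOPEN_SOURCE 700

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Option 1 (hypothetical helper): collect the status of one child so
 * the kernel can drop its process-table entry.
 * Returns the child's exit code, or -1 if the child could not be
 * waited for or did not exit normally.
 */
int reap_child(pid_t pid)
{
    int status;
    pid_t r;

    /* Retry if a signal interrupts the wait. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Option 2 (hypothetical helper): tell the kernel this process never
 * wants child statuses, so children that exit afterwards do not
 * linger in the Z state.
 * Returns 0 on success, -1 on failure (as sigaction() does).
 */
int disable_zombies(void)
{
    struct sigaction sa;

    sa.sa_handler = SIG_IGN;      /* ignore SIGCHLD ...             */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;   /* ... and do not create zombies  */
    return sigaction(SIGCHLD, &sa, NULL);
}
```

A parent that cares about exit codes would call something like
reap_child(pid) for each child it forked; one that does not would call
something like disable_zombies() once, before forking.  Note that after
the second option, wait*() calls in the parent fail with ECHILD, so the
two approaches should not be mixed in the same program.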

-- 
	Dan Nelson
	dnelson at allantgroup.com

