7.0-stable: a hung process - scheduler bug?

Mikhail Teterin mi+mill at aldan.algebra.com
Tue Sep 23 17:48:01 UTC 2008


Hello!

I was trying to build OpenOffice using all 4 of my CPUs. To be able to
do other work on the machine comfortably, I ran the build under nice
and assigned real-time priority to the two Xorg processes.
The build started at about 23:10 last night and hung at 23:46. The
procstat output for make's process group is:

      PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM
     8371  2425  8371  2425  2425   1 mi       wait      FreeBSD ELF64 make
    12254  8371  8371  2425  2425   1 mi       wait      FreeBSD ELF64 sh
    12255 12254  8371  2425  2425   1 mi       pause     FreeBSD ELF64 tcsh
    12262 12255  8371  2425  2425   1 mi       wait      FreeBSD ELF64 perl5.8.8
    33010 12262  8371  2425  2425   1 mi       wait      FreeBSD ELF64 perl5.8.8
    33011 33010  8371  2425  2425   1 mi       wait      FreeBSD ELF64 sh
    33012 33011  8371  2425  2425   1 mi       wait      FreeBSD ELF64 dmake
    37126 33012  8371  2425  2425   1 mi       -         FreeBSD ELF64 dmake

The last line worries me greatly... According to "procstat -t", there is 
only one thread there:

      PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
    37126 100724 dmake            -                  1  193 sleep   -
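
For anyone who wants to pick such rows out of a longer listing, here is a
minimal sketch in Python. It is not part of procstat; the helper name
rows_without_wchan and the embedded sample are hypothetical, and it assumes
the column order of the process listing above, with EMUL printed as two
words ("FreeBSD ELF64").

```python
# Hypothetical helper: flag rows of `procstat` output whose WCHAN is "-".
# Assumes the column order PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM,
# where EMUL is two whitespace-separated words, giving 11 fields per row.

SAMPLE = """\
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM
 8371  2425  8371  2425  2425   1 mi       wait      FreeBSD ELF64 make
12255 12254  8371  2425  2425   1 mi       pause     FreeBSD ELF64 tcsh
33012 33011  8371  2425  2425   1 mi       wait      FreeBSD ELF64 dmake
37126 33012  8371  2425  2425   1 mi       -         FreeBSD ELF64 dmake
"""


def rows_without_wchan(text):
    """Return (pid, ppid, comm) for every row whose WCHAN field is '-'."""
    found = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 11:
            continue  # wrapped or truncated row; ignore it
        pid, ppid, wchan, comm = fields[0], fields[1], fields[7], fields[-1]
        if wchan == "-":
            found.append((int(pid), int(ppid), comm))
    return found


if __name__ == "__main__":
    for pid, ppid, comm in rows_without_wchan(SAMPLE):
        print(pid, ppid, comm)
```

Fed the listing from this message, it should report only 37126, the dmake
child of 33012.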

And trying to "ktrace -p 37126" fails (even as root, even in /tmp) with:

    ktrace: ktrace.out: Operation not permitted

There are no problems ktrace-ing 33012, but nothing comes from there, as
that process simply waits for its child. My guess is that the child,
37126, was (v)forked to launch a compiler or some such and remains stuck
somewhere between (v)fork and exec...

The OS is FreeBSD 7.0-STABLE/amd64 from Sat Jul 26, 2008, and the box is
otherwise perfectly functional. The scheduling-related kernel options are
set as follows:

    options         SCHED_4BSD              # 4BSD scheduler
    options         _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions

Let me know what else I can do to help fix this bug. I'm going to
reboot the machine tonight... Should I switch to SCHED_ULE as a
work-around? Thanks! Yours,

    -mi


