5.3-RELEASE TODO - make/kqueue

Garance A Drosihn drosih at rpi.edu
Sat Aug 28 17:05:10 PDT 2004

At 7:37 AM -0600 8/27/04, Scott Long wrote:
>Testing focuses for 5.3-RELEASE

An update on this issue:

>  |---------------------------------+
>  | make -DUSE_KQUEUE causes lockup |
>  | with buildworld -jBIGNUM        |
>  |---------------------------------+

The description says:

>  |-------------------+---------------+--------------+------------|
>  |  Attempts to use make(1) with KQueues appears to result in a  |
>  |  kernel hang under "heavy load". It would be desirable to fix |
>  |  this both from the perspective of building FreeBSD quickly   |
>  |  as a developer, but also because it's an instability that    |
>  |  could show up under other high load and heavy use of         |
>  |  KQueues. See PR kern/57945 for a proposed patch and details. |
>  |  This appear to be the product of a locking problem, and must |
>  |  be fixed for 5.3.                                            |
>  |-------------------+---------------+--------------+------------|

I have done many buildworlds using the WITH_KQUEUE make over the
past week.  I have done at least 50 buildworlds in my dual-proc
Athlon machine, with -j ranging from 3 to 15.  I have not seen any
lockups since the fix for IPI deadlocks went in.

I do still get the "*** Signal 6"s, even though I am now running
with v1.76 of src/sys/kern/kern_lock.c.  Actually I had updated
that one source file, expecting to get revision 1.75 (and thus
back out revision 1.74), as recently mentioned by Doug White.  I
just now realized that I ended up with 1.76...  I guess I should
try it one more time with 1.75 instead of 1.76.

One observation which is perhaps interesting.  I also modified
sys/kern/kern_sig.c so that it prints out a message to the console
whenever kill() or killpg1() is called with a SIGABRT.  I tested
that change, and it seems to work correctly with programs calling
kill(SIGABRT), abort(), or raise(SIGABRT).  However, when my
buildworld dies with `make' claiming it saw a Signal 6, these
printf's in kern_sig.c are never triggered.

This failure is "eventually repeatable" for me, in that I can
trigger it within 10 buildworlds.  And it *seems* that it only
happens if I am also running a "folding-at-home" client at the
same time.  That client program is a Linux ELF binary, so maybe
that is significant.   Or maybe it's a red herring.

Garance Alistair Drosehn            =   gad at gilead.netel.rpi.edu
Senior Systems Programmer           or  gad at freebsd.org
Rensselaer Polytechnic Institute    or  drosih at rpi.edu
