5.3-RELEASE TODO - make/kqueue
Garance A Drosihn
drosih at rpi.edu
Sat Aug 28 17:05:10 PDT 2004
At 7:37 AM -0600 8/27/04, Scott Long wrote:
>
>Testing focuses for 5.3-RELEASE
An update on the issue:
> |---------------------------------+
> | make -DUSE_KQUEUE causes lockup |
> | with buildworld -jBIGNUM |
> |---------------------------------+
The description says:
> |-------------------+---------------+--------------+------------|
> | Attempts to use make(1) with KQueues appears to result in a |
> | kernel hang under "heavy load". It would be desirable to fix |
> | this both from the perspective of building FreeBSD quickly |
> | as a developer, but also because it's an instability that |
> | could show up under other high load and heavy use of |
> | KQueues. See PR kern/57945 for a proposed patch and details. |
> | This appear to be the product of a locking problem, and must |
> | be fixed for 5.3. |
> |-------------------+---------------+--------------+------------|
I have done many buildworlds using the WITH_KQUEUE make over the
past week: at least 50 on my dual-processor Athlon machine, with -j
ranging from 3 to 15. I have not seen any lockups since the fix for
IPI deadlocks went in.
I do still get the "*** Signal 6"s, even though I am now running
with v1.76 of src/sys/kern/kern_lock.c. Actually, I had updated
that one source file expecting to get revision 1.75 (and thus
back out revision 1.74), as recently mentioned by Doug White. I
just now realized that I ended up with 1.76... I guess I should
try it one more time with 1.75 instead of 1.76.
One observation that is perhaps interesting: I also modified
sys/kern/kern_sig.c so that it prints a message to the console
whenever kill() or killpg1() is called with SIGABRT. I tested
that change, and it seems to work correctly with programs calling
kill(SIGABRT), abort(), or raise(SIGABRT). However, when my
buildworld dies with `make' claiming it saw a Signal 6, these
printf's in kern_sig.c are never triggered.
This failure is "eventually repeatable" for me, in that I can
trigger it within 10 buildworlds. And it *seems* that it only
happens if I am also running a "folding-at-home" client at the
same time. That client program is a Linux ELF binary, so maybe
that is significant. Or maybe it's a red herring.
--
Garance Alistair Drosehn = gad at gilead.netel.rpi.edu
Senior Systems Programmer or gad at freebsd.org
Rensselaer Polytechnic Institute or drosih at rpi.edu