RELENG_5 PAE panic

Fri Jul 29 00:34:34 GMT 2005

Intel SE7320VP2 motherboard with single Xeon 2.8GHz, 4GB RAM, and an
80GB disk and a CD-ROM drive connected to motherboard ATA.  1GB of the
RAM appears above 4GB which suggested building a PAE kernel.  So,
imagine 5.4-RELEASE with a kernel config file that goes like this:

include PAE
options MAXDSIZ="(2000UL*1024*1024)"
options IPFIREWALL
options IPFIREWALL_DEFAULT_TO_ACCEPT
options IPDIVERT
options DUMMYNET

It boots, but often panics while starting a rather modified named
(based on BIND 8, and running just fine on other 4.x and 5.x systems
with similar kernel configuration but no PAE).  Sometimes it doesn't
panic right away, but it usually does.

Booted from /boot/kernel.old/kernel (no PAE), added the KDB and DDB
options to the PAE kernel configuration, rebuilt and installed the
kernel, and tried some more.  It is panicking in propagate_priority().
No crash dumps: the dump reliably writes 3552MB and then loses with an
NMI.
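
For reference, the point where the dump stops can be turned into a
byte offset.  This is only arithmetic on the 3552MB figure above.
Whether that offset corresponds to a physical address depends on how
the dump code walks memory, which I am assuming here and have not
checked.

```python
MB = 1024 * 1024

def mb_to_bytes(mb):
    """Convert a size in megabytes (2**20 bytes) to bytes."""
    return mb * MB

stall = mb_to_bytes(3552)    # where the dump reliably stops
four_gb = mb_to_bytes(4096)  # the 4GB boundary that PAE is meant to cross

print(hex(stall))                # 0xde000000
print((four_gb - stall) // MB)   # 544 (MB short of the 4GB boundary)
```

So the dump dies 544MB short of 4GB, before it would reach any RAM
mapped above the 4GB line.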

s/PAE/GENERIC/ and it runs, but ignores 1GB RAM.

That was last night.  This morning I found
<http://lists.freebsd.org/pipermail/freebsd-stable/2005-April/013718.html>
which could be describing a related problem (though I have no ips-type
hardware in my picture), and Scott Long seemed to be interested.  

And I looked through the commit logs and saw a commit to
sys/kern/kern_switch.c that looked like it could perhaps have some
bearing.  (Rev 1.112, MFCd as 1.78.2.19, basing this on the commit
message for 1.112.)

So, hmm.  cvsup using stable-supfile, buildworld, buildkernel, &c.

It didn't help.  A kernel with PAE still got me a panic during named
startup (this time I started named by hand, after logging in as root
once multi-user startup had finished).

--- begin paste ---
Fatal trap 12: page fault while in kernel mode
fault virtual address  = 0x24
fault code             = supervisor read, page not present
instruction pointer    = 0x8:0xc03db1cf
stack pointer          = 0x10:0xeb328c64
frame pointer          = 0x10:0xeb328c78
code segment           = base 0x0, limit 0xfffff, type 0x1b
                       = DPL 0, pres 1, def32 1, gran 1
processor eflags       = resume, IOPL = 0
current process        = 70 (pagedaemon)
[thread pid 70 tid 100080 ]
Stopped at      0xc03db1cf = propagate_priority+0x7f:  movl    0x24(%eax),%eax
db> trace
Tracing pid 70 tid 100080 td 0xc6a89000
propagate_priority(c6a89000,c0628280,c0636c60,c6a89000,c6cdaa82) at 0xc03db1cf = propagate_priority+0x7f
turnstile_wait(c6a6f240,c0636c60,c6cdaa80) at 0xc03db84a = turnstile_wait+0x266
_mtx_lock_sleep(c0636c60,c6a89000,0,0,0) at 0xc03b4c25 = _mtx_lock_sleep+0xad
msleep(c0637104,c0636c60,44,c059aa74,1f4) at 0xc03c37ea = msleep+0x39a
vm_pageout(0,eb328d38) at 0xc04fb0e4 = vm_pageout+0x280
fork_exit(c04fae64,0,eb328d38) at 0xc03a8680 = fork_exit+0x74
fork_trampoline() at 0xc0539d9c = fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xeb328d6c, ebp = 0 ---
db> 
--- end paste ---
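
One thing the trap does show: the fault virtual address (0x24) equals
the displacement in the faulting instruction, movl 0x24(%eax),%eax.
That suggests %eax held 0, i.e. a NULL pointer being dereferenced at
offset 0x24 of some structure.  A small sketch of that arithmetic
follows.  The helper name is made up for illustration, and it assumes
the faulting access is the disp(%reg) memory operand of that
instruction.

```python
import re

def implied_base(fault_addr, instruction):
    """Return (register, value) implied by a disp(%reg) memory operand.

    Hypothetical helper: assumes the fault came from that operand, so
    base = fault address - displacement (mod 2**32 on i386).
    """
    m = re.search(r"(-?0x[0-9a-fA-F]+)\(%(\w+)\)", instruction)
    if m is None:
        raise ValueError("no disp(%reg) operand found")
    disp = int(m.group(1), 16)
    return m.group(2), (fault_addr - disp) & 0xFFFFFFFF

reg, base = implied_base(0x24, "movl    0x24(%eax),%eax")
print(f"%{reg} = {base:#x}")   # %eax = 0x0
```

Which structure and which field sit at offset 0x24 is not something
the trace alone tells me.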

"panic" to force a dump continues to lose with an NMI after 3552MB.

Um, help?  Pretty please?  My clues about this part of the kernel are
a bit stale.  I don't know how long I have to play, and don't think I
can give remote access, but I'm willing to try stuff and be remote
eyes, hands, and as much of a brain as I can while I can.

-Frank McConnell