fatal trap 12

Tue Sep 14 10:00:49 PDT 2004

On Tue, 14 Sep 2004, Volker wrote:

> After the reboot, the system is panicking 3 to 8 times a day. To see the
> panic messages, I've set the PANIC_REBOOT_WAIT_TIME to -1 and this let
> me see a message like (not copied and pasted): 

If possible, I'd suggest setting up a serial console for the box so that
you can copy and paste debugger output.  You'll probably be asked for quite
a bit of output from the debugger, and life is a lot easier if you can do
that :-).  It also reduces the chance of typographical errors.

> fatal trap 12: page fault
> fault virtual address: 0xc
> fault code: supervisor read, page not present
> instr. ptr: 0x8:0xc0586e60
> stack ptr: 0x10:0xcee2cac8
> frame ptr: 0x10:0xcee2caf0
> cs: base 0x0 limit 0xffff type 0x1b DPL 0 pres 1 def32 1 gran 1
> cpu eflags: interrupt enabled, resume, IOPL=0
> process: 33767 imapd
> trap 12

This is a kernel NULL pointer dereference: the fault address 0xc is what
you would expect from reading a field at a small offset from a NULL
pointer.  To debug this, it would be helpful if you could determine which
line in the kernel source code 0xc0586e60 refers to.  Running addr2line on
the kernel.debug from your kernel build is a good place to start.  It
would also be very helpful to have a stack trace.  When you drop to DDB
due to the panic (assuming DDB is compiled in), you can type "trace" to
generate one.  Having the names of the functions plus offsets would be
very helpful.  Having the arguments is also good, but a lot more pain for
you without a serial console :-).
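
The two checks described above can be sketched in a few lines of Python.
This is only an illustration: the function names, the 4096-byte page size,
and the "kernel.debug" path are assumptions, not anything taken from the
report.  The -f and -e options are the standard GNU addr2line options for
printing the function name and selecting the executable.

```python
import subprocess

PAGE_SIZE = 4096  # assumption: i386 page size


def looks_like_null_deref(fault_addr, page_size=PAGE_SIZE):
    """Heuristic: a fault in the first page usually means NULL plus a field offset."""
    return 0 <= fault_addr < page_size


def addr2line_command(kernel_debug, instr_ptr):
    """Build (but do not run) an addr2line command for a kernel address."""
    return ["addr2line", "-f", "-e", kernel_debug, hex(instr_ptr)]


def resolve(kernel_debug, instr_ptr):
    """Run addr2line and return its output (function name, then file:line)."""
    result = subprocess.run(
        addr2line_command(kernel_debug, instr_ptr),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Fault address and instruction pointer from the first panic message.
    print(looks_like_null_deref(0xC))
    print(" ".join(addr2line_command("kernel.debug", 0xC0586E60)))
```

Calling resolve() requires addr2line to be installed and a kernel.debug
that matches the running kernel; a mismatched file will produce misleading
line numbers.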

> While trying to get the system stable, I've tried a 6-current Kernel
> (+world) but the system still panics (only the current process and the
> pointer addresses are changing, the system mostly panics with a trap
> 12). 
> 
> Another time the system panicked with: 'panic: sbappendaddr_locked'

A stack trace here would be invaluable.  This panic occurs as a result of
a violation of calling convention, in which a non-header mbuf (or maybe a
freed mbuf) is appended to a socket incorrectly.  A stack trace will tell
us what calling code might be at fault.
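
As a rough illustration of the calling convention involved, here is a toy
Python model.  It is not the kernel code: the Mbuf class, the list-based
socket buffer, the exception type, and the flag value are all stand-ins
chosen for the sketch.  It shows only the idea that the first mbuf of an
appended record must carry the packet-header flag, and that the check
fires when it does not.

```python
M_PKTHDR = 0x0002  # illustrative flag bit marking a packet-header mbuf


class Mbuf:
    """Toy stand-in for a kernel mbuf: payload, flags, and a next pointer."""

    def __init__(self, data, flags=0, next_mbuf=None):
        self.data = data
        self.flags = flags
        self.next = next_mbuf


class CallingConventionError(Exception):
    """Plays the role of the kernel panic in this model."""


def sbappendaddr(sockbuf, m0):
    """Append a record to a list-based socket buffer, enforcing the header rule."""
    if m0 is not None and not (m0.flags & M_PKTHDR):
        # The caller handed over a chain that does not start with a header mbuf.
        raise CallingConventionError("sbappendaddr_locked")
    sockbuf.append(m0)
```

In this model the exception traceback names the offending caller, which is
the same information the DDB stack trace would provide for the real panic.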

> On 2004-09-13 I've cvsup'ed current and releng_5 sources and recompiled 
> (releng_5) world + kernel. The system kept panicking.
> 
> Well, since having boot problems using that mainboard (Slot-1, P-III 
> 600, FIC VB-601V, which caused the BTX loader sometimes to a fatal 
> exit... strange thing), I've plugged in another board which has been 
> working stable over the last few weeks (Epox 51-MVP3G with AMD K6-2 500).
> 
> This system is now up using that socket-7 board but panicked for the 
> second time a few minutes ago:
> 
> fatal trap 12: page fault
> fatal virtual address: 0x40
> trap 12: page fault while in kernel mode
> ip: 0x8:0xc05488ed
> sp: 0x10:0xca3f4c20
> fp: 0x10:0xca3f4c20
> process: 34 (swi6: task queue)
> 
> A few minutes before, it panicked with:
> 
> in_cksum_skip: out of data by 184

A couple of bugs relating to this error were introduced and then fixed.
In particular, could you confirm that you have at least revision 1.165 of
udp_usrreq.c, or at least 1.162.2.2 if you are on RELENG_5?  The merge to
RELENG_5 happened on August 30, so you should have it, but it's worth
confirming.

A stack trace here would also be extremely helpful, but this failure could
be explained by whatever causes the sbappendaddr_locked failure as well.
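
For readers unfamiliar with the "out of data" message, the following Python
sketch shows one way such a shortfall can arise.  It assumes (this is not
stated above) that the message reports how many bytes the checksum routine
was asked to cover beyond what the mbuf chain actually holds.  The chain is
modelled as a list of bytes objects, and in_cksum_chain is a hypothetical
name; the checksum itself is the standard 16-bit one's-complement Internet
checksum.

```python
def in_cksum_chain(chain, length):
    """Return (checksum, shortfall) for the first `length` bytes of a buffer chain.

    `chain` is a list of bytes objects standing in for an mbuf chain.
    `shortfall` is how many requested bytes the chain could not supply.
    """
    data = b"".join(chain)
    shortfall = max(0, length - len(data))
    data = data[:length]
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return (~total) & 0xFFFF, shortfall
```

A caller that claims 200 bytes for a chain holding only 16 would see a
shortfall of 184, which is the kind of length mismatch a corrupted or
prematurely freed mbuf chain could produce.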

> Any additional tests you want me to run? 

Could you try booting and running the system with debug.mpsafenet=0 in
loader.conf?  Is this an SMP box?  Could you try compiling and running
without the PREEMPTION kernel option?  Probably the most valuable
information would be the stack traces as indicated above, however.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org      Principal Research Scientist, McAfee Research