6.1-STABLE; Fatal trap 12: page fault while in kernel mode; kgdb isn't working??!?

David Wolfskill david at catwhisker.org
Wed May 31 17:31:15 PDT 2006


In testing a vendor's product, I managed (as I had been warned might
happen) to crash the machine on which the product was running.

It's a moderately-recent 6.1-STABLE:

mx-out05# uname -a
FreeBSD mx-out05.lab.example.org 6.1-STABLE FreeBSD 6.1-STABLE #3: Sun May  7 10:06:44 PDT 2006     dhw at mx-out05.lab.example.org:/usr/obj/usr/src/sys/SMP_PAE  i386
mx-out05# 

Hardware-wise, it's a dual 3 GHz Xeon box with 4 GB RAM.

In case it's relevant:

mx-out05# mount; df; swapinfo
/dev/aacd0s2a on / (ufs, local, soft-updates)
devfs on /dev (devfs, local)
/dev/aacd0s2d on /usr (ufs, local, soft-updates)
/dev/aacd0s3d on /home (ufs, local, soft-updates)
/dev/aacd0s3e on /var (ufs, local, soft-updates)
/dev/aacd1s1d on /var/spool (ufs, local, noatime)
devfs on /var/named/dev (devfs, local)
/dev/md0 on /tmp (ufs, local, soft-updates)
Filesystem    1K-blocks    Used    Avail Capacity  Mounted on
/dev/aacd0s2a    507630   37008   430012     8%    /
devfs                 1       1        0   100%    /dev
/dev/aacd0s2d   2280880 1676226   422184    80%    /usr
/dev/aacd0s3d   5077038   50950  4619926     1%    /home
/dev/aacd0s3e   7270492  949650  5739204    14%    /var
/dev/aacd1s1d  34678048   14136 31889670     0%    /var/spool
devfs                 1       1        0   100%    /var/named/dev
/dev/md0        9159102      16  8426358     0%    /tmp
Device          1K-blocks     Used    Avail Capacity
/dev/aacd0s3b    16777216        0 16777216     0%
mx-out05# 

Yes, swap is ridiculously huge (but note that /tmp is swap-backed).
So are a few other allocations (huge, that is); in general, I prefer
to avoid exhausting resources.  :-}

The crash appears to be quite reproducible by using
ports/benchmarks/postal.  It's fairly likely that I need to configure
some resource-consumption constraints so the application doesn't go
completely berserk.  I note that running postal using the same
parameters against a similar box running Postfix just chugs along, no
problem at all.
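One way to try such a resource-consumption constraint is to start the product from a small wrapper that lowers the soft limit on open file descriptors. The sketch below is hypothetical: `run_limited` is an illustrative name, and the choice of open files as the limit to cap is an assumption. Whether this keeps the box from panicking under postal is untested.

```sh
#!/bin/sh
# Hypothetical wrapper: run a command with a lower *soft* limit on open
# file descriptors, so a runaway process hits EMFILE at that ceiling
# instead of opening descriptors up to the system-wide maximum.
# Usage (hypothetical path): run_limited 1024 /usr/local/vendor/bin/mtad

run_limited() {
    max_files=$1
    shift
    (
        # Subshell: the caller's own limits are left untouched.
        # -S changes only the soft limit; the hard limit stays as-is.
        ulimit -S -n "$max_files" || exit 1
        exec "$@"
    )
}

# Only act when invoked with a limit and a command.
if [ $# -ge 2 ]; then
    run_limited "$@"
fi
```

On FreeBSD the same kind of cap can also be applied per login class in /etc/login.conf. The wrapper just keeps the change local to the one program being tested.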

Here's a typical complaint as extracted from /var/log/messages:

May 31 16:02:13 mx-out05 kernel: Fatal trap 12: page fault while in kernel mode
May 31 16:02:13 mx-out05 kernel: cpuid = 0; apic id = 00
May 31 16:02:13 mx-out05 kernel: fault virtual address  = 0x0
May 31 16:02:13 mx-out05 kernel: fault code             = supervisor read, page not present
May 31 16:02:13 mx-out05 kernel: instruction pointer    = 0x20:0x0
May 31 16:02:13 mx-out05 kernel: stack pointer          = 0x28:0xf06f8b98
May 31 16:02:13 mx-out05 kernel: frame pointer          = 0x28:0xf06f8bcc
May 31 16:02:13 mx-out05 kernel: code segment           = base 0x0, limit 0xf
May 31 16:02:13 mx-out05 kernel: f


I did manage to set things up to get a kernel crash dump, and I'm about
as certain as I can be that the kernel, userland, and crash dump are all
in sync.

Still, when I

cd /usr/obj/usr/src/sys/SMP_PAE/ && kgdb kernel.debug /var/crash/vmcore.0

I get a repeating series of errors:
kgdb: kvm_read: invalid address (0xc9ff5624)
kgdb: kvm_read: invalid address (0xc9ff8600)
kgdb: kvm_read: invalid address (0xc9ff5624)
kgdb: kvm_read: invalid address (0xc9ff8600)

The pattern repeats until I interrupt it.
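As a quick cross-check of the "kernel and crash dump are in sync" assumption, the version string that savecore(8) records alongside the dump can be compared with the running kernel's `kern.version`. This is a sketch under two assumptions: that /var/crash/info.0 contains a `Version String:` line (the usual savecore header format), and that the box was rebooted into the same kernel that produced the dump. The script name and function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: check_dump_version.sh [/var/crash/info.N]
# Assumes savecore(8) wrote a "Version String:" line into the info file
# and that the running kernel is the one that produced the dump.

dump_version() {
    # First line of the version string recorded with the dump.
    sed -n 's/^[[:space:]]*Version String: //p' "$1" | head -n 1
}

running_version() {
    # First line of the running kernel's version string.
    sysctl -n kern.version | head -n 1
}

check_dump() {
    info=${1:-/var/crash/info.0}
    dv=$(dump_version "$info")
    rv=$(running_version)
    if [ -n "$dv" ] && [ "$dv" = "$rv" ]; then
        echo "match: $dv"
        return 0
    fi
    echo "MISMATCH (or no version string found in $info)"
    echo "  dump:    $dv"
    echo "  running: $rv"
    return 1
}

# Only act when given an info file, so the functions can also be sourced.
if [ $# -ge 1 ]; then
    check_dump "$1"
fi
```

A match only shows that the dump came from the same build as the running kernel; it does not by itself confirm that kernel.debug in the object directory corresponds to that build.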

Now, this box is in a lab; it is for testing (at this time), so I have
rather more flexibility than I might for a production system.  The
product was built for FreeBSD 5.x; I have the ports/misc/compat-5x port
installed, and the product does run -- at least, until I start
stress-testing it.  :-}

I could bring the box up to a more recent -STABLE fairly easily; for that
matter, I could probably bring it up to -CURRENT fairly easily, but I
have no intent to be running a production service on -CURRENT.  (My
laptop?  Sometimes.  A production box in a colo?  Uhh... maybe I'm just
not sufficiently daring, but no thanks.  :-})

I'd appreciate suggestions (or pointers to same) as to how I might
proceed to determine what I can do to get the product to run reliably
in a FreeBSD environment.  (The vendor has suggested either Red Hat or
Suse Linux as more stable platforms, and has complained about an
inability to get debugging information from FreeBSD.  I have pointed out
that there's been some progress of late on getting DTrace ported to
FreeBSD, and they've seemed at least somewhat interested, but in the
meantime....)

Anyway, I'll plan on summarizing off-list responses that are relevant.

Thanks!

Peace,
david
-- 
David H. Wolfskill				david at catwhisker.org
Doing business with spammers only encourages them.  Please boycott spammers.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.