6.1-STABLE; Fatal trap 12: page fault while in kernel mode; kgdb
isn't working??!?
Scott Long
scottl at samsco.org
Wed May 31 20:10:19 PDT 2006
David Wolfskill wrote:
> In testing a vendor's product, I managed (as I had been warned might
> happen) to crash the machine on which the product was running.
>
> It's a moderately-recent 6.1-STABLE:
>
> mx-out05# uname -a
> FreeBSD mx-out05.lab.example.org 6.1-STABLE FreeBSD 6.1-STABLE #3: Sun May 7 10:06:44 PDT 2006 dhw at mx-out05.lab.example.org:/usr/obj/usr/src/sys/SMP_PAE i386
> mx-out05#
>
> Hardware-wise, it's a dual 3 GHz Xeon box with 4 GB RAM.
>
> In case it's relevant:
>
> mx-out05# mount; df; swapinfo
> /dev/aacd0s2a on / (ufs, local, soft-updates)
> devfs on /dev (devfs, local)
> /dev/aacd0s2d on /usr (ufs, local, soft-updates)
> /dev/aacd0s3d on /home (ufs, local, soft-updates)
> /dev/aacd0s3e on /var (ufs, local, soft-updates)
> /dev/aacd1s1d on /var/spool (ufs, local, noatime)
> devfs on /var/named/dev (devfs, local)
> /dev/md0 on /tmp (ufs, local, soft-updates)
> Filesystem 1K-blocks Used Avail Capacity Mounted on
> /dev/aacd0s2a 507630 37008 430012 8% /
> devfs 1 1 0 100% /dev
> /dev/aacd0s2d 2280880 1676226 422184 80% /usr
> /dev/aacd0s3d 5077038 50950 4619926 1% /home
> /dev/aacd0s3e 7270492 949650 5739204 14% /var
> /dev/aacd1s1d 34678048 14136 31889670 0% /var/spool
> devfs 1 1 0 100% /var/named/dev
> /dev/md0 9159102 16 8426358 0% /tmp
> Device 1K-blocks Used Avail Capacity
> /dev/aacd0s3b 16777216 0 16777216 0%
> mx-out05#
>
> Yes, swap is ridiculously huge (but note that /tmp is swap-backed).
> So are a few other allocations (huge, that is); in general, I prefer
> to avoid exhausting resources. :-}
>
> The crash appears to be quite reproducible by using
> ports/benchmarks/postal. It's fairly likely that I need to configure
> some resource-consumption constraints so the application doesn't go
> completely berserk. I note that running postal using the same
> parameters against a similar box running Postfix just chugs along, no
> problem at all.
>
> Here's a typical complaint as extracted from /var/log/messages:
>
> May 31 16:02:13 mx-out05 kernel: Fatal trap 12: page fault while in kernel mode
> May 31 16:02:13 mx-out05 kernel: cpuid = 0; apic id = 00
> May 31 16:02:13 mx-out05 kernel: fault virtual address
> May 31 16:02:13 mx-out05 kernel: = 0x0
> May 31 16:02:13 mx-out05 kernel: fault code = supervisor read, page not present
> May 31 16:02:13 mx-out05 kernel: instruction pointer = 0x20:0x0
> May 31 16:02:13 mx-out05 kernel: stack pointer = 0x28:0xf06f8b98
> May 31 16:02:13 mx-out05 kernel: frame pointer = 0x28:0xf06f8bcc
> May 31 16:02:13 mx-out05 kernel: code segment = base 0x0, limit 0xf
> May 31 16:02:13 mx-out05 kernel: f
>
>
> I did manage to set things up to get a kernel crash dump, and I'm about
> as certain as I can be that the kernel, userland, and crash dump are all
> in sync.
>
> Still, when I
>
> cd /usr/obj/usr/src/sys/SMP_PAE/ && kgdb kernel.debug /var/crash/vmcore.0
>
> I get a repeating:
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
> kgdb: kvm_read: invalid address (0xc9ff5624)
> kgdb: kvm_read: invalid address (0xc9ff8600)
>
> The pattern repeats until I interrupt it.
>
> Now, this box is in a lab; it is for testing (at this time), so I have
> rather more flexibility than I might for a production system. The
> product was built for FreeBSD 5.x; I have the ports/misc/compat-5x port
> installed, and the product does run -- at least, until I start
> stress-testing it. :-}
>
> I could bring the box up to a more recent -STABLE fairly easily; for that
> matter, I could probably bring it up to -CURRENT fairly easily, but I
> have no intent to be running a production service on -CURRENT. (My
> laptop? Sometimes. A production box in a colo? Uhh... maybe I'm just
> not sufficiently daring, but no thanks. :-})
>
> I'd appreciate suggestions (or pointers to same) as to how I might
> proceed to determine what I can do to get the product to run reliably
> in a FreeBSD environment. (The vendor has suggested either Red Hat or
> Suse Linux as more stable platforms, and has complained about an
> inability to get debugging information from FreeBSD. I have pointed out
> that there's been some progress of late on getting DTrace ported to
> FreeBSD, and they've seemed at least somewhat interested, but in the
> mean time....)
>
> Anyway, I'll plan on summarizing off-list responses that are relevant.
>
> Thanks!
>
> Peace,
> david
kgdb seems to be more broken than not. Could you enable KDB+DDB and at
least get a stack trace from the fault?
Scott
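
For reference, a minimal sketch of what enabling KDB+DDB could look like.
It assumes the custom kernel configuration file is the SMP_PAE one named
in the uname output above; that file is not shown in this thread, so the
path in the comment is a guess, and it may already contain some of these
lines. KDB, DDB, and KDB_TRACE are standard FreeBSD kernel options.

```
# Additions to the kernel config (assumed to be sys/i386/conf/SMP_PAE)
options 	KDB		# kernel debugger framework
options 	DDB		# interactive in-kernel debugger backend
options 	KDB_TRACE	# print a stack trace automatically on panic
```

After rebuilding, installing, and rebooting into the new kernel, the next
fault should drop to a db> prompt on the console, where the "trace"
command prints the stack backtrace of the faulting thread.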