Need help with crash analysis

Sun Dec 18 13:41:20 PST 2005

I had another crash, and this time I ran kgdb and typed "bt full"; the
output is below.  As a last resort I rebuilt the kernel with HZ=2000
instead of 1000, and I haven't had a crash since.  My wireless card also
seems more responsive under load: ping times are lower while I'm
transferring large files across the network.

[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x10
fault code		= supervisor read, page not present
instruction pointer	= 0x20:0xc066ccec
stack pointer	        = 0x28:0xe36198bc
frame pointer	        = 0x28:0x0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 38 (swi1: net)
trap number		= 12
panic: page fault
cpuid = 0
Uptime: 3m17s
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (158 pages)

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x1c
fault code		= supervisor write, page not present
instruction pointer	= 0x20:0xc0549f20
stack pointer	        = 0x28:0xe5084c8c
frame pointer	        = 0x28:0xe5084ccc
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 36 (swi4: clock)
trap number		= 12
 ... ok
  chunk 1: 1023MB (261802 pages) 1007 991 975 959 943 927 911 895 879
863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt full
#0  doadump () at pcpu.h:165
No locals.
#1  0xc053b467 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:399
	first_buf_printf = 1
#2  0xc053b818 in panic (fmt=0xc06dc865 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:555
	td = (struct thread *) 0xc22ec4b0
	bootopt = 260
	newpanic = 0
	ap = 0xc22ec4b0 ""
	buf = "page fault", '\0' <repeats 245 times>
#3  0xc06b4b14 in trap_fatal (frame=0xe361987c, eva=0)
    at /usr/src/sys/i386/i386/trap.c:831
	code = 40
	type = 12
	ss = 40
	esp = 0
	softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_xx = 0, ssd_xx1 = 0, ssd_def32 = 1, ssd_gran = 1}
#4  0xc06b480d in trap_pfault (frame=0xe361987c, usermode=0, eva=16)
    at /usr/src/sys/i386/i386/trap.c:742
	va = 0
	vm = (struct vmspace *) 0x0
	map = 0xc073d820
	rv = 1
	ftype = 1 '\001'
	td = (struct thread *) 0xc22ec4b0
	p = (struct proc *) 0xc234b000
#5  0xc06b43f3 in trap (frame=
      {tf_fs = -480182264, tf_es = 40, tf_ds = 40, tf_edi = 0,
      tf_esi = -315641204, tf_ebp = 0, tf_isp = -480143192,
      tf_ebx = -315638608, tf_edx = 791735, tf_ecx = -1073475471,
      tf_eax = 1, tf_trapno = 12, tf_err = 0, tf_eip = -1067004692,
      tf_cs = 32, tf_eflags = 66050, tf_esp = 16777216, tf_ss = 0})
    at /usr/src/sys/i386/i386/trap.c:432
	td = (struct thread *) 0xc22ec4b0
	p = (struct proc *) 0xc234b000
	sticks = 3814824188
	i = 0
	ucode = 0
	type = 12
	code = 0
	eva = 16
#6  0xc06a041a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
No locals.
#7  0xc066ccec in zz0e373a4d ()
No symbol table info available.
(kgdb) quit

On Fri, 2005-12-16 at 18:17 -0500, Peter D. Quilty wrote:

> I have a Dell Inspiron 9100 laptop that has been crashing lately.  It
> seems to happen when there is a moderate disk load and the network load
> exceeds 6 Mbit/s.  I can usually replicate it by running "portsdb -fUu"
> while downloading or copying large files across the network.  I have
> tried the following in an attempt to isolate the problem, but nothing
> has worked.
>       * disabling ACPI
>       * disabling hyperthreading
>       * disabling SMP
>       * switching back to the 4BSD scheduler from ULE
> I ran kgdb against kernel.debug and the crash dump, but don't quite know
> how to interpret it or where to go from here.  I've attached my kernel
> config file, dmesg.boot, and the outputs from kldstat and kgdb.
> 
> I recently upgraded my router/access point at home from 802.11b to
> 802.11g to take advantage of the faster network cards in my laptops and
> I am wondering if that could be exposing a bug or race condition.  I
> tried putting my network card back in 11b mode (instead of 11g) and I
> don't see the problem nearly as often.
> 
> Does anyone have any suggestions as to how to troubleshoot this further?
> I have saved the relevant kernel files and crash dumps, in case I need
> to reference them again.