kernel crash coredump help

Catalin Miclaus catalin at starcomms.com
Sun Jun 8 20:02:46 UTC 2008


Hi,

We installed a new Dell 2950 server with FreeBSD 7.0 and have run into some
issues with it.

Hardware: Dell 2950 Intel(R) Xeon(R) Dual CPU Quad-Core E5335  @ 2.00GHz
(1995.01-MHz K8-class CPU) with 4079 MB RAM and 2 x 250GB SATA HDD.

Standard server install using the Developer distribution (all sources, no
games). Upgraded to 7.0-p1, then recompiled the kernel with

GENERIC +

device pf
device pfsync
device pflog
device carp

options         HZ=1000
options         DEVICE_POLLING

The server runs as the secondary PF firewall with CARP/PFSYNC/IFSTATED.
Additional services running on it are bind, net-snmp and ssh.
We have 7 other servers running similar services on FreeBSD 6.2 and 7.0,
all running fine.

Later the same day the server crashed.
Traffic was on the CARP MASTER server when the crash happened. The crashed
server was not under load: NMS reports show CPU at 0% and memory at 10%.
We were able to get a crash dump:

[root@fw2 FW]# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
<7>arp_rtrequest: bad gateway 196.3.61.14 (!AF_LINK)


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xda040020
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff80666070
stack pointer           = 0x10:0xffffffffac3e0650
frame pointer           = 0x10:0xffffff00cfb42820
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 19 (swi1: net)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 2h10m11s
Physical memory: 4079 MB
Dumping 425 MB: 410 394 378 362 346 330 314 298 282 266 250 234 218 202
186 170 154 138 122 106 90 74 58 42 26 10

#0  doadump () at pcpu.h:194
194             __asm __volatile("movq %%gs:0,%0" : "=r" (td));


(kgdb) list *0xffffffff80666070
0xffffffff80666070 is in uma_zfree_internal (uma_int.h:368).
363             int hval;
364
365             hval = UMA_HASH(hash, data);
366
367             SLIST_FOREACH(slab, &hash->uh_slab_hash[hval], us_hlink) {
368                     if ((u_int8_t *)slab->us_data == data)
369                             return (slab);
370             }
371             return (NULL);
372     }


(kgdb) backtrace
#0  doadump () at pcpu.h:194
#1  0x0000000000000004 in ?? ()
#2  0xffffffff80497ea9 in boot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:409
#3  0xffffffff804982ad in panic (fmt=0x104 <Address 0x104 out of
bounds>) at /usr/src/sys/kern/kern_shutdown.c:563
#4  0xffffffff8071ad64 in trap_fatal (frame=0xffffff00010e0340,
eva=18446742974215697512) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0xffffffff8071b135 in trap_pfault (frame=0xffffffffac3e05a0,
usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6  0xffffffff8071ba78 in trap (frame=0xffffffffac3e05a0) at
/usr/src/sys/amd64/amd64/trap.c:410
#7  0xffffffff807016de in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:169
#8  0xffffffff80666070 in uma_zfree_internal (zone=0xffffff00cfb42820,
item=0xffffff0003b2e000, udata=0x0, skip=Variable "skip" is not
available.) at uma_int.h:367
#9  0xffffffff8066909b in uma_zfree_arg (zone=0xffffff00cfb42820,
item=0xffffff0003b2e000, udata=0x0) at /usr/src/sys/vm/uma_core.c:2405
#10 0xffffffff80665fe4 in uma_zfree_internal (zone=0xffffff00cfb429c0,
item=0xffffff0003a86600, udata=0x0, skip=Variable "skip" is not
available.) at /usr/src/sys/vm/uma_core.c:2434
#11 0xffffffff80666bba in bucket_drain (zone=0xffffff00cfb429c0,
bucket=0xffffff0003a94830) at /usr/src/sys/vm/uma_core.c:595
#12 0xffffffff80666cab in bucket_cache_drain (zone=0xffffff00cfb429c0)
at /usr/src/sys/vm/uma_core.c:662
#13 0xffffffff8066996b in zone_drain (zone=0xffffff00cfb429c0) at
/usr/src/sys/vm/uma_core.c:710
#14 0xffffffff801b7f95 in pfsync_get_mbuf (sc=0xffffff0003573400,
action=2 '\002', sp=0xffffff0003573570) at mbuf.h:529
#15 0xffffffff801b8208 in pfsync_pack_state (action=Variable "action" is
not available.) at /usr/src/sys/contrib/pf/net/if_pfsync.c:1512
#16 0xffffffff801ce863 in pf_test (dir=1, ifp=0xffffff000128d800,
m0=0xffffffffac3e0a00, eh=Variable "eh" is not available.) at
/usr/src/sys/contrib/pf/net/pf.c:6955
#17 0xffffffff801d360a in pf_check_in (arg=Variable "arg" is not
available.) at /usr/src/sys/contrib/pf/net/pf_ioctl.c:3533
#18 0xffffffff80539561 in pfil_run_hooks (ph=Variable "ph" is not
available.) at /usr/src/sys/net/pfil.c:78
#19 0xffffffff80574e2b in ip_input (m=0xffffff0036190500) at
/usr/src/sys/netinet/ip_input.c:417
#20 0xffffffff8052dee1 in ether_demux (ifp=0xffffff000128d800,
m=0xffffff0036190500) at /usr/src/sys/net/if_ethersubr.c:834
#21 0xffffffff8052e181 in ether_input (ifp=0xffffff000128d800,
m=0xffffff0036190500) at /usr/src/sys/net/if_ethersubr.c:692
#22 0xffffffff802d77ac in em_rxeof (adapter=0xffffff000122f000,
count=119) at /usr/src/sys/dev/em/if_em.c:4542
#23 0xffffffff802d84d7 in em_poll (ifp=0xffffff000128d800, cmd=Variable
"cmd" is not available.) at /usr/src/sys/dev/em/if_em.c:1433
#24 0xffffffff8048dd8d in netisr_poll () at
/usr/src/sys/kern/kern_poll.c:432
#25 0xffffffff80537e8a in swi_net (dummy=Variable "dummy" is not
available.) at /usr/src/sys/net/netisr.c:254
#26 0xffffffff8047b5a0 in ithread_loop (arg=0xffffff00010d9b80) at
/usr/src/sys/kern/kern_intr.c:1036
#27 0xffffffff80478673 in fork_exit (callout=0xffffffff8047b430
<ithread_loop>, arg=0xffffff00010d9b80, frame=0xffffffffac3e0c80) at
/usr/src/sys/kern/kern_fork.c:781
#28 0xffffffff80701aae in fork_trampoline () at
/usr/src/sys/amd64/amd64/exception.S:415
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000001 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000c9c000 in ?? ()
#54 0xffffff00010e0340 in ?? ()
#55 0x0000000000000001 in ?? ()
#56 0xffffff00010f3468 in ?? ()
#57 0xffffff00010e0680 in ?? ()
#58 0xffffff00010e0340 in ?? ()
#59 0xffffffffac3e0b58 in ?? ()
#60 0xffffff00010e0340 in ?? ()
#61 0xffffffff804b5b69 in sched_switch (td=0xffffff00010d9b80,
newtd=0xffffffff8047b430, flags=0) at /usr/src/sys/kern/sched_4bsd.c:905
#62 0x0000000000000000 in ?? ()
#63 0x0000000000000000 in ?? ()
#64 0x0000000000000000 in ?? ()
#65 0x0000000000000000 in ?? ()
#66 0x0000000000000000 in ?? ()
#67 0x0000000000000000 in ?? ()
#68 0x0000000000000000 in ?? ()
#69 0x0000000000000000 in ?? ()
#70 0x0000000000000000 in ?? ()
#71 0x0000000000000000 in ?? ()
#72 0x0000000000000000 in ?? ()
#73 0x0000000000000000 in ?? ()
#74 0x0000000000000000 in ?? ()
#75 0x0000000000000000 in ?? ()
#76 0x0000000000000000 in ?? ()
#77 0x0000000000000000 in ?? ()
#78 0x0000000000000000 in ?? ()
#79 0x0000000000000000 in ?? ()
#80 0x0000000000000000 in ?? ()
#81 0x0000000000000000 in ?? ()
#82 0x0000000000000000 in ?? ()
#83 0x0000000000000000 in ?? ()
#84 0x0000000000000000 in ?? ()
#85 0x0000000000000000 in ?? ()
#86 0x0000000000000000 in ?? ()
#87 0x0000000000000000 in ?? ()
#88 0x0000000000000000 in ?? ()
#89 0x0000000000000000 in ?? ()
#90 0x0000000000000000 in ?? ()
#91 0x0000000000000000 in ?? ()
#92 0x0000000000000000 in ?? ()
#93 0x0000000000000000 in ?? ()
#94 0x0000000000000000 in ?? ()
#95 0x0000000000000000 in ?? ()
#96 0x0000000000000000 in ?? ()
#97 0x0000000000000000 in ?? ()
#98 0x0000000000000000 in ?? ()
#99 0x0000000000000000 in ?? ()
#100 0x0000000000000000 in ?? ()
#101 0x0000000000000000 in ?? ()
#102 0x0000000000000000 in ?? ()
#103 0x0000000000000000 in ?? ()
#104 0x0000000000000000 in ?? ()
#105 0x0000000000000000 in ?? ()
#106 0x0000000000000000 in ?? ()
#107 0x0000000000000000 in ?? ()
#108 0x0000000000000000 in ?? ()
#109 0x0000000000000000 in ?? ()
#110 0x0000000000000000 in ?? ()
#111 0x0000000000000000 in ?? ()
#112 0x0000000000000000 in ?? ()
#113 0x0000000000000000 in ?? ()
#114 0x0000000000000000 in ?? ()
#115 0x0000000000000000 in ?? ()
#116 0x0000000000000000 in ?? ()
#117 0x0000000000000000 in ?? ()
#118 0x0000000000000000 in ?? ()
#119 0x0000000000000000 in ?? ()
#120 0x0000000000000000 in ?? ()
#121 0x0000000000000000 in ?? ()
#122 0x0000000000000000 in ?? ()
#123 0x0000000000000000 in ?? ()
#124 0x0000000000000000 in ?? ()
#125 0x0000000000000000 in ?? ()
#126 0x0000000000000000 in ?? ()
#127 0x0000000000000000 in ?? ()
#128 0x0000000000000000 in ?? ()
#129 0x0000000000000000 in ?? ()
#130 0x0000000000000000 in ?? ()
#131 0x0000000000000000 in ?? ()
#132 0x0000000000000000 in ?? ()
#133 0x0000000000000000 in ?? ()
Cannot access memory at address 0xffffffffac3e1000
(kgdb)
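
Most of the frames above are unresolved "?? ()" entries, and the mail client
has wrapped the long ones. In case it makes the trace easier to read, here is
a minimal Python sketch that rejoins wrapped frames and drops the unresolved
ones. The function name resolved_frames and the two regular expressions are
my own and are not part of kgdb; the sketch assumes it is fed only the frame
lines, without the trailing "Cannot access memory" message or the prompt.

```python
import re

# Start of a kgdb frame line, e.g. "#8  0xffffffff80666070 in uma_zfree_internal (".
FRAME_START = re.compile(r"^#(\d+)\s+")
# Frames kgdb could not resolve look like "#29 0x0000000000000000 in ?? ()".
UNRESOLVED = re.compile(r"^#\d+\s+0x[0-9a-f]+ in \?\? \(\)\s*$")


def resolved_frames(backtrace_text):
    """Return frames that carry a symbol name, rejoining mail-wrapped lines."""
    frames = []
    for line in backtrace_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if FRAME_START.match(line):
            frames.append(line)
        elif frames:
            # Continuation of a frame that was wrapped onto the next line.
            frames[-1] += " " + line
    return [f for f in frames if not UNRESOLVED.match(f)]


if __name__ == "__main__":
    # A few frames copied from the backtrace above.
    sample = "\n".join([
        "#7  0xffffffff807016de in calltrap () at",
        "/usr/src/sys/amd64/amd64/exception.S:169",
        "#28 0xffffffff80701aae in fork_trampoline () at",
        "/usr/src/sys/amd64/amd64/exception.S:415",
        "#29 0x0000000000000000 in ?? ()",
        "#30 0x0000000000000000 in ?? ()",
    ])
    for frame in resolved_frames(sample):
        print(frame)
```

Run over the full trace, this should leave frames #0 and #2 through #28 plus
the sched_switch entry at #61.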

We would appreciate your help in identifying whether this is a hardware failure
or whether we have just stepped on a bug.
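
One observation that may help: the fault virtual address in the trap message
above (0xda040020) does not look like a kernel address, while the zone and
item pointers in frame #8 do. Below is a small Python sketch of that check.
It assumes the usual 48-bit canonical amd64 address layout; the classify
helper and the labels are mine. It does not by itself tell a hardware fault
from a software bug, it only shows that the pointer being followed was
probably not a valid kernel pointer.

```python
# Assumption: 48-bit canonical amd64 addresses. The lower half ends at
# 0x00007fffffffffff and the upper (kernel) half starts at 0xffff800000000000.
LOWER_HALF_END = 0x00007FFFFFFFFFFF
UPPER_HALF_START = 0xFFFF800000000000


def classify(addr):
    """Classify a 64-bit virtual address by canonical half."""
    if addr <= LOWER_HALF_END:
        return "lower half"
    if addr >= UPPER_HALF_START:
        return "upper half"
    return "non-canonical"


if __name__ == "__main__":
    # Values copied from the trap message and from frame #8 of the backtrace.
    addresses = {
        "fault virtual address": 0xDA040020,
        "instruction pointer": 0xFFFFFFFF80666070,
        "zone (frame #8)": 0xFFFFFF00CFB42820,
        "item (frame #8)": 0xFFFFFF0003B2E000,
    }
    for name, addr in addresses.items():
        print(f"{name:24s} {addr:#018x}  {classify(addr)}")
```

Only the fault address falls in the lower half; the instruction pointer and
both frame #8 arguments are in the upper half.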







Best Regards
Catalin Miclaus
Network/Security ISP-Data
Starcomms Ltd.



More information about the freebsd-questions mailing list