Kernel Crashes

Chris Cowart ccowart at rescomp.berkeley.edu
Tue Feb 17 10:29:34 PST 2009


Hello,

We have a system that has been experiencing 3-5 crashes per day for
about 2 weeks now.

The affected machine started as a FreeBSD 7.0 virtual machine cloned from
our staging build. The production machine is running Apache with mod_proxy
and mongrel_cluster for a Ruby on Rails webapp. Shortly after it was rolled
into production, the crashes began.

All of the crashes have been page faults in kernel mode with this fault code:

fault code              = supervisor read, page not present

But there appears to be no rhyme or reason to the current process or
current syscall (to my untrained eyes, at least).

We also tried building a 7.1 VM from scratch, installing all the
software, and rolling it out last night. Today, it crashed too.

We are fairly certain this isn't a hardware problem, because we have
many other FreeBSD VMs running successfully on this ESX server.

Here's today's 7.1 backtrace:

| root optimus crash # kgdb /boot/kernel/kernel vmcore.0 
| GNU gdb 6.1.1 [FreeBSD]
| Copyright 2004 Free Software Foundation, Inc.
| GDB is free software, covered by the GNU General Public License, and you are
| welcome to change it and/or distribute copies of it under certain conditions.
| Type "show copying" to see the conditions.
| There is absolutely no warranty for GDB.  Type "show warranty" for details.
| This GDB was configured as "i386-marcel-freebsd"...
| 
| Unread portion of the kernel message buffer:
| 
| 
| Fatal trap 12: page fault while in kernel mode
| cpuid = 0; apic id = 00
| fault virtual address   = 0xc526b305
| fault code              = supervisor read, page not present
| instruction pointer     = 0x20:0xc07e95b5
| stack pointer           = 0x28:0xcd08fb18
| frame pointer           = 0x28:0xcd08fb44
| code segment            = base 0x0, limit 0xfffff, type 0x1b
|                         = DPL 0, pres 1, def32 1, gran 1
| processor eflags        = interrupt enabled, resume, IOPL = 0
| current process         = 1158 (ruby18)
| trap number             = 12
| panic: page fault
| cpuid = 0
| Uptime: 7h47m5s
| Physical memory: 243 MB
| Dumping 70 MB: 55 39 23 7
| 
| Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
| done.
| Loaded symbols for /boot/kernel/acpi.ko
| Reading symbols from /boot/kernel/ipfw.ko...Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
| done.
| Loaded symbols for /boot/kernel/ipfw.ko
| #0  doadump () at pcpu.h:196
| 196     pcpu.h: No such file or directory.
|         in pcpu.h
| (kgdb) where
| #0  doadump () at pcpu.h:196
| #1  0xc07998c7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
| #2  0xc0799b99 in panic (fmt=Variable "fmt" is not available.
| ) at /usr/src/sys/kern/kern_shutdown.c:574
| #3  0xc0ad6a9c in trap_fatal (frame=0xcd08fad8, eva=3307647749)
|     at /usr/src/sys/i386/i386/trap.c:939
| #4  0xc0ad6d20 in trap_pfault (frame=0xcd08fad8, usermode=0, eva=3307647749)
|     at /usr/src/sys/i386/i386/trap.c:852
| #5  0xc0ad76dc in trap (frame=0xcd08fad8) at /usr/src/sys/i386/i386/trap.c:530
| #6  0xc0abd54b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
| #7  0xc07e95b5 in m_uiotombuf (uio=0xcd08fbe8, how=2, len=2048, align=76, 
|     flags=2) at /usr/src/sys/kern/uipc_mbuf.c:1747
| #8  0xc07f16d5 in sosend_generic (so=0xc24d2680, addr=0x0, uio=0xcd08fbe8, 
|     top=0x0, control=0x0, flags=0, td=0xc23a6000)
|     at /usr/src/sys/kern/uipc_socket.c:1219
| #9  0xc07ed2ff in sosend (so=0xc24d2680, addr=0x0, uio=0xcd08fbe8, top=0x0, 
|     control=0x0, flags=0, td=0xc23a6000)
|     at /usr/src/sys/kern/uipc_socket.c:1288
| #10 0xc07f4606 in kern_sendit (td=0xc23a6000, s=5, mp=0xcd08fc64, flags=0, 
|     control=0x0, segflg=UIO_USERSPACE) at /usr/src/sys/kern/uipc_syscalls.c:805
| #11 0xc07f7851 in sendit (td=0xc23a6000, s=5, mp=0xcd08fc64, flags=0)
|     at /usr/src/sys/kern/uipc_syscalls.c:742
| #12 0xc07f7968 in sendto (td=0xc23a6000, uap=0xcd08fcfc)
|     at /usr/src/sys/kern/uipc_syscalls.c:857
| #13 0xc0ad7075 in syscall (frame=0xcd08fd38)
|     at /usr/src/sys/i386/i386/trap.c:1090
| #14 0xc0abd5b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
| #15 0x00000033 in ?? ()
| Previous frame inner to this frame (corrupt stack?)
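
As a sanity check when reading the trace above: the eva argument shown in
the trap_fatal and trap_pfault frames is the fault virtual address from the
panic message, printed in decimal. A minimal Python sketch of the conversion
(the helper name eva_to_hex is made up for illustration and is not part of
kgdb or FreeBSD):

```python
def eva_to_hex(eva: int) -> str:
    """Format a decimal eva value from a kgdb frame as a 32-bit hex address."""
    return f"0x{eva & 0xFFFFFFFF:08x}"


# eva as shown in frames #3 and #4 of the 7.1 backtrace
print(eva_to_hex(3307647749))  # prints 0xc526b305
```

This matches the "fault virtual address" line of the same dump.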

And here's a previous 7.0 backtrace:

| ccowart hal optimus $ cat vmcore.1.bt 
| GNU gdb 6.1.1 [FreeBSD]
| Copyright 2004 Free Software Foundation, Inc.
| GDB is free software, covered by the GNU General Public License, and you are
| welcome to change it and/or distribute copies of it under certain conditions.
| Type "show copying" to see the conditions.
| There is absolutely no warranty for GDB.  Type "show warranty" for details.
| This GDB was configured as "i386-marcel-freebsd".
| 
| Unread portion of the kernel message buffer:
| 
| 
| Fatal trap 12: page fault while in kernel mode
| cpuid = 0; apic id = 00
| fault virtual address   = 0x640d6b5d
| fault code              = supervisor read, page not present
| instruction pointer     = 0x20:0xc07a9f7b
| stack pointer           = 0x28:0xcc793a10
| frame pointer           = 0x28:0xcc793a20
| code segment            = base 0x0, limit 0xfffff, type 0x1b
|                         = DPL 0, pres 1, def32 1, gran 1
| processor eflags        = interrupt enabled, resume, IOPL = 0
| current process         = 28 (irq18: le0)
| trap number             = 12
| panic: page fault
| cpuid = 0
| Uptime: 17h8m27s
| Physical memory: 243 MB
| Dumping 98 MB: 83 67 51 35 19 3
| 
| #0  doadump () at pcpu.h:195
|         in pcpu.h
| (kgdb) #0  doadump () at pcpu.h:195
| #1  0xc075cf37 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
| #2  0xc075d1f9 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:563
| #3  0xc0a7c6ac in trap_fatal (frame=0xcc7939d0, eva=1678601053)
|     at /usr/src/sys/i386/i386/trap.c:899
| #4  0xc0a7c930 in trap_pfault (frame=0xcc7939d0, usermode=0, eva=1678601053)
|     at /usr/src/sys/i386/i386/trap.c:812
| #5  0xc0a7d2dc in trap (frame=0xcc7939d0) at /usr/src/sys/i386/i386/trap.c:490
| #6  0xc0a6325b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
| #7  0xc07a9f7b in m_tag_delete_chain (m=0xc49da000, t=0x0)
|     at /usr/src/sys/kern/uipc_mbuf2.c:355
| #8  0xc074ef05 in mb_dtor_mbuf (mem=0xc49da000, size=256, arg=)
|     at /usr/src/sys/kern/kern_mbuf.c:409
| #9  0xc099a1ef in uma_zfree_arg (zone=0xc1466d20, item=0xc49da000, udata=0x0)
|     at /usr/src/sys/vm/uma_core.c:2255
| #10 0xc07ab683 in sbdrop_internal (sb=0xc26eda24, len=122) at uma.h:305
| #11 0xc07ab77e in sbdrop_locked (sb=0xc26eda24, len=122)
|     at /usr/src/sys/kern/uipc_sockbuf.c:899
| #12 0xc0899c9b in tcp_do_segment (m=0xc244c800, th=0xc36ba024, so=0xc26ed948, 
|     tp=0xc25051d0, drop_hdrlen=52, tlen=666)
|     at /usr/src/sys/netinet/tcp_input.c:2031
| #13 0xc089b501 in tcp_input (m=0xc244c800, off0=20)
|     at /usr/src/sys/netinet/tcp_input.c:845
| #14 0xc083cb59 in ip_input (m=0xc244c800)
|     at /usr/src/sys/netinet/ip_input.c:665
| #15 0xc07fc945 in netisr_dispatch (num=2, m=0xc244c800)
|     at /usr/src/sys/net/netisr.c:185
| #16 0xc07f2981 in ether_demux (ifp=0xc2227000, m=0xc244c800)
|     at /usr/src/sys/net/if_ethersubr.c:834
| #17 0xc07f2d73 in ether_input (ifp=0xc2227000, m=0xc244c800)
|     at /usr/src/sys/net/if_ethersubr.c:692
| #18 0xc05e520c in am79900_intr (arg=0xc2183000)
|     at /usr/src/sys/dev/le/am79900.c:340
| #19 0xc074015b in ithread_loop (arg=0xc22266f0)
|     at /usr/src/sys/kern/kern_intr.c:1036
| #20 0xc073cf59 in fork_exit (callout=0xc073ffb0 <ithread_loop>, 
|     arg=0xc22266f0, frame=0xcc793d38) at /usr/src/sys/kern/kern_fork.c:781
| #21 0xc0a632d0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205

And one more from 7.0, just for show:

| ccowart hal optimus $ cat vmcore.3.bt 
| GNU gdb 6.1.1 [FreeBSD]
| Copyright 2004 Free Software Foundation, Inc.
| GDB is free software, covered by the GNU General Public License, and you are
| welcome to change it and/or distribute copies of it under certain conditions.
| Type "show copying" to see the conditions.
| There is absolutely no warranty for GDB.  Type "show warranty" for details.
| This GDB was configured as "i386-marcel-freebsd".
| 
| Unread portion of the kernel message buffer:
| 
| 
| Fatal trap 12: page fault while in kernel mode
| cpuid = 0; apic id = 00
| fault virtual address   = 0xf98ca8a2
| fault code              = supervisor read, page not present
| instruction pointer     = 0x20:0xc07a82b4
| stack pointer           = 0x28:0xcd057248
| frame pointer           = 0x28:0xcd057274
| code segment            = base 0x0, limit 0xfffff, type 0x1b
|                         = DPL 0, pres 1, def32 1, gran 1
| processor eflags        = interrupt enabled, resume, IOPL = 0
| current process         = 1032 (syslog-ng)
| trap number             = 12
| panic: page fault
| cpuid = 0
| Uptime: 1d21h31m10s
| Physical memory: 243 MB
| Dumping 68 MB: 52 36 20 4
| 
| #0  doadump () at pcpu.h:195
|         in pcpu.h
| (kgdb) #0  doadump () at pcpu.h:195
| #1  0xc075cf37 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
| #2  0xc075d1f9 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:563
| #3  0xc0a7c6ac in trap_fatal (frame=0xcd057208, eva=4186744994)
|     at /usr/src/sys/i386/i386/trap.c:899
| #4  0xc0a7c930 in trap_pfault (frame=0xcd057208, usermode=0, eva=4186744994)
|     at /usr/src/sys/i386/i386/trap.c:812
| #5  0xc0a7d2dc in trap (frame=0xcd057208) at /usr/src/sys/i386/i386/trap.c:490
| #6  0xc0a6325b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
| #7  0xc07a82b4 in m_copym (m=0xf98ca88e, off0=1500, len=16, wait=1)
|     at mbuf.h:454
| #8  0xc083df55 in ip_fragment (ip=0xc2310434, m_frag=0xcd057344, mtu=1500, 
|     if_hwassist_flags=0, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:726
| #9  0xc083ec2e in ip_output (m=0xc2310400, opt=0x0, ro=0xcd057308, flags=2, 
|     imo=0x0, inp=0x0) at /usr/src/sys/netinet/ip_output.c:565
| #10 0xc08d711d in ipsec_process_done (m=0xc2310400, isr=0xc2435080)
|     at /usr/src/sys/netipsec/ipsec_output.c:177
| #11 0xc08e49a5 in ah_output_cb (crp=0xc2497528)
|     at /usr/src/sys/netipsec/xform_ah.c:1193
| #12 0xc0924330 in crypto_done (crp=0xc2497528)
|     at /usr/src/sys/opencrypto/crypto.c:1148
| #13 0xc092773e in swcr_process (dev=0xc21f5a00, crp=0xc2497528, hint=0)
|     at /usr/src/sys/opencrypto/cryptosoft.c:975
| #14 0xc0925376 in crypto_invoke (cap=0xc21f5a00, crp=0xc2497528, hint=0)
|     at cryptodev_if.h:53
| #15 0xc0925dbc in crypto_dispatch (crp=0xc2497528)
|     at /usr/src/sys/opencrypto/crypto.c:798
| #16 0xc08e504f in ah_output (m=0xc2310400, isr=0xc2435080, mp=0x0, skip=20, 
|     protoff=9) at /usr/src/sys/netipsec/xform_ah.c:1102
| #17 0xc08d743b in ipsec4_process_packet (m=0xc24d4a00, isr=0xc2435080, 
|     flags=0, tunalready=0) at /usr/src/sys/netipsec/ipsec_output.c:486
| #18 0xc08d70ae in ipsec_process_done (m=0xc24d4a00, isr=0xc2435100)
|     at /usr/src/sys/netipsec/ipsec_output.c:160
| #19 0xc08e629c in esp_output_cb (crp=0xc24975a0)
|     at /usr/src/sys/netipsec/xform_esp.c:965
| #20 0xc0924330 in crypto_done (crp=0xc24975a0)
|     at /usr/src/sys/opencrypto/crypto.c:1148
| #21 0xc092773e in swcr_process (dev=0xc21f5a00, crp=0xc24975a0, hint=0)
|     at /usr/src/sys/opencrypto/cryptosoft.c:975
| #22 0xc0925376 in crypto_invoke (cap=0xc21f5a00, crp=0xc24975a0, hint=0)
|     at cryptodev_if.h:53
| #23 0xc0925dbc in crypto_dispatch (crp=0xc24975a0)
|     at /usr/src/sys/opencrypto/crypto.c:798
| #24 0xc08e6906 in esp_output (m=0xc24d4a00, isr=0xc2435100, mp=0x0, skip=20, 
|     protoff=9) at /usr/src/sys/netipsec/xform_esp.c:875
| #25 0xc08d743b in ipsec4_process_packet (m=0xc24d4a00, isr=0xc2435100, 
|     flags=0, tunalready=0) at /usr/src/sys/netipsec/ipsec_output.c:486
| #26 0xc083cee3 in ip_ipsec_output (m=0xcd057af8, inp=0xc23d0000, 
|     flags=0xcd057b04, error=0xcd057ad8, ro=0xcd057b00, iproute=0xcd057abc, 
|     dst=0xcd057ad4, ia=0xcd057ad0, ifp=0xcd057ae0)
|     at /usr/src/sys/netinet/ip_ipsec.c:331
| #27 0xc083e9af in ip_output (m=0xc24d4a00, opt=0x0, ro=0xcd057abc, flags=0, 
|     imo=0x0, inp=0xc23d0000) at /usr/src/sys/netinet/ip_output.c:418
| #28 0xc08a85b3 in udp_send (so=0xc24a1000, flags=0, m=0xc24d4a00, addr=0x0, 
|     control=0x0, td=0xc2337420) at /usr/src/sys/netinet/udp_usrreq.c:972
| #29 0xc07af576 in sosend_dgram (so=0xc24a1000, addr=0x0, uio=0xcd057c60, 
|     top=0xc24d4a00, control=0x0, flags=)
|     at /usr/src/sys/kern/uipc_socket.c:1053
| #30 0xc07ac41f in sosend (so=0xc24a1000, addr=0x0, uio=0xcd057c60, top=0x0, 
|     control=0x0, flags=0, td=0xc2337420)
|     at /usr/src/sys/kern/uipc_socket.c:1286
| #31 0xc0796a1b in soo_write (fp=0xc2354e58, uio=0xcd057c60, 
|     active_cred=0xc20f8200, flags=0, td=0xc2337420)
|     at /usr/src/sys/kern/sys_socket.c:103
| #32 0xc07900c7 in dofilewrite (td=0xc2337420, fd=8, fp=0xc2354e58, 
|     auio=0xcd057c60, offset=-1, flags=0) at file.h:254
| #33 0xc07903a8 in kern_writev (td=0xc2337420, fd=8, auio=0xcd057c60)
|     at /usr/src/sys/kern/sys_generic.c:401
| #34 0xc079041f in write (td=0xc2337420, uap=0xcd057cfc)
|     at /usr/src/sys/kern/sys_generic.c:317
| #35 0xc0a7cc85 in syscall (frame=0xcd057d38)
|     at /usr/src/sys/i386/i386/trap.c:1035
| #36 0xc0a632c0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
| #37 0x00000033 in ?? ()
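
To compare dumps side by side, something along these lines could pull out
the fault address, the current process, and the function that took the
fault (the frame right after calltrap). This is only a rough sketch: the
names SAMPLE, PATTERNS and summarize are hypothetical, and the regular
expressions assume kgdb output laid out exactly like the traces quoted in
this message.

```python
import re

# Excerpt from the 7.1 dump quoted earlier in this message.
SAMPLE = """\
| fault virtual address   = 0xc526b305
| current process         = 1158 (ruby18)
| #6  0xc0abd54b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
| #7  0xc07e95b5 in m_uiotombuf (uio=0xcd08fbe8, how=2, len=2048, align=76,
"""

PATTERNS = {
    "fault_addr": r"fault virtual address\s*=\s*(0x[0-9a-f]+)",
    "process": r"current process\s*=\s*\d+ \(([^)]*)\)",
    # The frame listed right after calltrap() is where the fault was taken.
    "function": r"in calltrap \(\)[^\n]*\n[| ]*#\d+\s+0x[0-9a-f]+ in (\w+)",
}


def summarize(dump: str) -> dict:
    """Return the fields of interest; a field is None if it was not found."""
    out = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, dump)
        out[name] = match.group(1) if match else None
    return out


print(summarize(SAMPLE))
```

For the three dumps above, the process / faulting function pairs are
ruby18 / m_uiotombuf, irq18: le0 / m_tag_delete_chain, and
syslog-ng / m_copym. The processes differ, but all three functions are
mbuf-handling code (uipc_mbuf.c, uipc_mbuf2.c, mbuf.h), which may or may
not be significant.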

Our custom kernel config is:

| include         GENERIC
| ident           RCBSD_REL7
| options         IPSEC                                         
| options         IPSEC_FILTERTUNNEL
| device          crypto
| options         QUOTA

I'm pretty sure the kernel isn't supposed to crash, so any tips on
fixing whatever it is that's broken here?

-- 
Chris Cowart
Network Technical Lead
Network & Infrastructure Services, RSSP-IT
UC Berkeley

More information about the freebsd-hackers mailing list