bge panic in 8.0

Erik Klavon erikk at berkeley.edu
Wed Jan 20 23:12:52 UTC 2010


On Thu, Jan 14, 2010 at 03:26:18PM -0800, Erik Klavon wrote:
> On Wed, Jan 13, 2010 at 06:06:40PM -0800, Pyun YongHyeon wrote:
> > On Wed, Jan 13, 2010 at 05:47:19PM -0800, Erik Klavon wrote:
> > > One of my amd64 machines running 8.0p1 acting as a NAT system for many
> > > network clients dropped into kdb today. tr indicates a problem in
> > > bge.
> > > 
> > > Tracing pid 12 tid 100033 td 0xffffff0001687000
> > > pmap_kextract() at pmap_kextract+0x4e
> > > bus_dmamap_load() at bus_dmamap_load+0xab
> > > bge_newbuf_std() at bge_newbuf_std+0xcc
> > > bge_rxeof() at bge_rxeof+0x36a
> > > bge_intr() at bge_intr+0x1c0
> > > intr_event_execute_handlers() at intr_event_execute_handlers+0xfd
> > > ithread_loop() at ithread_loop+0x8e
> > > fork_exit() at fork_exit+0x118
> > > fork_trampoline() at fork_trampoline+0xe
> > > --- trap 0, rip = 0, rsp = 0xffffff8074c01d30, rbp = 0 ---
> > > 
> > > I haven't been able to find a PR that matches this particular trace.
> > > 
> > > Pyun recently MFCd to stable (hence my post to this list) some changes
> > > to bge that involve functions in the above trace and according to the
> > > commit log (r201685) may address a kernel panic. Is there any
> > > indication in the above trace that this is the type of panic the
> > > commit attempts to address? I don't have a core dump for this
> > > panic. This machine has been unstable on 8, so I may be able to get a
> > > core dump in the future. If there is other information you'd like me
> > > to gather, please let me know.
> > 
> > Yes, the part of the code in the trace above was rewritten to address
> > bus_dma(9) issues. So it would be great if you could try the latest
> > bge(4) in stable/8 and let me know how it goes on your box. I guess
> > just downloading if_bge.c and if_bgereg.h from stable/8 and
> > rebuilding bge(4) would be enough to run it on 8.0-RELEASE.
> 
> Great, I will try this out on a test machine today. If it holds up
> under testing, I will put it into production. These crashes can happen
> weeks after a machine boots, so I won't know if the problem is solved
> for some time. Thanks for your help,

I didn't run into any problems while testing. I started running bge(4)
from stable in production this morning. I had three kernel panics within
a couple of hours; here is an example:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805ccf17
stack pointer           = 0x28:0xffffff800004f830
frame pointer           = 0x28:0xffffff800004f890
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (ng_queue0) 
[thread pid 13 tid 100009 ]
Stopped at      m_copym+0x37:   movl    0x18(%r12),%eax

db> tr
Tracing pid 13 tid 100009 td 0xffffff000189aab0
m_copym() at m_copym+0x37
ip_fragment() at ip_fragment+0x131
ip_output() at ip_output+0xeec
ip_forward() at ip_forward+0x16a
ip_input() at ip_input+0x57d
ng_ipfw_rcvdata() at ng_ipfw_rcvdata+0xb9
ng_apply_item() at ng_apply_item+0x220
ngthread() at ngthread+0x16b
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800004fd30, rbp = 0 ---

I tried the kdb command 'panic' to dump core, but this command only
produced further faults. After the third panic related to m_copym, I
reverted to the previous version of bge(4) from 8.0p1. A couple of
hours have passed without these panics repeating while running the
previous version of bge(4).

There is a long-open PR, 89070, that looks to be related to the above
panic. I don't have any proof that these panics resulted from the
newer version of bge(4). I haven't seen kernel panics such as these on
any of the other machines with this same configuration.

I have also seen a kernel panic on systems running 8.0p1 whose stack
trace differs from the one I posted previously, but which also appears
to be related to bge(4):

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x28
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff802cdf0e
stack pointer           = 0x28:0xffffff8074c1ab10
frame pointer           = 0x28:0xffffff8074c1ab70
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq25: bge1)
[thread pid 12 tid 100034 ]
Stopped at      bge_rxeof+0x1be:        movq    %r15,0x28(%r14)

db> trace
Tracing pid 12 tid 100034 td 0xffffff0001680ab0
bge_rxeof() at bge_rxeof+0x1be
bge_intr() at bge_intr+0x1c0
intr_event_execute_handlers() at intr_event_execute_handlers+0xfd
ithread_loop() at ithread_loop+0x8e
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8074c1ad30, rbp = 0 ---
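
As a rough sanity check on the two page faults above, the fault virtual
address can be compared with the memory operand on each "Stopped at"
line. This is only a sketch: it assumes the faulting access is the
base+displacement operand shown (AT&T syntax, disp(%reg)), and the
helper name base_register_value is made up for illustration. If the
fault address equals the displacement, the base register held 0, which
would be consistent with a NULL pointer dereference in both cases.

```python
import re

def base_register_value(fault_addr, operand):
    """Given a fault address and an AT&T 'disp(%reg)' operand,
    return (register, value the register must have held)."""
    m = re.fullmatch(r"(-?0x[0-9a-fA-F]+|-?\d+)?\((%\w+)\)", operand)
    if m is None:
        raise ValueError("unsupported operand: %r" % operand)
    # A missing displacement means 0; int(..., 0) accepts hex or decimal.
    disp = int(m.group(1), 0) if m.group(1) else 0
    # Effective address = base + disp, so base = fault address - disp.
    return m.group(2), fault_addr - disp

# Values copied from the two traps quoted above.
faults = [
    (0x18, "0x18(%r12)"),  # m_copym+0x37
    (0x28, "0x28(%r14)"),  # bge_rxeof+0x1be
]
for fault_addr, operand in faults:
    reg, value = base_register_value(fault_addr, operand)
    print(f"{reg} = {value:#x}")
```

Under those assumptions both base registers come out as 0. This does
not say which pointer was NULL or why; that would still need a core
dump or the source at those offsets.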

Erik


More information about the freebsd-stable mailing list