help w/ panic under heavy load - FreeBSD 5.4

Thu Jul 21 01:00:49 GMT 2005

Giorgos/John/et al. :)

I have compiled/tested/traced about 15 separate kernels for this, and am happy
to provide crash dumps, etc. to anyone interested :)

I decided to start over: build a GENERIC kernel
(w/ DDB/KDB/INVARIANTS/INVARIANT_SUPPORT) and see whether I could reproduce
the problem more precisely.

With just the GENERIC + debug kernel I did make it crash, although it took some
handholding: lots of packets thrown at it and processes running on the box for
about 5-10 minutes. I didn't really try to reproduce it, since it wasn't the fast
panic I was concerned about before. I've included the panic below anyhow.

What I did notice is that without any extra options, and with ip.fastforwarding
turned on via sysctl, the crash was consistently reproducible with the (pretty much)
GENERIC kernel, with basically the same kernel traces as before. I also got an
'interrupt storm' message on the console during the ip.fastforwarding run; I have
seen that a few times in the past, before it panic'd, when polling was not enabled.

I welcome all comments/thoughts/directions. I'm happy to poke/prod/compile/debug;
I just really don't know where to go from here.

Thanks for your help!
/Edwin

Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
sysctl: ip.fastforwarding=0 <--- turned off

ospfd# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 27 tid 100021 ]
Stopped at      kdb_enter+0x2b: nop
db> where
Tracing pid 27 tid 100021 td 0xc0ed0180
kdb_enter(c0821a6a) at kdb_enter+0x2b
panic(c0826049,0,c076b79c,c102bb00,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c124100e,c76d1a04,5dc,0,1) at ip_fragment+0x214
ip_output(c1201200,0,c76d19d0,1,0,0) at ip_output+0x74c
ip_forward(c1201200,0) at ip_forward+0x2d4
ip_input(c1201200) at ip_input+0x4a7
netisr_processqueue(c08ec138) at netisr_processqueue+0x6e
swi_net(0) at swi_net+0xc2
ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db>
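For readers less familiar with the mbuf code: the panic string appears to come from an INVARIANTS assertion in m_copym(), which walks the mbuf chain to the requested offset before copying and asserts that it never runs off the end. The sketch below is a userspace model of that walk only, assuming a simplified mbuf with just m_next and m_len; struct mbuf_model and offset_overruns_chain() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct mbuf: only the two
 * fields the offset walk looks at. Not the kernel's real layout. */
struct mbuf_model {
	struct mbuf_model *m_next;	/* next mbuf in this packet's chain */
	int m_len;			/* bytes of data held in this mbuf */
};

/*
 * Models the loop at the top of m_copym() that skips whole mbufs until
 * 'off' falls inside one. Returns 1 when the chain runs out before the
 * offset is reached (the point where an INVARIANTS kernel would hit
 * "m_copym, offset > size of mbuf chain"), otherwise 0.
 */
int
offset_overruns_chain(const struct mbuf_model *m, int off)
{
	assert(off >= 0);	/* m_copym() also rejects a negative offset */
	while (off > 0) {
		if (m == NULL)
			return 1;	/* chain exhausted: the panic case */
		if (off < m->m_len)
			return 0;	/* offset lands inside this mbuf */
		off -= m->m_len;
		m = m->m_next;
	}
	return 0;
}
```

For example, a two-mbuf chain holding 100 + 1400 bytes only overruns for offsets above 1500, while an empty (NULL) chain overruns for any positive offset, such as the 0x5dc (1500) seen in the m_copym() frame above.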

Kernel: DDB8-GENDBG (GENERIC + options DDB/KDB/INVARIANTS/INVARIANT_SUPPORT)
sysctl: ip.fastforwarding=1

fb54c# Interrupt storm detected on "irq10: sis0 sis1+"; throttling interrupt source
fb54c#
fb54c#
fb54c#
fb54c# panic: m_copym, offset > size of mbuf chain
KDB: enter: panic
[thread pid 21 tid 100015 ]
Stopped at      kdb_enter+0x2b: nop
db> where
Tracing pid 21 tid 100015 td 0xc0ecc780
kdb_enter(c08165b2) at kdb_enter+0x2b
panic(c081ab91,0,c0760a0c,c1028800,100) at panic+0xbb
m_copym(0,5dc,5c8,1,14) at m_copym+0x60
ip_fragment(c121880e,c76bfc6c,5dc,0,1) at ip_fragment+0x214
ip_fastforward(c11f2600) at ip_fastforward+0x6ed
ether_demux(c0f90000,c11f2600,52,c0f8b8d8,a) at ether_demux+0x259
ether_input(c0f90000,c11f2600,c0f902cc,0,c0826fc6) at ether_input+0x25d
sis_rxeof(c0f90000) at sis_rxeof+0x18b
sis_intr(c0f90000) at sis_intr+0xa3
ithread_loop(c0ec6880,c76bfd48,c0ec6880,c05feb3c,0) at ithread_loop+0x124
fork_exit(c05feb3c,c0ec6880,c76bfd48) at fork_exit+0xa4
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xc76bfd7c, ebp = 0 ---
db> doadump
No such command
db> call doadump
Dumping 128 MB
 16 32 48 64 80 96 112
Dump complete
0xf
db> reset
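The numbers in the m_copym() frame of both traces look like ordinary fragmentation arithmetic for a 1500-byte MTU. Assuming ip_fragment()'s third argument 0x5dc is the MTU, the IP header is 20 bytes, and each fragment's payload is (mtu - hlen) rounded down to a multiple of 8, m_copym()'s second and third arguments decode as off = 0x5dc (1500) and len = 0x5c8 (1480), i.e. the copy for the second fragment. frag_payload_len() and second_frag_offset() are hypothetical helpers sketching that arithmetic, not the kernel's code.

```c
/*
 * Hypothetical helper modelled on the fragment sizing in ip_fragment():
 * each non-final fragment carries (mtu - hlen) bytes of payload, rounded
 * down to a multiple of 8 because the IP fragment offset field counts
 * 8-byte units. Returns -1 where the kernel would give up (EMSGSIZE).
 */
int
frag_payload_len(int mtu, int hlen)
{
	int len = (mtu - hlen) & ~7;

	return (len < 8) ? -1 : len;
}

/*
 * Offset into the original packet where the second fragment's data
 * begins: the first fragment keeps the header plus one payload unit.
 * This is the 'off' passed to m_copym() for the first extra fragment.
 */
int
second_frag_offset(int mtu, int hlen)
{
	int len = frag_payload_len(mtu, hlen);

	return (len < 0) ? -1 : hlen + len;
}
```

With mtu = 1500 and hlen = 20 this gives 1480 (0x5c8) and 1500 (0x5dc), matching the values in the traces; a 24-byte header (IP options present) would give 1472 instead.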


Giorgos Keramidas (keramida at freebsd.org) wrote:
> On 2005-07-19 22:03, Edwin <edwin at verolan.com> wrote:
> > Hi John,
> >
> > Updated the kernel; same crash under load. Looks like m is NULL, you're right.
> >
> > Not quite sure where to go from here. I'm happy to do the legwork; I'm just still
> > really hazy on the BSD kernel side of things.
> >
> > panic: m_copym, offset > size of mbuf chain
> > KDB: enter: panic
> > [thread pid 27 tid 100021 ]
> > Stopped at      kdb_enter+0x2b: nop
> > db> where
> > Tracing pid 27 tid 100021 td 0xc0ed0180
> > kdb_enter(c0821a6a) at kdb_enter+0x2b
> > panic(c0826049,0,c076b79c,c102d600,100) at panic+0xbb
> > m_copym(0,5dc,5c8,1,14) at m_copym+0x60
> > ip_fragment(c123180e,c76d1c38,5dc,0,1) at ip_fragment+0x214
> > ip_fastforward(c11fee00) at ip_fastforward+0x6ed
> > ether_demux(c0f90000,c11fee00,52,c0f8aad0,1f) at ether_demux+0x259
> > ether_input(c0f90000,c11fee00,c0f902d0,0,c08336ab) at ether_input+0x25d
> > sis_rxeof(c0f90000,1,5,c08e5500,c76d1ce0) at sis_rxeof+0x1ab
> > sis_poll(c0f90000,0,5) at sis_poll+0x7f
> > netisr_poll(0) at netisr_poll+0x188
> > swi_net(0) at swi_net+0x81
> > ithread_loop(c0ec6580,c76d1d48,c0ec6580,c060030c,0) at ithread_loop+0x124
> > fork_exit(c060030c,c0ec6580,c76d1d48) at fork_exit+0xa4
> > fork_trampoline() at fork_trampoline+0x8
> > --- trap 0x1, eip = 0, esp = 0xc76d1d7c, ebp = 0 ---
> 
> Both tracebacks contain sis_poll() somewhere in the call stack?  Are you
> using POLLING?  If yes, can you try without POLLING and see if the crash
> can still be reproduced?
> 
> - Giorgos
>