rtentry panic with FIB

Sat Aug 30 14:57:55 UTC 2008

Robert Watson wrote:
> 
> On Fri, 29 Aug 2008, John Baldwin wrote:
> 
>> Unfortunately it hung trying to dump, so all I have is the stack trace 
>> from DDB.  This is recent HEAD running stress2
>>
>> panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../1
> 
> Kip and I have theorized that increased parallelism at higher layers of 
> the network stack is exposing route locking and reference counting to 
> more stress than it had done previously, and that as such we're starting 
> to trigger races in the routing code more than we used to.  While I 
> wouldn't rule out a FIB-related bug, it seems more likely to me that 
> we've hit a general bug in locking/references in the ethernet link layer 
> / ARP, and we need to take a careful look at what's going on throughout 
> that layer.
> 
> Unfortunately, that's not something I have time to work on currently, so 
> it would be great if people with an existing interest in the routing 
> code (Julian and Qing have done the most work there recently?) could 
> spend a few hours looking really carefully at what is happening.

I'm planning on spending a few hours looking at this this weekend.

> 
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
> 
>>
>> cpuid = 1
>> KDB: enter: panic
>> [thread pid 14025 tid 100928 ]
>> Stopped at      kdb_enter+0x3d: movq    $0,0x435054(%rip)
>> db> tr
>> Tracing pid 14025 tid 100928 td 0xffffff0003773360
>> kdb_enter() at kdb_enter+0x3d
>> panic() at panic+0x14b
>> _mtx_lock_flags() at _mtx_lock_flags
>> _mtx_lock_flags() at _mtx_lock_flags+0xc3
>> rt_check_fib() at rt_check_fib+0x1ea
>> arpresolve() at arpresolve+0x77
>> ether_output() at ether_output+0x180
>> ip_output() at ip_output+0xb4f
>> udp_send() at udp_send+0x47d
>> sosend_dgram() at sosend_dgram+0x1fa
>> soo_write() at soo_write+0x30
>> dofilewrite() at dofilewrite+0x7a
>> kern_writev() at kern_writev+0x52
>> write() at write+0x4d
>> syscall() at syscall+0x1bf
>> Xfast_syscall() at Xfast_syscall+0xab
>> --- syscall (4, FreeBSD ELF64, write), rip = 0x80071cb7c, rsp = 0x7fffffffe628,-
>> db> c
>> Uptime: 1h39m18s
>> Physical memory: 2038 MB
>> Dumping 263 MB:pid 14025 (udp), uid 26840, was killed: exceeded maximum CPU limit
>> pid 14099 (udp), uid 26840, was killed: exceeded maximum CPU limit
>> pid 14100 (udp), uid 26840, was killed: exceeded maximum CPU limit
>>
>> -- 
>> John Baldwin
>> _______________________________________________
>> freebsd-current at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to 
>> "freebsd-current-unsubscribe at freebsd.org"
>>