Deadlock in 7.0-RELEASE with nd6/rtalloc

Kevin Day toasty at dragondata.com
Sat Mar 15 22:59:35 PDT 2008


I've got a somewhat reproducible deadlock in 7.0-RELEASE. I believe
the same problem was present in 6.x as well. The machine locks up so
hard that DDB doesn't work, so I've had to resort to FireWire
debugging. That makes mutex debugging a real challenge. :)

Two threads are deadlocking in rtalloc1. One is netisr receiving a  
packet for a v6 host. The other is a process trying to send a packet  
to the same host.

Process 1:

PID 14:
  7 Thread 100005 (PID=14: swi1: net)  sched_switch (td=0xc3875420, newtd=0xc3f48c60, flags=1) at ../../../kern/sched_4bsd.c:931

#0  sched_switch (td=0xc3875420, newtd=0xc3f48c60, flags=1) at ../../../kern/sched_4bsd.c:931
#1  0xc0642287 in mi_switch (flags=Variable "flags" is not available.
) at ../../../kern/kern_synch.c:442
#2  0xc066d60b in turnstile_wait (ts=0xc385ce10, owner=0xc3f48c60, queue=Variable "queue" is not available.
) at ../../../kern/subr_turnstile.c:747
#3  0xc062e73d in _mtx_lock_sleep (m=0xc40817e0, tid=3280426016, opts=0, file=0xc08ca1c5 "../../../net/route.c", line=197) at ../../../kern/kern_mutex.c:416
#4  0xc062e84e in _mtx_lock_flags (m=0xc40817e0, opts=0, file=0xc08ca1c5 "../../../net/route.c", line=197) at ../../../kern/kern_mutex.c:186
#5  0xc06dc405 in rtalloc1 (dst=0xc09ba704, report=1, ignflags=0) at ../../../net/route.c:197
#6  0xc06dd3c6 in rtalloc_ign (ro=0xc09ba700, ignore=0) at ../../../net/route.c:117
#7  0xc06dd419 in rtalloc (ro=0xc09ba700) at ../../../net/route.c:103
#8  0xc07711dc in ip6_input (m=0xc3e48800) at ../../../netinet6/ip6_input.c:479
#9  0xc06d752b in netisr_processqueue (ni=0xc09b74c4) at ../../../net/netisr.c:143
#10 0xc06d75fb in swi_net (dummy=0x0) at ../../../net/netisr.c:250
#11 0xc061e7d5 in ithread_loop (arg=0xc383ac90) at ../../../kern/kern_intr.c:1036
#12 0xc061bd58 in fork_exit (callout=0xc061e620 <ithread_loop>, arg=0xc383ac90, frame=0xe11b8d38) at ../../../kern/kern_fork.c:781
#13 0xc0845c30 in fork_trampoline () at ../../../i386/i386/exception.s:205


Process 2:

PID 4096:
  95 Thread 100078 (PID=4096: fping6)  sched_switch (td=0xc3f48c60, newtd=0xc3875a50, flags=1) at ../../../kern/sched_4bsd.c:931

(kgdb) bt
#0  sched_switch (td=0xc3f48c60, newtd=0xc3875a50, flags=1) at ../../../kern/sched_4bsd.c:931
#1  0xc0642287 in mi_switch (flags=Variable "flags" is not available.
) at ../../../kern/kern_synch.c:442
#2  0xc066d60b in turnstile_wait (ts=0xc385d280, owner=0xc3875420, queue=Variable "queue" is not available.
) at ../../../kern/subr_turnstile.c:747
#3  0xc062e73d in _mtx_lock_sleep (m=0xc3b8107c, tid=3287583840, opts=0, file=0xc08ca1c5 "../../../net/route.c", line=147) at ../../../kern/kern_mutex.c:416
#4  0xc062e84e in _mtx_lock_flags (m=0xc3b8107c, opts=0, file=0xc08ca1c5 "../../../net/route.c", line=147) at ../../../kern/kern_mutex.c:186
#5  0xc06dc243 in rtalloc1 (dst=0xe344a7f0, report=0, ignflags=0) at ../../../net/route.c:147
#6  0xc0777a35 in nd6_lookup (addr6=0xc409f1e4, create=0, ifp=0xc4069800) at ../../../netinet6/nd6.c:819
#7  0xc0777d4b in nd6_is_addr_neighbor (addr=0xc409f1dc, ifp=0xc4069800) at ../../../netinet6/nd6.c:998
#8  0xc077818f in nd6_output (ifp=0xc4069800, origifp=0xc4069800, m0=0xc3bca300, dst=0xc409f1dc, rt0=0xc4081780) at ../../../netinet6/nd6.c:1960
#9  0xc07756e1 in ip6_output (m0=0xc3bca300, opt=0x0, ro=0xe344a9f0, flags=0, im6o=0x0, ifpp=0xe344aa74, inp=0xc3dd5438) at ../../../netinet6/ip6_output.c:927
#10 0xc07806cc in rip6_output (m=0xc3bca300) at ../../../netinet6/raw_ip6.c:452
#11 0xc0780c50 in rip6_send (so=0xc3fbc18c, flags=0, m=0xc3bca300, nam=0xc40a14c0, control=0x0, td=0xc3f48c60) at ../../../netinet6/raw_ip6.c:793
#12 0xc069173d in sosend_generic (so=0xc3fbc18c, addr=0xc40a14c0, uio=0xe344abe8, top=0xc3bca300, control=0x0, flags=0, td=0xc3f48c60) at ../../../kern/uipc_socket.c:1240
#13 0xc068dff4 in sosend (so=0xc3fbc18c, addr=0xc40a14c0, uio=0xe344abe8, top=0x0, control=0x0, flags=0, td=0xc3f48c60) at ../../../kern/uipc_socket.c:1286
#14 0xc06946e6 in kern_sendit (td=0xc3f48c60, s=4, mp=0xe344ac64, flags=0, control=0x0, segflg=UIO_USERSPACE) at ../../../kern/uipc_syscalls.c:789
#15 0xc06967a1 in sendit (td=0xc3f48c60, s=4, mp=0xe344ac64, flags=0) at ../../../kern/uipc_syscalls.c:730
#16 0xc06968b8 in sendto (td=0xc3f48c60, uap=0xe344acfc) at ../../../kern/uipc_syscalls.c:841
#17 0xc08573b3 in syscall (frame=0xe344ad38) at ../../../i386/i386/trap.c:1035
#18 0xc0845c20 in Xint0x80_syscall () at ../../../i386/i386/exception.s:196



Process 2 (fping6) is trying to send a packet to a v6 host. In
nd6_output, it grabs a lock on the rtentry for this host:

netinet6/nd6.c:1930
                 RT_LOCK(rt);

After this, it makes its way down to rtalloc1, where it tries to get a
lock on the radix head node.

net/route.c:147
         RADIX_NODE_HEAD_LOCK(rnh);

However, Process 1 (netisr) already holds the lock on the head node. It
grabbed it at net/route.c:147 as well, and got down to:

net/route.c:197
                         RT_LOCK(newrt);


As a sanity check, newrt in rtalloc1 is the same kernel address
as rt in nd6_output. The v6 destination address of rip6_send is also
the same as the v6 destination address received in ip6_input.
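
The backtraces allow one more cross-check. In each trace, the tid=
argument of _mtx_lock_sleep is the decimal form of that thread's td
pointer from frame #0, and the owner= argument of turnstile_wait is the
td pointer of the other thread. A minimal sketch of that arithmetic
follows; the numbers are copied from the traces, while the struct and
function names are made up for illustration:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the two backtraces; the type itself is hypothetical. */
struct waiter {
    const char *name;
    uint32_t self_tid;   /* tid= in _mtx_lock_sleep (kgdb prints it in decimal) */
    uint32_t lock_owner; /* owner= in turnstile_wait */
};

static const struct waiter waiters[2] = {
    { "swi1: net", 3280426016u, 0xc3f48c60u },
    { "fping6",    3287583840u, 0xc3875420u },
};

/* Returns 1 if each thread is blocked on a lock owned by the other one. */
static int waits_form_cycle(void)
{
    return waiters[0].lock_owner == waiters[1].self_tid &&
           waiters[1].lock_owner == waiters[0].self_tid;
}

/* Prints both threads with their tid converted to hex for easy comparison. */
static void print_waiters(void)
{
    for (int i = 0; i < 2; i++)
        printf("%-10s td=0x%" PRIx32 " waits on owner=0x%" PRIx32 "\n",
               waiters[i].name, waiters[i].self_tid, waiters[i].lock_owner);
}
```

Calling print_waiters() from a small main() shows the two td values
swapped between the td and owner columns, which is the two-thread cycle.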


So, Process 1 has "radix head node", and needs "rtentry" for this  
route. Process 2 has "rtentry" for this route, and needs "radix head  
node". Deadlock.
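
For anyone who wants to see the pattern outside the kernel, here is a
rough userland sketch using pthreads. It is not the kernel code and not
a proposed patch: head_lock and entry_lock are hypothetical stand-ins
for the radix head node lock and the rtentry lock, and the try-lock
with back-off in output_path is only one common way to break such a
cycle, assumed here for illustration.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the two kernel locks in this report. */
static pthread_mutex_t head_lock  = PTHREAD_MUTEX_INITIALIZER; /* "radix head node" */
static pthread_mutex_t entry_lock = PTHREAD_MUTEX_INITIALIZER; /* "rtentry" */
static long passes; /* only modified while both locks are held */

/* Input-side path: head first, then entry (the order seen in Process 1). */
static void *input_path(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&head_lock);
        pthread_mutex_lock(&entry_lock);
        passes++;
        pthread_mutex_unlock(&entry_lock);
        pthread_mutex_unlock(&head_lock);
    }
    return NULL;
}

/* Output-side path: already holds the entry, then needs the head. */
static void *output_path(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&entry_lock);
        if (pthread_mutex_trylock(&head_lock) != 0) {
            /* Blocking here could deadlock: drop the entry lock and
             * retake both in head -> entry order instead. */
            pthread_mutex_unlock(&entry_lock);
            pthread_mutex_lock(&head_lock);
            pthread_mutex_lock(&entry_lock);
        }
        passes++;
        pthread_mutex_unlock(&entry_lock);
        pthread_mutex_unlock(&head_lock);
    }
    return NULL;
}

/* Runs both paths concurrently; returns total passes, or -1 on error. */
static long run_both_paths(long n)
{
    pthread_t in, out;

    passes = 0;
    if (pthread_create(&in, NULL, input_path, &n) != 0)
        return -1;
    if (pthread_create(&out, NULL, output_path, &n) != 0) {
        pthread_join(in, NULL);
        return -1;
    }
    pthread_join(in, NULL);
    pthread_join(out, NULL);
    return passes;
}
```

Replacing the trylock with a plain pthread_mutex_lock(&head_lock)
reproduces the entry -> head versus head -> entry inversion described
above and can hang. In real code, anything read under the entry lock
has to be revalidated after it was dropped and retaken.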

I'm happy to file a PR, or I'm happy to try to fix this myself, but is  
there anyone here who's got familiarity with this chunk of code who  
can point me in the right direction of what's actually supposed to be  
happening to prevent this?

This also might be related to the LOR I reported in kern/121443.

-- Kevin


