kern/116172: Network / ipv6 recursive mutex panic

Peter Wemm peter at wemm.org
Thu Sep 6 23:50:01 PDT 2007


>Number:         116172
>Category:       kern
>Synopsis:       Network / ipv6 recursive mutex panic
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 07 06:50:01 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Peter Wemm
>Release:        FreeBSD 7.0-CURRENT amd64
>Organization:
FreeBSD.org
>Environment:
System: FreeBSD overcee.wemm.org 7.0-CURRENT FreeBSD 7.0-CURRENT #84: Sun Aug 26 02:05:15 PDT 2007 peter at overcee.wemm.org:/home/peter/fbp4/hammer/sys/amd64/compile/OVERCEE amd64


>Description:

At reboot, machine panics with:

panic: _mtx_lock_sleep: recursed on non-recursive mutex rtentry @ ../../../net/route.c:197

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x1e4
_mtx_lock_sleep() at _mtx_lock_sleep+0x112
_mtx_lock_flags() at _mtx_lock_flags+0x7e
rtalloc1() at rtalloc1+0x1fe
nd6_lookup() at nd6_lookup+0x5d
nd6_is_addr_neighbor() at nd6_is_addr_neighbor+0x33
nd6_output() at nd6_output+0x1e9
ip6_output() at ip6_output+0x1206
tcp_output() at tcp_output+0x1151
tcp_usr_disconnect() at tcp_usr_disconnect+0x74
soclose() at soclose+0x359
fdrop() at fdrop+0xdc
closef() at closef+0x1eb
fdfree() at fdfree+0x10d
exit1() at exit1+0x2bc
sys_exit() at sys_exit+0xe
syscall() at syscall+0x1bc
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (1, FreeBSD ELF64, sys_exit), rip = 0x800dd087c, rsp = 0x7fffffffc4e8, rbp = 0 ---

(kgdb) where
#0  doadump () at pcpu.h:194
#1  0xffffffff80294af5 in boot (howto=260) at ../../../kern/kern_shutdown.c:412
#2  0xffffffff80294f22 in panic (fmt=Variable "fmt" is not available.) at ../../../kern/kern_shutdown.c:571
#3  0xffffffff8028a8e2 in _mtx_lock_sleep (m=Variable "m" is not available.) at ../../../kern/kern_mutex.c:310
#4  0xffffffff8028a96e in _mtx_lock_flags (m=Variable "m" is not available.) at ../../../kern/kern_mutex.c:186
#5  0xffffffff8032be5e in rtalloc1 (dst=0xffffffffa49083e0, report=0, ignflags=0) at ../../../net/route.c:197
#6  0xffffffff8036b96d in nd6_lookup (addr6=0xffffff0003f92da8, create=0, ifp=0xffffff0003c82800) at ../../../netinet6/nd6.c:819
#7  0xffffffff8036bc73 in nd6_is_addr_neighbor (addr=0xffffff0003f92da0, ifp=0xffffff0003c82800) at ../../../netinet6/nd6.c:998
#8  0xffffffff8036c189 in nd6_output (ifp=0xffffff0003c82800, origifp=0xffffff0003c82800, m0=0xffffff0003618d00, dst=0xffffff0003f92da0, rt0=0xffffff0003ca15a0) at ../../../netinet6/nd6.c:1960
#9  0xffffffff80369866 in ip6_output (m0=Variable "m0" is not available.) at ../../../netinet6/ip6_output.c:927
#10 0xffffffff803478e1 in tcp_output (tp=0xffffff00072451f0) at ../../../netinet/tcp_output.c:1104
#11 0xffffffff80351544 in tcp_usr_disconnect (so=Variable "so" is not available.) at ../../../netinet/tcp_usrreq.c:576
#12 0xffffffff802e8c69 in soclose (so=0xffffff0007182ae0) at ../../../kern/uipc_socket.c:642
#13 0xffffffff8026cbcc in fdrop (fp=0xffffff00071fc690, td=0xffffff00072ac340) at file.h:297
#14 0xffffffff8026dfcb in closef (fp=0xffffff00071fc690, td=0xffffff00072ac340) at ../../../kern/kern_descrip.c:1983
#15 0xffffffff8026eafd in fdfree (td=0xffffff00072ac340) at ../../../kern/kern_descrip.c:1693
#16 0xffffffff8027786c in exit1 (td=0xffffff00072ac340, rv=65280) at ../../../kern/kern_exit.c:272
#17 0xffffffff8027870e in sys_exit (td=Variable "td" is not available.) at ../../../kern/kern_exit.c:98
#18 0xffffffff80414ebc in syscall (frame=0xffffffffa4908c70) at ../../../amd64/amd64/trap.c:836
#19 0xffffffff803fe1ab in Xfast_syscall () at ../../../amd64/amd64/exception.S:275

An ssh session over IPv6 is active at the time of the reboot.

The IPv6 routing table is a bit strange, but it does what I need.  There are
overlapping routes of different prefix lengths.

sk0: (default ipv4 gateway, has no active IPv6 activity)
fxp0: inet6 2001:470:1f01:523:1::1 prefixlen 80
tun0: inet6 2001:470:1f01:523:1::1 --> 2001:470:1f01:523::1 prefixlen 128
Note: identical local address on tun0 vs fxp0.
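
To check how the addresses above relate to the fxp0 prefix, a prefix match
can be sketched as follows. This is an illustration only, not code from
nd6.c; in6_prefix_match() is a hypothetical helper name, and the test
addresses are the ones quoted in this report.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if addr falls inside prefix/plen,
 * 0 if it does not, and -1 if an argument is invalid.
 */
int
in6_prefix_match(const char *addr, const char *prefix, int plen)
{
	struct in6_addr a, p;
	int full, rem;

	if (plen < 0 || plen > 128)
		return (-1);
	if (inet_pton(AF_INET6, addr, &a) != 1 ||
	    inet_pton(AF_INET6, prefix, &p) != 1)
		return (-1);

	full = plen / 8;	/* whole bytes covered by the prefix */
	rem = plen % 8;		/* leftover bits in the next byte */

	if (memcmp(a.s6_addr, p.s6_addr, (size_t)full) != 0)
		return (0);
	if (rem != 0) {
		unsigned char mask = (unsigned char)(0xff << (8 - rem));

		if ((a.s6_addr[full] & mask) != (p.s6_addr[full] & mask))
			return (0);
	}
	return (1);
}
```

Under this check the shared local address 2001:470:1f01:523:1::1 lies inside
2001:470:1f01:523:1::/80, while the tun0 peer 2001:470:1f01:523::1 does not,
although both share the same leading 64 bits.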


Relevant parts of rc.conf:
ipv6_enable="YES"
ipv6_network_interfaces="tun0 fxp0"
ipv6_default_interface="fxp0"
ipv6_ifconfig_fxp0="2001:470:1f01:523:1::1 prefixlen 80"
ipv6_ifconfig_tun0="2001:470:1f01:523:1::1 2001:470:1f01:523::1 prefixlen 128"
ipv6_defaultrouter="2001:470:1f01:523::1"
ipv6_gateway_enable="YES"

start_if.tun0 creates tun0 and runs a custom IPv6 tunnel program.

There is an ssh connection between the two ends of tun0, i.e. from
2001:470:1f01:523:1::1 to
2001:470:1f01:523::1

>How-To-Repeat:

Set up overlapping routes with a tunnel.  Open an ssh session across the
tunnel.  Reboot.
This is a 100% reliable panic for me.  Every reboot causes it.

>Fix:

I'm currently unsure what the key trigger is.  I suspect a route
reference count race between killing the tun0 process, killing
the ssh session, and routing teardown.

I will work out an exact recipe to trigger it if the above isn't
enough.  I wanted to document it before 7.0; I had already forgotten
about it for 2 weeks.

>Release-Note:
>Audit-Trail:
>Unformatted:

