kern/86427: LOR / Deadlock with FASTIPSEC and nat

Mike Tancsa mike at sentex.net
Wed Sep 21 13:10:06 PDT 2005


>Number:         86427
>Category:       kern
>Synopsis:       LOR / Deadlock with FASTIPSEC and nat
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 21 20:10:05 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Mike Tancsa
>Release:        FreeBSD 6.0-BETA5 i386
>Organization:
<Sentex Communications>
>Environment:
System: FreeBSD gp-tor1-o.sentex.ca 6.0-BETA5 FreeBSD 6.0-BETA5 #5: Wed Sep 21 14:44:33 EDT 2005 mdtancsa at gp-tor1-o.sentex.ca:/usr/obj/usr/src/sys/global i386

>Description:
	A box running with openVPN along with a FastIPSEC tunnel will enter into a deadlock under the right conditions
>How-To-Repeat:
	client machines come in via tun0 (openvpn interface)
  	and are natted going out vlan7
	/sbin/ipfw add 3400 divert natd ip from any to any via vlan7
	/sbin/natd -unregistered_only -n vlan7
	 /sbin/ifconfig gif0 create tunnel xxx.yyy.zzz.86 aaa.bbb.ccc.118
        /sbin/ifconfig gif0 10.44.99.2 netmask 255.255.255.252 10.44.99.1 netmask 255.255.255.252

Then we add a simple IPSEC policy across the gif tunnel 
setkey -c <<EOF
add xxx.yyy.zzz.86 aaa.bbb.ccc.118 esp 1044 -m any -E blowfish-cbc "JOy1QIbr8swoTIiuZYCtkDiCSHeI1eb48rux7IzwbEXyA";
add aaa.bbb.ccc.118 xxx.yyy.zzz.86 esp 1044 -m any -E blowfish-cbc "JOy1QIbr8swoTIiuZYCtkDiCSHeI1eb48rux7IzwbEXyA";
spdadd xxx.yyy.zzz.86/32 aaa.bbb.ccc.118/32 any -P out ipsec esp/tunnel/xxx.yyy.zzz.86-aaa.bbb.ccc.118/require;
spdadd aaa.bbb.ccc.118/32 xxx.yyy.zzz.86/32 any -P in  ipsec esp/tunnel/aaa.bbb.ccc.118-xxx.yyy.zzz.86/require;
EOF


I then bring up the openvpn tunnels (just a few test clients).  I can then trigger the deadlock by 
doing an scp across the gif tunnel

On the serial console I see


lock order reversal
 1st 0xc292a090 inp (divinp) @ /usr/src/sys/netinet/ip_divert.c:327
 2nd 0xc28bc950 ipsec request (ipsec request) @ /usr/src/sys/netipsec/ipsec_output.c:354
KDB: stack backtrace:
kdb_backtrace(0,ffffffff,c075ed70,c075ed98,c07261c4) at kdb_backtrace+0x29
witness_checkorder(c28bc950,9,c06f7aa6,162) at witness_checkorder+0x564
_mtx_lock_flags(c28bc950,0,c06f7aa6,162,0) at _mtx_lock_flags+0x5b
ipsec4_process_packet(c2a4a400,c28bc900,22,0,c28b7600) at ipsec4_process_packet+0x45
ip_output(c2a4a400,0,e741eb28,22,0) at ip_output+0x74f
div_output(c2926b20,c2a4a400,c255a280,0,e741ec08) at div_output+0x185
div_send(c2926b20,0,c2a4a400,c255a280,0) at div_send+0x3f
sosend(c2926b20,c255a280,e741ec3c,c2a4a400,0) at sosend+0x5e3
kern_sendit(c2731600,3,e741ecbc,0,0) at kern_sendit+0x104
sendit(c2731600,3,e741ecbc,0,bfbdec1c) at sendit+0x163
sendto(c2731600,e741ed04,6,2,296) at sendto+0x4d
syscall(3b,3b,3b,2,7c) at syscall+0x22f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (133, FreeBSD ELF32, sendto), eip = 0x280c5d97, esp = 0xbfbdeb0c, ebp = 0xbfbeebb8 ---


And here the box is frozen up.  On the serial console, I can break into the debugger


telnet> send break
KDB: enter: Line break on console
[thread pid 12 tid 100004 ]
Stopped at      kdb_enter+0x2b: nop     
db> 


db> show pcpu
cpuid        = 0
curthread    = 0xc22a9900: pid 12 "idle: cpu0"
curpcb       = 0xe3481d90
fpcurthread  = none
idlethread   = 0xc22a9900: pid 12 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x50
spin locks held:
db> show alllocks
Process 682 (ssh) thread 0xc2aa6000 (100137)
exclusive sleep mutex crypto (crypto op queues) r = 0 (0xc07aa420) locked @ /usr/src/sys/opencrypto/crypto.c:669
exclusive sleep mutex ipsec request r = 1 (0xc28bc950) locked @ /usr/src/sys/netipsec/xform_esp.c:872
exclusive sleep mutex inp (tcpinp) r = 0 (0xc28c6a68) locked @ /usr/src/sys/netinet/tcp_usrreq.c:651
Process 586 (natd) thread 0xc2731600 (100080)
exclusive sleep mutex inp (divinp) r = 0 (0xc292a090) locked @ /usr/src/sys/netinet/ip_divert.c:327
exclusive sleep mutex div r = 0 (0xc07a0b6c) locked @ /usr/src/sys/netinet/ip_divert.c:325
Process 37 (swi4: clock sio) thread 0xc2317d80 (100036)
exclusive sleep mutex tcp r = 0 (0xc07a1ecc) locked @ /usr/src/sys/netinet/tcp_timer.c:457
db> ps
  pid   proc     uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  682 c2aa4830    0   681   681 0004002 [LOCK    div c29f57c0] ssh
  681 c2aa4a3c    0   679   681 0004002 [SLPQ piperd 0xc272f000][SLP] scp
  679 c2894830    0   678   679 0004002 [SLPQ pause 0xc2894864][SLP] csh
  678 c2894624 1001   675   678 0004102 [SLPQ wait 0xc2894624][SLP] su
  675 c26efa3c 1001   674   675 0004002 [SLPQ pause 0xc26efa70][SLP] csh
  674 c2894a3c 1001   672   672 0000100 [SLPQ select 0xc079d3c4][SLP] sshd
  672 c2734830    0   523   672 0004100 [SLPQ sbwait 0xc2a2920c][SLP] sshd
  663 c29f320c    0   657   663 0004102 [SLPQ select 0xc079d3c4][SLP] openvpn
  657 c2894c48    0   656   657 0004002 [SLPQ pause 0xc2894c7c][SLP] csh
  656 c29f3000 1001   653   656 0004102 [SLPQ wait 0xc29f3000][SLP] su
  653 c29f3624 1001   652   653 0004002 [SLPQ pause 0xc29f3658][SLP] csh
  652 c26ec624 1001   650   650 0000100 [SLPQ select 0xc079d3c4][SLP] sshd
  650 c29f3a3c    0   523   650 0004100 [SLPQ sbwait 0xc2a2a20c][SLP] sshd
  649 c29f3c48    0     1   649 0004002 [SLPQ ttyin 0xc254d410][SLP] getty
  648 c29f7000    0     1   648 0004002 [SLPQ ttyin 0xc254f010][SLP] getty
  647 c29f720c    0     1   647 0004002 [SLPQ ttyin 0xc254c410][SLP] getty
  646 c29f7418    0     1   646 0004002 [SLPQ ttyin 0xc254fc10][SLP] getty
  645 c29f7624    0     1   645 0004002 [SLPQ ttyin 0xc2555010][SLP] getty
  644 c29f7830    0     1   644 0004002 [SLPQ ttyin 0xc254f810][SLP] getty
  643 c2734000    0     1   643 0004002 [SLPQ ttyin 0xc2555810][SLP] getty
  642 c2733000    0     1   642 0004002 [SLPQ ttyin 0xc2557010][SLP] getty
  641 c26efc48    0     1   641 0004002 [SLPQ ttyin 0xc2556010][SLP] getty
  595 c2730a3c    0     1    64 0000002 [SLPQ select 0xc079d3c4][SLP] 3dm2
  586 c2730624    0     1   586 0000000 [LOCK ipsec request c26bacc0] natd
  570 c2734624    0     1   570 0000101 [SLPQ select 0xc079d3c4][SLP] bgpd
  568 c289020c    0     1   568 0000101 [SLPQ select 0xc079d3c4][SLP] zebra
  545 c2734c48    0     1   545 0000000 [SLPQ nanslp 0xc07500ac][SLP] cron
  532 c2733830   25     1   532 0000100 [SLPQ pause 0xc2733864][SLP] sendmail
  528 c2730418    0     1   528 0000100 [SLPQ select 0xc079d3c4][SLP] sendmail
  523 c2733c48    0     1   523 0000100 [SLPQ select 0xc079d3c4][SLP] sshd
  521 c2890c48    0   505   505 0000000 [SLPQ pause 0xc2890c7c][SLP] ntpd
  505 c26eca3c    0     1   505 0000000 [SLPQ select 0xc079d3c4][SLP] ntpd
  407 c2890624   53     1   407 0000100 [SLPQ select 0xc079d3c4][SLP] named
  346 c2890000    0     1   346 0000000 [SLPQ select 0xc079d3c4][SLP] syslogd
  313 c26ec000    0     1   313 0000000 [SLPQ select 0xc079d3c4][SLP] devd
  191 c26ef000    0     1   191 0000000 [SLPQ pause 0xc26ef034][SLP] adjkerntz
   63 c238d20c    0     0     0 0000204 [SLPQ - 0xe4f4fd04][SLP] schedcpu
   62 c238d418    0     0     0 0000204 [SLPQ - 0xc07a502c][SLP] nfsiod 3
   61 c238d624    0     0     0 0000204 [SLPQ - 0xc07a5028][SLP] nfsiod 2
   60 c238d830    0     0     0 0000204 [SLPQ - 0xc07a5024][SLP] nfsiod 1
   59 c238da3c    0     0     0 0000204 [SLPQ - 0xc07a5020][SLP] nfsiod 0
   58 c238dc48    0     0     0 0000204 [SLPQ vlruwt 0xc238dc48][SLP] vnlru
   57 c23f4000    0     0     0 0000204 [SLPQ syncer 0xc074fe1c][SLP] syncer
   56 c23f420c    0     0     0 0000204 [SLPQ psleep 0xc079d90c][SLP] bufdaemon
   55 c23f4418    0     0     0 000020c [SLPQ pgzero 0xc07ab684][SLP] pagezero
   54 c23f4624    0     0     0 0000204 [SLPQ psleep 0xc07ab1d4][SLP] vmdaemon
   53 c23f4830    0     0     0 0000204 [SLPQ psleep 0xc07ab190][SLP] pagedaemon
   52 c23f4a3c    0     0     0 0000204 [IWAIT] swi0: sio
   51 c23f4c48    0     0     0 0000204 [SLPQ usbevt 0xc238b210][SLP] usb4
   50 c2316624    0     0     0 0000204 [SLPQ usbevt 0xc23d4210][SLP] usb3
   49 c2316830    0     0     0 0000204 [SLPQ usbevt 0xc23c0210][SLP] usb2
   48 c2316a3c    0     0     0 0000204 [SLPQ usbevt 0xc2394210][SLP] usb1
   47 c2316c48    0     0     0 0000204 [SLPQ usbtsk 0xc074ad64][SLP] usbtask
   46 c238c000    0     0     0 0000204 [SLPQ usbevt 0xc2398210][SLP] usb0
   45 c238c20c    0     0     0 0000204 [IWAIT] swi6: task queue
   44 c238c418    0     0     0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task2
   43 c238c624    0     0     0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task1
    9 c238c830    0     0     0 0000204 [SLPQ - 0xc233a980][SLP] acpi_task0
    8 c238ca3c    0     0     0 0000204 [SLPQ - 0xc233aa00][SLP] kqueue taskq
   42 c238cc48    0     0     0 0000204 [IWAIT] swi2: cambio
   41 c238d000    0     0     0 0000204 [IWAIT] swi5:+
    7 c2309c48    0     0     0 0000204 [SLPQ - 0xc233ac80][SLP] thread taskq
   40 c2315000    0     0     0 0000204 [IWAIT] swi6:+
   39 c231520c    0     0     0 0000204 [SLPQ - 0xc074a5a0][SLP] yarrow
    6 c2315418    0     0     0 0000204 [SLPQ - 0xc074d5a8][SLP] g_down
    5 c2315624    0     0     0 0000204 [SLPQ - 0xc074d5a4][SLP] g_up
    4 c2315830    0     0     0 0000204 [SLPQ - 0xc074d59c][SLP] g_event
    3 c2315a3c    0     0     0 0000204 [SLPQ crypto_ret_wait 0xc07aa444][SLP] crypto returns
    2 c2315c48    0     0     0 0000204 [SLPQ crypto_wait 0xc07aa404][SLP] crypto
   38 c2316000    0     0     0 0000204 [IWAIT] swi3: vm
   37 c231620c    0     0     0 000020c [LOCK    inp c22f8dc0] swi4: clock sio
   36 c2316418    0     0     0 0000204 [LOCK    div c29f57c0] swi1: net
   35 c2300624    0     0     0 0000204 [IWAIT] irq23: uhci0 ehci0
   34 c2300830    0     0     0 0000204 [IWAIT] irq22: em0
   33 c2300a3c    0     0     0 0000204 [IWAIT] irq21: twe0
   32 c2300c48    0     0     0 0000204 [IWAIT] irq20: fxp0
   31 c2309000    0     0     0 0000204 [IWAIT] irq19: uhci1++
   30 c230920c    0     0     0 0000204 [IWAIT] irq18: uhci2
   29 c2309418    0     0     0 0000204 [IWAIT] irq17: fwohci0
   28 c2309624    0     0     0 0000204 [IWAIT] irq16: uhci3
   27 c2309830    0     0     0 0000204 [IWAIT] irq15: ata1
   26 c2309a3c    0     0     0 0000204 [IWAIT] irq14: ata0
   25 c22ad20c    0     0     0 0000204 [IWAIT] irq13:
   24 c22ad418    0     0     0 0000204 [IWAIT] irq12:
   23 c22ad624    0     0     0 0000204 [IWAIT] irq11:
   22 c22ad830    0     0     0 0000204 [IWAIT] irq10:
   21 c22ada3c    0     0     0 0000204 [IWAIT] irq9: acpi0
   20 c22adc48    0     0     0 0000204 [IWAIT] irq8:
   19 c2300000    0     0     0 0000204 [IWAIT] irq7: ppc0
   18 c230020c    0     0     0 0000204 [IWAIT] irq6:
   17 c2300418    0     0     0 0000204 [IWAIT] irq5:
   16 c22a8000    0     0     0 0000204 [IWAIT] irq4: sio0
   15 c22a820c    0     0     0 0000204 [IWAIT] irq3:
   14 c22a8418    0     0     0 0000204 [IWAIT] irq0:
   13 c22a8624    0     0     0 0000204 [IWAIT] irq1: atkbd0
   12 c22a8830    0     0     0 000020c [CPU 0] idle: cpu0
   11 c22a8a3c    0     0     0 000020c [CPU 1] idle: cpu1
    1 c22a8c48    0     0     1 0004200 [SLPQ wait 0xc22a8c48][SLP] init
   10 c22ad000    0     0     0 0000204 [SLPQ ktrace 0xc074dff8][SLP] ktrace
    0 c074d6a0    0     0     0 0000200 [IWAIT] swapper
db> show lockedvnods
Locked vnodes
db>  show lockedbufs
db> trace
Tracing pid 12 tid 100004 td 0xc22a9900
kdb_enter(c07056df) at kdb_enter+0x2b
siointr1(c254b000,c07ad5c0,0,c07054eb,56e) at siointr1+0xce
siointr(c254b000) at siointr+0x21
intr_execute_handlers(c229f490,e3481c94,4,e3481cd8,c068c0e3) at intr_execute_handlers+0xa5
lapic_handle_intr(34) at lapic_handle_intr+0x2e
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc08a18fd, esp = 0xe3481cd8, ebp = 0xe3481cd8 ---
acpi_cpu_c1(c074f780,1,e3481cf8,1,c22a8830) at acpi_cpu_c1+0x5
acpi_cpu_idle(e3481d0c,c0516a51,c05169f4,e3481d24,c0516834) at acpi_cpu_idle+0x13e
cpu_idle(c05169f4,e3481d24,c0516834,0,e3481d38) at cpu_idle+0x28
idle_proc(0,e3481d38,0,c05169f4,0) at idle_proc+0x5d
fork_exit(c05169f4,0,e3481d38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3481d6c, ebp = 0 ---
db> show lockedvnods
Locked vnodes
db> 

	
>Fix:

	Using pf instead of ipfw seems to work around the issue although I dont know if that 
	just makes it harder to trigger

	Using regular IPSEC seems to work just fine, but 
	debug.mpsafenet forced to 0 as ipsec requires Giant

>Release-Note:
>Audit-Trail:
>Unformatted:


More information about the freebsd-bugs mailing list