kern/115651: Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_6_2

Mon Aug 20 09:40:02 PDT 2007

>Number:         115651
>Category:       kern
>Synopsis:       Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_6_2
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Aug 20 16:40:01 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Scott Ullrich
>Release:        RELENG_6_2
>Organization:
pfSense
>Environment:
FreeBSD pfsense.geekgod.com 6.2-RELEASE-p7 FreeBSD 6.2-RELEASE-p7 #0: Sat Aug  4 18:35:24 EDT 2007 sullrich at builder6.pfsense.com:/usr/obj.pfSense/usr/src/sys/pfSense.6 i386
>Description:
Frequently racoon (ipsec-tools 0.7rc1 and also 0.6) either blocks indefinitely in the sbwait state or spins at 100% CPU usage, and it does not recover until the process is killed and restarted.

ipsec-tools 0.67 enters the "sbwait" state when the issue is triggered, whereas ipsec-tools 0.7rc1 goes into a 100% CPU tailspin.

Backtrace during this condition:

#0  0x2827a187 in recvfrom () from /lib/libc.so.6
#1  0x28225904 in recv () from /lib/libc.so.6
#2  0x0805f4f5 in pk_recv (so=11, lenp=0xbfbfe558) at pfkey.c:2826
#3  0x0805f622 in pfkey_dump_sadb (satype=3) at pfkey.c:314
#4  0x0805ac3d in purge_ipsec_spi (dst0=0x81b1080, proto=3, spi=0x8188140, n=1)
   at isakmp_inf.c:1173
#5  0x0805ba5c in isakmp_info_recv (iph1=0x81c1e00, msg0=0x1)
   at isakmp_inf.c:565
#6  0x0804ec49 in isakmp_main (msg=0x8218240, remote=0xbfbfe7f0,
   local=0xbfbfe770) at isakmp.c:671
#7  0x0805003e in isakmp_handler (so_isakmp=24) at isakmp.c:395
#8  0x0804bf88 in session () at session.c:223
#9  0x0804b901 in main (ac=0, av=0xbfbfee4c) at main.c:264

I found this email which refers to the exact same issue I am running
into. http://mail-index.netbsd.org/tech-net/2003/09/11/0015.html

The index to the thread is here. Subject "Reminder that we are
supporting two parallel IPsec".
http://mail-index.netbsd.org/tech-net/2003/09/

It looks like a feud between NetBSD developers. From the thread it appears
that NetBSD and FreeBSD share the same pfkey interface issue.

What follows is a political discussion on the list about right and wrong,
and people get flak for choosing something to work around the pfkey
issue. I think this post gives a really good summary of the problem.
http://mail-index.netbsd.org/tech-net/2003/09/12/0036.html

Further down a thread starts with the subject "Problems with PF_KEY
SADB_DUMP". This thread begins with a thorough summary of the issues.
http://mail-index.netbsd.org/tech-net/2003/09/19/0001.html

Interestingly though I find this text:

<--
* There is a genuine bug in the KAME PF_KEY, which  has also been
  faithfully copied in fast-ipsec (NetBSD  and FreeBSD): if a process
  requesting an SADB_DUMP and the kernel fills the requesting so_rcv queue,
  KAME fails to place an error indication in the last-delivered packet.
  (that's why racoon hangs in sbwait(): it is waiting to read another
SADB_DUMP message).

  KAME setkey has a kludge to avoid the bug: it does a setsockopt()
  with SO_RCVTIMEO, and in the loop to read subsequent SADB_DUMP responses,
  setkey interprets a subsequent EAGAIN as a sign to abort the loop.
  IMNSO, that's not up to the standards to which NetBSD code aspires.

  A more correct fix is to have the sendup code check whether additional
  SADB_DUMP messages are required; if more are required, and there
  isn't space for at least one more (in addition to the current
  message) then set sadb_msg_errno to (e.g.)  ENOBUFS, to indicate
  the SADB_DUMP responses are truncated at that message.
-->
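
For illustration only, here is a rough sketch of the setkey-style workaround
described in the quoted text: arm SO_RCVTIMEO on the socket and treat EAGAIN
from recv() as the end of the dump instead of blocking forever in sbwait.
The helper names set_recv_timeout() and drain_until_timeout() are made up
for this example and are not taken from racoon or setkey; the sketch works
on any socket descriptor and does not parse PF_KEY messages.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not from racoon/setkey): arm a receive timeout,
 * given in milliseconds, so that recv() cannot block indefinitely.
 */
static int
set_recv_timeout(int so, long timeout_ms)
{
	struct timeval tv;

	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	return setsockopt(so, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/*
 * Hypothetical helper: read messages until the receive timeout fires.
 * Returns the number of messages read, or -1 on an unexpected error.
 * EAGAIN/EWOULDBLOCK is taken to mean "no more replies are coming".
 */
static int
drain_until_timeout(int so, char *buf, size_t buflen)
{
	int count = 0;

	for (;;) {
		ssize_t n = recv(so, buf, buflen, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return count;	/* timed out: stop reading */
			return -1;		/* genuine error */
		}
		if (n == 0)
			return count;		/* empty read: stop */
		count++;			/* a real caller would parse buf here */
	}
}
```

A caller would run set_recv_timeout() once on the PF_KEY socket and then use
a loop like drain_until_timeout() where pk_recv() currently blocks. As the
quoted text points out, this only hides the truncated dump; it does not tell
the caller that SADB_DUMP replies were lost.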

>How-To-Repeat:
Install ipsec-tools.  Set up a large number of tunnels.  In this case we are up to 85 tunnels.
>Fix:
No known fix as of yet.  Need to kill ipsec-tools and restart to get it working again.

>Release-Note:
>Audit-Trail:
>Unformatted: