Racoon(ipsec-tools) enters sbwait state or 100% CPU utilization quite often on RELENG_1_2

VANHULLEBUS Yvan vanhu_bsd at zeninc.net
Sat Aug 18 03:28:07 PDT 2007

On Fri, Aug 17, 2007 at 04:53:56PM -0400, Scott Ullrich wrote:
> Hello!


> We are trying to track down a problem that involves a large number of
> ipsec tunnels (in this case 80).  Frequently racoon (ipsec-tools
> 0.7rc1 and also 0.6) will deadlock into the sbwait state or will enter
> a 100% cpu usage state and will not recover without killing the
> process and restarting.
> #0  0x2827a187 in recvfrom () from /lib/libc.so.6
> #1  0x28225904 in recv () from /lib/libc.so.6
> #2  0x0805f4f5 in pk_recv (so=11, lenp=0xbfbfe558) at pfkey.c:2826
> #3  0x0805f622 in pfkey_dump_sadb (satype=3) at pfkey.c:314
> Does anyone know what we can look at further to try and eliminate the
> problem or does anyone have suggestions on how we can debug further?

It really looks like an old "known" (well, at least known by me...)
problem with the PFKey interface: it is nearly impossible to set up
more than 50-100 tunnels on a standard FreeBSD (and probably on any
other KAME based stack), because some kind of socket related problem
happens when racoon tries to get the SPD or the SADB entries.

When the problem occurs with the SPD, racoon won't be able to
negotiate some tunnels (because it doesn't have the SPD entries in
its own table). When the problem occurs with the SADB, it can lead
to the 100% CPU usage you have....
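
One way to picture this kind of failure, as an assumption on my side
and not a confirmed diagnosis: a dump is answered with many messages
queued on one socket, the socket buffer fills before the reader drains
it, and the reader never sees the last message. The sketch below only
models that with a local datagram socketpair; it does not touch PF_KEY,
and simulate_dump, struct dump_result and MSG_SIZE are made-up names.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_SIZE 512  /* illustrative size, not a PF_KEY constant */

struct dump_result {
    int sent;       /* messages the writer managed to queue */
    int received;   /* messages the reader actually got */
    int saw_final;  /* 1 if the reader saw the last message of the dump */
};

/*
 * Hypothetical model: queue up to `total` numbered messages before the
 * reader runs, then drain the socket and check whether the final
 * message (number total - 1) ever arrived.
 * Returns 0 on success, -1 if the socketpair cannot be created.
 */
static int simulate_dump(int total, struct dump_result *res)
{
    int sv[2];
    char buf[MSG_SIZE];
    int i;

    memset(res, 0, sizeof(*res));
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Writer side: non-blocking sends, stop once the buffer is full. */
    for (i = 0; i < total; i++) {
        memset(buf, 0, sizeof(buf));
        memcpy(buf, &i, sizeof(i));
        if (send(sv[0], buf, sizeof(buf), MSG_DONTWAIT) < 0)
            break;
        res->sent++;
    }

    /* Reader side: drain whatever was queued, without blocking. */
    for (;;) {
        int seq;
        ssize_t n = recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT);

        if (n < (ssize_t)sizeof(seq))
            break;
        memcpy(&seq, buf, sizeof(seq));
        res->received++;
        if (seq == total - 1)
            res->saw_final = 1;
    }

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

With a handful of messages the reader sees the final one. With a very
large `total` the writer fails part way through, and a reader that
blocks until the final message arrives would wait forever.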

Some workarounds are possible depending on your configuration: you may
be able to reduce the number of SAs in use (merge some phase 2s with
contiguous subnets, use REQUIRE instead of UNIQUE for some tunnels,
etc...), but if you have 80 peers with only ONE phase 2 each,
that's another problem....
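
To illustrate the "contiguous subnets" case: two phase 2s towards the
same peer can only be replaced by one if their networks are the two
halves of a single shorter prefix. A minimal sketch of that check,
assuming IPv4 addresses in host byte order; prefix_mask and
merge_subnets are hypothetical helpers, not part of racoon:

```c
#include <assert.h>
#include <stdint.h>

/* Netmask for an IPv4 prefix length (0..32), host byte order. */
static uint32_t prefix_mask(unsigned len)
{
    assert(len <= 32);
    return len == 0 ? 0 : (uint32_t)(0xFFFFFFFFu << (32 - len));
}

/*
 * If a/len and b/len are the two halves of one block of length
 * len - 1, store that block's network address in *out and return 1.
 * Return 0 if they cannot be merged into a single prefix.
 */
static int merge_subnets(uint32_t a, uint32_t b, unsigned len, uint32_t *out)
{
    uint32_t mask, parent;

    if (len == 0 || len > 32)
        return 0;
    mask = prefix_mask(len);
    parent = prefix_mask(len - 1);

    /* Both inputs must be network addresses for this prefix length. */
    if ((a & mask) != a || (b & mask) != b)
        return 0;
    /* Same network twice: nothing to merge. */
    if (a == b)
        return 0;
    /* They must share the same parent block. */
    if ((a & parent) != (b & parent))
        return 0;

    *out = a & parent;
    return 1;
}
```

For example, 192.168.0.0/24 and 192.168.1.0/24 merge into
192.168.0.0/23, while 192.168.1.0/24 and 192.168.2.0/24 do not share a
/23 and cannot be merged this way. Whether the merged selector is
acceptable also depends on what the peer is configured to accept.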

To solve that problem, the only solution we found was a big PFKey
hack: use only one request/response, with all the SPD/SAD entries
exchanged via a single buffer shared by the kernel and racoon.

I also know of an old bug in the sbspace macro (found in FreeBSD 4.x),
but it seems to have been fixed at least in FreeBSD 6.


