Kernel panic with PF

Fri Jul 21 09:05:50 UTC 2006

Daniel Hartmeier wrote:
> On Fri, Jul 21, 2006 at 02:05:45AM +0200, Max Laier wrote:
> 
> > Which proxies are you using?  The "pool_ticket: 1429 != 1430" messages you 
> > quote below indicate a synchronization problem within the app talking to pf 
> > via ioctl's.  Tickets are used to ensure atomic commits for operations that 
> > require more than one ioctl.  If your proxy app runs in parallel it might 
> > screw up the internal state and thus leave it undefined afterwards.  I give 
> > you that this shouldn't cause a kernel problem, but if we could fix the app 
> > we can probably find the right sanity check more easily.
> 
> This looks like a bug in pf_ioctl.c pfioctl() DIOCCHANGERULE
> 
>                         if (((((newrule->action == PF_NAT) ||
>                             (newrule->action == PF_RDR) ||
>                             (newrule->action == PF_BINAT) ||
>                             (newrule->rt > PF_FASTROUTE)) &&
> -                           !pcr->anchor[0])) &&
> +                           !newrule->anchor)) &&
>                             (TAILQ_FIRST(&newrule->rpool.list) == NULL))
>                                 error = EINVAL;
> 
> i.e. the pool must not be empty for routing and translation rules,
> except for translation rules that are actually anchor _calls_.
> 
> The confusion is between translation rules within anchors
> (pcr->anchor[0] != '\0') and calls to anchors' translation rules
> (rule->anchor != NULL).
> 
> If the proxy is using DIOCCHANGERULE (it must be the proxy, pfctl isn't
> using it at all), AND is trying to add/update a rule that requires at
> least one replacement address but contains an empty list, then this
> would cause the panic seen when that rule later matches a packet.
> 
> This needs fixing in OpenBSD as well.
> 
> Michal, can you please confirm that the patch above fixes the panic?
> The proxy will still misbehave and cause the log messages (one more
> EINVAL in this case ;), but the kernel shouldn't crash anymore.
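
To make the difference between the two checks concrete, here is a minimal
sketch using heavily simplified stand-ins for the pf structures. The names
struct sketch_rule, needs_pool(), check_old() and check_new() are
hypothetical and do not exist in pf_ioctl.c; only the logic of the
condition follows the diff quoted above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the pf rule fields in the diff. */
enum sketch_action { SK_PASS, SK_NAT, SK_RDR, SK_BINAT };

struct sketch_rule {
	enum sketch_action action;
	bool routes;         /* stands in for "rt > PF_FASTROUTE" */
	const void *anchor;  /* non-NULL: the rule is an anchor call */
	size_t pool_len;     /* number of addresses in the rule's pool */
};

/* Translation and routing rules need at least one pool address. */
static bool
needs_pool(const struct sketch_rule *r)
{
	return r->action == SK_NAT || r->action == SK_RDR ||
	    r->action == SK_BINAT || r->routes;
}

/*
 * Old check: skipped whenever the ioctl names an anchor, so a rule
 * placed inside an anchor gets through with an empty pool.
 */
static int
check_old(const struct sketch_rule *r, const char *ioctl_anchor)
{
	if (needs_pool(r) && ioctl_anchor[0] == '\0' && r->pool_len == 0)
		return EINVAL;
	return 0;
}

/* Patched check: skipped only when the rule itself is an anchor call. */
static int
check_new(const struct sketch_rule *r)
{
	if (needs_pool(r) && r->anchor == NULL && r->pool_len == 0)
		return EINVAL;
	return 0;
}
```

In this sketch, an rdr rule with an empty pool added inside an anchor such
as "kernun/4039" passes check_old() but is rejected by check_new(), while
an anchor call with an empty pool is accepted by check_new().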

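The ticket mechanism Max describes can be sketched in a similar way. This
is only an illustration of the idea, not pf's implementation: the names
current_ticket, sketch_begin() and sketch_commit() are hypothetical, and
returning EBUSY for a stale ticket is an assumption made for the sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's current pool ticket. */
static uint32_t current_ticket;

/* Begin a multi-ioctl update; any ticket handed out earlier becomes stale. */
static uint32_t
sketch_begin(void)
{
	return ++current_ticket;
}

/* A later step succeeds only if the caller still holds the newest ticket. */
static int
sketch_commit(uint32_t ticket)
{
	if (ticket != current_ticket)
		return EBUSY;  /* assumed error code for a stale ticket */
	return 0;
}
```

If two proxy processes each call sketch_begin() before either commits, the
first one's ticket is one behind the current value when it commits, which
is the kind of mismatch the "pool_ticket: 1429 != 1430" messages report.
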
I am afraid I can't test it at the moment. I am going to bring one of the
machines to my lab and experiment with it there. I may have trouble
generating enough traffic for the problem to appear, but I will try.

> Thanks for the excellent bug report!

Thank you. I don't think it was that good, as I now see that you had to
guess that anchors are used.

The rules look like this (apart from those shown by 'pfctl -s nat', they
are generated by the proxies when they start):

fw1# pfctl -s rule
fw1# pfctl -s nat
nat-anchor "/kernun/*" all
rdr-anchor "/kernun/*" all
fw1# pfctl -s Anchors -v
  kernun
  kernun/4026
  kernun/4039
  kernun/4088
  kernun/4112
  kernun/4134
  kernun/4164
  kernun/4197
  kernun/4257
  kernun/4296
  kernun/4338
  kernun/4383
  kernun/4431
  kernun/4482
  kernun/4590
  kernun/4649
fw1# pfctl -a kernun/4039 -s nat
rdr on em0 inet proto tcp from any to any port = http label "HTTP" -> 127.0.0.1

When the system was under load I saw ~5000 states in 'pfctl -s state'.

Thank you again. I will let you know when I get a chance to test your
patch and/or find out anything new.

Michal