Kernel panic with PF

Max Laier max at love2party.net
Fri Jul 21 00:05:55 UTC 2006


[CC'ing -pf]

On Thursday 20 July 2006 17:53, Michal Mertl wrote:
> Hello,
>
> I am deploying FreeBSD based application proxies' based firewall
> (www.kernun.com, but not much English there) and am having frequent
> panics of RELENG_6_1 under load. The server has IP forwarding disabled.
>
> I've got two machines in a carp cluster and the transparent proxies use
> PF to get the data.

Which proxies are you using?  The "pool_ticket: 1429 != 1430" messages you 
quote below indicate a synchronization problem within the app talking to pf 
via ioctl's.  Tickets are used to ensure atomic commits for operations that 
require more than one ioctl.  If your proxy app runs in parallel it might 
screw up the internal state and thus leave it undefined afterwards.  I give 
you that this shouldn't cause a kernel problem, but if we could fix the app 
we can probably find the right sanity check more easily.

> I don't know much about kernel internals and PF but from the following
> backtrace I understand that the crash happens because rpool->cur on line
> 2158 in src/sys/contrib/pf/net/pf.c is NULL and is dereferenced. It
> probably shouldn't happen yet it does.
>
> The machines are SMP and were running SMP kernel. The only places where
> pool.cur (or pool->cur) is assigned to are in pf_ioctl.c. It seems there
> are some lock operations though so it is probably believed that the
> coder is properly locked.
>
> I have been running with kern.smp.disabled=1 for a moment before I put
> the old firewall in place and haven't seen the panic but the time was
> deffinitely too short to make me believe it fixes the issue. Can setting
> debug.mpsafenet to 0 possibly also help?
>
> I could probably bandaid this particular failure mode by returning
> failure instead of panicing but the bug is probably elsewhere.
>
> I've lost the debug kernel from which this backtrace is and can't
> therefore continue much :-(. Unfortunately so far I can only reproduce
> the problem in production and for obvious reasons I can't put it there.
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xffffffff801ab528
> stack pointer           = 0x10:0xffffffffb1ade650
> frame pointer           = 0x10:0xffffff004cc7cc30
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 15 (swi1: net)
> trap number             = 12
> panic: page fault
>
> #0  doadump () at pcpu.h:172
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff803d5137 in boot (howto=260)
>     at ../../../kern/kern_shutdown.c:402
> #3  0xffffffff803d58a1 in panic (fmt=0xffffff007ba32000 "@\223<A3>{")
>     at ../../../kern/kern_shutdown.c:558
> #4  0xffffffff80543b3f in trap_fatal (frame=0xffffff007ba32000,
>     eva=18446742976272241472) at ../../../amd64/amd64/trap.c:660
> #5  0xffffffff80543e5f in trap_pfault (frame=0xffffffffb1ade5a0,
> usermode=0)
>     at ../../../amd64/amd64/trap.c:573
> #6  0xffffffff80544113 in trap (frame=
>       {tf_rdi = 2, tf_rsi = -1098223465792, tf_rdx = -1098439497700,
> tf_rcx = -1
> 314002464, tf_r8 = 0, tf_r9 = -1314002776, tf_rax = 0, tf_rbx = 0,
> tf_rbp = -109
> 8223465424, tf_r10 = 1, tf_r11 = 257, tf_r12 = -1098439497700, tf_r13 =
> -1314002
> 776, tf_r14 = 2, tf_r15 = -1314002464, tf_trapno = 12, tf_addr = 40,
> tf_flags =
> 216171684640539392, tf_err = 0, tf_rip = -2145733336, tf_cs = 8,
> tf_rflags = 661
> 18, tf_rsp = -1314003360, tf_ss = 16})
> at ../../../amd64/amd64/trap.c:352
> #7  0xffffffff8052feab in calltrap ()
> at ../../../amd64/amd64/exception.S:168
> #8  0xffffffff801ab528 in pf_map_addr (af=2 '\002',
> r=0xffffff004cc7cac0,
>     saddr=0xffffff003fe7681c, naddr=0xffffffffb1ade9e0, init_addr=0x0,
>     sn=0xffffffffb1ade8a8) at ../../../contrib/pf/net/pf.c:2163
> #9  0xffffffff801acab6 in pf_get_translation (pd=0xffffffffb1ade9c0,
>     m=0xffffff0042ede900, off=20, direction=1, kif=0xffffff007b038a00,
>     sn=0xffffffffb1ade8a8, saddr=0xffffff003fe7681c, sport=0,
>     daddr=0xffffff003fe76820, dport=50881, naddr=0xffffffffb1ade9e0,
>     nport=0xffffffffb1ade8b6) at ../../../contrib/pf/net/pf.c:2618
> #10 0xffffffff801b315b in pf_test_tcp (rm=0xffffffffb1ade960,
>     sm=0xffffffffb1ade950, direction=1, kif=0xffffff007b038a00,
>     m=0xffffff0042ede900, off=20, h=0xffffff003fe76810,
>     pd=0xffffffffb1ade9c0, am=0xffffffffb1ade968,
> rsm=0xffffffffb1ade970,
>     ifq=0x2, inp=0x0) at ../../../contrib/pf/net/pf.c:3013
> #11 0xffffffff801b5694 in pf_test (dir=1, ifp=0xffffff0000bee800,
>     m0=0xffffffffb1adeaa0, eh=0xffffffffb1ade97e, inp=0x0)
>     at ../../../contrib/pf/net/pf.c:6449
> #12 0xffffffff801bafb2 in pf_check_in (arg=0x2, m=0xffffffffb1adeaa0,
>     ifp=0xffffff004cc7cac0, dir=-1314002464, inp=0xffffffffb1ade9e0)
>     at ../../../contrib/pf/net/pf_ioctl.c:3358
> #13 0xffffffff80461c2e in pfil_run_hooks (ph=0xffffffff807e0920,
>     mp=0xffffffffb1adeb28, ifp=0xffffff0000bee800, dir=1, inp=0x0)
>     at ../../../net/pfil.c:139
> #14 0xffffffff8048d225 in ip_input (m=0xffffff0042ede900)
>     at ../../../netinet/ip_input.c:465
> #15 0xffffffff8046180c in netisr_processqueue (ni=0xffffffff807df690)
>     at ../../../net/netisr.c:236
> #16 0xffffffff80461abd in swi_net (dummy=0x2)
> at ../../../net/netisr.c:349
> #17 0xffffffff803bbd99 in ithread_loop (arg=0xffffff00000506a0)
>     at ../../../kern/kern_intr.c:684
> #18 0xffffffff803ba527 in fork_exit (
>     callout=0xffffffff803bbc50 <ithread_loop>, arg=0xffffff00000506a0,
>     frame=0xffffffffb1adec50) at ../../../kern/kern_fork.c:805
> #19 0xffffffff8053020e in fork_trampoline ()
>     at ../../../amd64/amd64/exception.S:394
> #20 0x0000000000000000 in ?? ()
>
> The firewall also reports lots of PF problems durings operation:
>
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: KERN-100-E
> [natutil.c:770] ioctl(): Invalid argument (EINVAL=22)
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-111-E
> add_rule(): PF ioctl DIOCADDRULE failed
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-701-E
> addnatmap out(): Adding TCP NAT MAP from [127.0.0.1]:60860 to
> [212.80.76.13]:80 -> [193.179.161.10]:60860 failed
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NETL-210-E
> netbind(server,10): NAT binding failed
>
> Kernel often reports "pool_ticket: 1429 != 1430" (with increasing
> numbers over time).
>
> Thank you very much for any advice.

-- 
/"\  Best regards,                      | mlaier at freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier at EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-pf/attachments/20060721/4f42827a/attachment.pgp


More information about the freebsd-pf mailing list