LORs with ipfw
Robert Watson
rwatson at freebsd.org
Wed Jul 7 20:47:31 PDT 2004
On Wed, 7 Jul 2004, Wiktor Niesiobedzki wrote:
> lock order reversal
> 1st 0xc07287c8 IPFW static rules (IPFW static rules) @ /usr/src/sys/netinet/ip_fw2.c:1828
> 2nd 0xc065cfcc tcp (tcp) @ /usr/src/sys/netinet/ip_fw2.c:1574
> Stack backtrace:
> backtrace(c05ec5a7,c065cfcc,c05ec12e,c05ec12e,c0726a3c) at backtrace+0x17
> witness_checkorder(c065cfcc,9,c0726a3c,626,806) at witness_checkorder+0x678
> _mtx_lock_flags(c065cfcc,0,c0726a3c,626,0) at _mtx_lock_flags+0x80
> check_uidgid(c15610a4,6,0,e08d1f53,1bd) at check_uidgid+0xd3
> ipfw_chk(cb9b6bf4,cb9b6c48,c1189014,1,0) at ipfw_chk+0x9e2
> ip_input(c1395c00,0,c071c576,1d0,0) at ip_input+0x375
> transmit_event(c1510c00,0,c071c576,300,2) at transmit_event+0x14b
> dummynet(0,0,c05ea27a,f6,1) at dummynet+0x1a9
> softclock(0,0,c05e6b67,263,c0631d40) at softclock+0x1aa
> ithread_loop(c10dd500,cb9b6d48,c05e695e,327,c10dd500) at ithread_loop+0x172
> fork_exit(c04a5b80,c10dd500,cb9b6d48) at fork_exit+0xbc
> fork_trampoline() at fork_trampoline+0x8
>
> This is from yesterday's CURRENT. I have compiled the kernel with
> CPUTYPE=athlon-xp and CFLAGS=-O2. Currently I'm not able to reproduce
> these messages with CPUTYPE=i686 and empty CFLAGS.
>
> Does anyone have a clue where the problem may lie here (or is it just
> harmless)?
This is a warning about a potentially harmful issue that is somewhat hard
to fix. Basically, we currently have what amounts to a subsystem or giant
lock over the ipfw rule set and its evaluation. Normally, the ipfw lock
will fall "after" most other locks, including protocol control block (pcb)
locks, as it will be called from other protocol code during processing.
However, when using a uid/gid rule, the protocol control block for the
packet is looked up by the ipfw code, which acquires pcb locks after the
ipfw lock. There are a few things to think about here:
(1) This lock order reversal is really a result of a layering violation --
the ipfw code is acting on packets at the IP layer, and looking up the
connection from the IP layer results in cross-layer transitions that
don't fit the general model.
(2) The lock order reversal occurs in a situation where a race condition
also occurs -- the pcb may actually be looked up twice for inbound
packets, once in ipfw, and then again for delivery. While it's
somewhat unlikely, the pcb could change in that window. The window is
stretched out through the use of functionality like dummynet.
(3) One way to think about fixing this is to avoid the need to hold the
ipfw lock across the entire execution of ipfw. I've been thinking
about reference-counting the rule set, such that each instance of a
thread entering the ipfw code sees the rule set as read-only and can
access it lock-free once it has acquired a reference, releasing the
reference on exit. For long rule sets, this would help reduce
contention. You can imagine variations on the model, such as per-cpu
rule set instances. There are some interesting challenges in dynamic
state management, however.
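
To make (3) a bit more concrete, here is a minimal userspace sketch of a
reference-counted rule set. It is only an illustration under assumptions:
it uses pthreads and C11 atomics rather than kernel mutexes, every name in
it (struct ruleset, ruleset_acquire, ruleset_check, and so on) is
hypothetical and not taken from ip_fw2.c, and the "rules" are plain
integers standing in for the real rule chain. The point it shows is that
the lock is held only long enough to take or swap a reference, not across
rule evaluation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the ip_fw2.c API. */
struct ruleset {
	atomic_int refs;	/* one for the "current" pointer, one per reader */
	int nrules;
	int *rules;		/* stand-in for the real rule chain */
};

static pthread_mutex_t rs_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct ruleset *rs_current;	/* protected by rs_mtx */

/* Build an immutable snapshot; the caller owns the initial reference. */
static struct ruleset *
ruleset_alloc(const int *rules, int n)
{
	struct ruleset *rs = malloc(sizeof(*rs));

	assert(rs != NULL);
	rs->rules = malloc(sizeof(int) * (size_t)n);
	assert(rs->rules != NULL);
	memcpy(rs->rules, rules, sizeof(int) * (size_t)n);
	rs->nrules = n;
	atomic_init(&rs->refs, 1);
	return (rs);
}

/* Drop a reference; the last one out frees the snapshot. */
static void
ruleset_release(struct ruleset *rs)
{
	if (atomic_fetch_sub(&rs->refs, 1) == 1) {
		free(rs->rules);
		free(rs);
	}
}

/* Short critical section: only the pointer read and the ref bump. */
static struct ruleset *
ruleset_acquire(void)
{
	struct ruleset *rs;

	pthread_mutex_lock(&rs_mtx);
	rs = rs_current;
	if (rs != NULL)
		atomic_fetch_add(&rs->refs, 1);
	pthread_mutex_unlock(&rs_mtx);
	return (rs);
}

/* Writer: publish a new snapshot, then drop the old one's reference. */
static void
ruleset_replace(struct ruleset *new)
{
	struct ruleset *old;

	pthread_mutex_lock(&rs_mtx);
	old = rs_current;
	rs_current = new;	/* takes over the caller's reference */
	pthread_mutex_unlock(&rs_mtx);
	if (old != NULL)
		ruleset_release(old);
}

/* Stand-in for rule evaluation: no rule set lock is held while scanning. */
static int
ruleset_check(int value)
{
	struct ruleset *rs = ruleset_acquire();
	int match = 0;

	if (rs == NULL)
		return (0);
	for (int i = 0; i < rs->nrules; i++) {
		if (rs->rules[i] == value) {
			match = 1;
			break;
		}
	}
	ruleset_release(rs);
	return (match);
}
```

Because no rule set lock is held inside ruleset_check(), code called from
evaluation (such as a pcb lookup) would not be acquiring its locks after a
rule set lock. A thread that still holds a reference keeps seeing the old
snapshot after ruleset_replace(), which is the read-only view described
above. The sketch deliberately says nothing about dynamic state, which is
where the harder problems are.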
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org Principal Research Scientist, McAfee Research