Lock Order Reversal on 7.0-STABLE with pf and ipfw / dummynet
max at love2party.net
Sat Mar 15 14:30:18 PDT 2008
On Saturday 15 March 2008, Robert Watson wrote:
> On Fri, 14 Mar 2008, Alex Popa wrote:
> > World was cvsupped on March 6th, around 18:00 GMT.
> > Built and installed kernel + world, with options WITNESS and
> > WITNESS_SKIPSPIN.
> > Short background: 7.0-RELEASE had excellent performance on the
> > machine, but it would randomly lock up after some hours (usually over
> > 10 hours). The lockups were hard, meaning nothing seemed to work
> > (NumLock didn't toggle the keyboard LED, no replies to ping, no disk
> > activity). We changed the motherboard and RAM and had the same
> > behaviour. 6.2-REL is rock solid on this machine (had over 50 days
> > uptime), but upgrading to 6.3-REL made it lock up just like 7.0 (so
> > we put 6.2 back and accepted the lower performance for the time
> > being).
> > The LOR messages from dmesg of 7.0-STABLE are as follows:
> > lock order reversal:
> > 1st 0xffffffffb19e0680 pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6729
> > 2nd 0xffffff00042ea0f0 radix node head (radix node head) @ /usr/src/sys/net/route.c:147
I haven't seen this one before. Can you obtain the trace for it, please?
You might need KDB and DDB for that; I'm not sure.
> > lock order reversal:
> > 1st 0xffffffff80938508 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:73
> > 2nd 0xffffffff80938c48 tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:400
This one is most certainly harmless and can be ignored. It is caused by
user/group rules, but a LOR with the read instance of a rwlock will not
lead to a deadlock.
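To make the terminology concrete: a lock order reversal is WITNESS noticing that two locks were taken in one order on one code path and in the opposite order on another. The Python toy below is only an illustrative sketch of that bookkeeping, not WITNESS itself (the real checker lives in the kernel, keeps a full lock-order graph and also catches transitive orders). The class name is hypothetical and the lock names "pfil" and "tcp" are just labels borrowed from the second report; each reported pair roughly mirrors the "1st"/"2nd" lines, i.e. the lock already held, then the lock being acquired.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy WITNESS-style checker (hypothetical): records "held A, then
    took B" pairs and reports when a later path takes the same two locks
    in the opposite order. Only direct pairs are checked, not transitive
    orders."""

    def __init__(self):
        self.order = set()             # observed (first, second) orders
        self.held = defaultdict(list)  # thread id -> locks currently held
        self.reversals = []            # (held, acquired) pairs that reversed a known order

    def acquire(self, thread, lock):
        for first in self.held[thread]:
            if (lock, first) in self.order:
                # The opposite order was recorded earlier: a reversal.
                self.reversals.append((first, lock))
            else:
                self.order.add((first, lock))
        self.held[thread].append(lock)

    def release(self, thread, lock):
        self.held[thread].remove(lock)


checker = LockOrderChecker()
# Path 1: tcp first, then the pfil hook lock.
checker.acquire(1, "tcp")
checker.acquire(1, "pfil")
checker.release(1, "pfil")
checker.release(1, "tcp")
# Path 2: pfil first, then tcp, i.e. the opposite order.
checker.acquire(2, "pfil")
checker.acquire(2, "tcp")
print(checker.reversals)  # [('pfil', 'tcp')]
```

The sketch only flags ordering. Whether a given reversal can actually deadlock, for example when one side is merely a read acquisition of an rwlock as in the pfil case above, is a separate judgement that this checker does not make.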
> Dear Alex,
> Thanks for this report, and sorry about the problem. It could well be
> that the lock order warning from WITNESS is related to the hang, and
> might reflect a recursion-related bug in the pf policy routing code.
> I'm not sure to what extent you can tolerate further downtime, but it
> would be useful to gather some more information about the hang itself
> to try and confirm the involvement of lock order. In particular, if
> it's feasible, it would be very helpful if you could boot back to the
> 7-STABLE kernel (keeping the 6.2-STABLE userspace should be fine, I
You'll need at least a new pfctl, because the ioctl interface to /dev/pf has changed.
> think), and when the hang occurs, use the console debugger (ideally
> hooked up to serial or FireWire) to run the following debugging
> commands:
> show pcpu
> show allpcpu
> show alllocks
> show witness
> show lockedvnods
> show uma
> show malloc
> A shot-in-the-dark guess is that something about pf's interactions with
> the protocol stack is involved here, but unfortunately I suspect we'll
> need some more information to track it down.
> Also, could you confirm if you're using any credential-related firewall
> rules with either ipfw or pf? These would be uid/gid/jail matching
> rules.
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
> > More details about the machine are in the attached dmesg. It's an SMP
> > machine with 4GB of RAM and 3 gigabit cards (em0, em1 and, depending
> > on the motherboard we used, either bge0 or msk0). Only em0 is linked
> > to a gigabit port; the others run at 100Mbit/s.
> > My setup has in-kernel IPFIREWALL, IPFIREWALL_VERBOSE,
> > IPFIREWALL_DEFAULT_TO_ACCEPT, DUMMYNET. I have commented out INET6,
> > SCTP and the wireless interfaces. WITNESS and WITNESS_SKIPSPIN were
> > only added in the hope of figuring out what locks it up, and they did
> > signal these 2 LORs.
> > pf and pflog are loaded as modules (pf_enable and pflog_enable set to
> > yes in rc.conf).
> > - The ipfw/dummynet side:
> > I use net.link.ether.ipfw = 1 for MAC address checking, ipfw +
> > dummynet for traffic shaping (4 queues at 95Mbits/s for the 2
> > external interfaces in/out, and 4 more queues for traffic that goes
> > outside the AS group for which we have fast access). The queue a
> > packet goes into depends on its source address and on whether its
> > destination is in ipfw table 1, table 2 or neither. These tables are
> > synchronized from pf tables via a custom script in crontab, which
> > runs every 3 minutes. The pf tables used as source for these are
> > controlled by OpenBGPD.
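The pf-to-ipfw table synchronisation described above can be done by diffing rather than flushing and reloading. Below is a minimal Python sketch, assuming the entries of both tables have already been read (for example from `pfctl -t special -T show` and `ipfw table 2 list`; parsing that output is not shown) and are available as address or CIDR strings. The function name and the choice to emit `ipfw table N add/delete` command strings are hypothetical; this is not the poster's actual script.

```python
import ipaddress


def _sort_key(net):
    # Group by IP version first so IPv4 and IPv6 networks are never compared directly.
    return (net.version, net)


def table_sync_commands(pf_entries, ipfw_entries, ipfw_table):
    """Return ipfw commands (hypothetical helper) that make ipfw table
    `ipfw_table` hold exactly the prefixes listed in `pf_entries`.

    strict=False normalises host bits, and a bare address becomes a
    /32 (or /128 for IPv6), so equivalent spellings compare equal.
    """
    want = {ipaddress.ip_network(e.strip(), strict=False) for e in pf_entries}
    have = {ipaddress.ip_network(e.strip(), strict=False) for e in ipfw_entries}
    # Remove prefixes that disappeared from the pf table, then add new ones.
    cmds = [f"ipfw table {ipfw_table} delete {net}"
            for net in sorted(have - want, key=_sort_key)]
    cmds += [f"ipfw table {ipfw_table} add {net}"
             for net in sorted(want - have, key=_sort_key)]
    return cmds


for cmd in table_sync_commands(
        ["10.1.0.0/16", "192.0.2.0/24"],      # hypothetical pf table contents
        ["192.0.2.0/24", "198.51.100.0/24"],  # hypothetical ipfw table 2 contents
        2):
    print(cmd)
# ipfw table 2 delete 198.51.100.0/24
# ipfw table 2 add 10.1.0.0/16
```

Applying only the differences means the ipfw table is never momentarily empty while the cron job runs, so queue selection does not briefly fall through to the "neither table" case.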
> > - The pf side:
> > Filtering is done here, as is policy routing. Filtering also
> > redirects traffic destined to port 80 to a transparent squid proxy,
> > unless it is bound for networks received via BGP and saved to tables
> > <metro> and <special>. Port 80 traffic to <metro> and <special> goes
> > directly to the destination server.
> > Traffic from net1 and net2 is routed via the "other" external
> > interface, which doesn't contain the default route... with the
> > exception of traffic to pf table <special> (from BGP, same as table 2
> > in ipfw). Traffic to <special> is routed via fastroute in pf
> > (meaning using the default route).
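The pf-side policy above boils down to two per-packet decisions: whether port 80 traffic is redirected to the proxy, and which uplink it leaves through. The following Python sketch models them under loudly hypothetical assumptions: the prefixes for net1, net2, <metro> and <special> are documentation placeholders (the real ones come from the local setup and from OpenBGPD), the two decisions are treated independently, and pf rule evaluation order and actual pf.conf syntax are ignored entirely. Hosts outside net1/net2 are simply shown as following the "default" route.

```python
import ipaddress

# Hypothetical placeholder prefixes; the real net1/net2 and the BGP-fed
# <metro>/<special> table contents are not given in the report.
NET1 = ipaddress.ip_network("192.0.2.0/25")
NET2 = ipaddress.ip_network("192.0.2.128/25")
METRO = [ipaddress.ip_network("198.51.100.0/24")]
SPECIAL = [ipaddress.ip_network("203.0.113.0/24")]


def in_table(addr, table):
    """True if addr falls inside any prefix of the modelled pf table."""
    return any(addr in net for net in table)


def classify(src, dst, dport):
    """Return (redirect_to_squid, route) for a packet.

    Port 80 goes to the transparent proxy unless the destination is in
    <metro> or <special>; net1/net2 leave via the "other" uplink unless
    the destination is in <special>, which uses the default route.
    """
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    redirect = (dport == 80
                and not in_table(dst, METRO)
                and not in_table(dst, SPECIAL))
    if (src in NET1 or src in NET2) and not in_table(dst, SPECIAL):
        route = "other"
    else:
        route = "default"
    return redirect, route


print(classify("192.0.2.10", "203.0.113.5", 80))  # (False, 'default')
print(classify("192.0.2.10", "10.20.30.40", 80))  # (True, 'other')
```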
That's quite a complex setup. It would really be interesting to get the
trace for the first LOR in order to figure out which code path we are
looking at. I have a feeling that it might be the dummynet entry point,
but w/o the trace this is only speculation.
> > Attached are full dmesg and the kernel config.
> > I still have access to the hard drive with 7.0-STABLE on it, but not
> > the motherboard/CPU and the network cards... they are running off the
> > hard drive with 6.2 on it.
/"\ Best regards, | mlaier at freebsd.org
\ / Max Laier | ICQ #67774661
X http://pf4freebsd.love2party.net/ | mlaier at EFnet
/ \ ASCII Ribbon Campaign | Against HTML Mail and News
More information about the freebsd-stable mailing list