Lock Order Reversal on 7.0-STABLE with pf and ipfw / dummynet

Sun Mar 16 15:37:06 PDT 2008

On Sunday 16 March 2008 21:16:16 Alex Popa wrote:
> This is a mixed reply to both the previous mails, bear with me please.
>
> On Sat, Mar 15, 2008 at 10:16:54PM +0100, Max Laier wrote:
> > On Saturday 15 March 2008, Robert Watson wrote:
> > > On Fri, 14 Mar 2008, Alex Popa wrote:
> > > > [snip]
> > > > The LOR messages from dmesg of 7.0-STABLE are as follows:
> > > >
> > > > lock order reversal:
> > > > 1st 0xffffffffb19e0680 pf task mtx (pf task mtx) @
> > > > /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:6729 2nd
> > > > 0xffffff00042ea0f0 radix node head (radix node head) @
> > > > /usr/src/sys/net/route.c:147
> >
> > I haven't seen this one before, can you obtain the trace for this,
> > please? You might need KDB & DDB for that - not sure.
>
> I'll do my best (see below for my questions about getting a trace).
>
> > > > lock order reversal:
> > > > 1st 0xffffffff80938508 PFil hook read/write mutex (PFil hook
> > > > read/write mutex) @ /usr/src/sys/net/pfil.c:73 2nd 0xffffffff80938c48
> > > > tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:400
> >
> > This one is most certainly harmless and can be ignored.  It is caused by
> > user/group rules, but a LOR with the read instance of a rwlock will not
> > lead to a deadlock.
>
> I'm not using uid/gid/jail rules as far as I remember.  I'll add another
> reply with pf.conf and the script I use to generate and reload the ipfw
> rules (but I'll anonymize them).
>
> > > Dear Alex,
> > >
> > > Thanks for this report, and sorry about the problem.  It could well be
> > > that the lock order warning from WITNESS is related to the hang, and
> > > might reflect a recursion-related bug in the pf policy routing code.
> > > I'm not sure to what extent you can tolerate further downtime, but it
> > > would be useful to gather some more information about the hang itself
> > > to try and confirm the involvement of lock order.  In particular, if
> > > it's feasible, it would be very helpful if you could boot back to the
> > > 7-STABLE kernel (keeping the 6.2-STABLE userspace should be fine, I
> >
> > you'll need at least a new pfctl, because the ioctl interface to /dev/pf
> > has changed.
>
> Switching between 6.2-RELEASE-p7 (not STABLE, because as I said 6.3
> exhibited the lockups too) and 7-STABLE isn't that much of a problem.
> The upgrade path was "buy a new hard drive, set up everything and then
> adapt the old config files"... actually we bought 2 hard drives, and I
> set them up one with amd64 and another with i386.  I think /etc and
> /usr/local/etc are perfectly identical on these 2 (I adapted the configs
> from 6.2 to 7.0, but I just copied them from amd64 to i386).
>
> So, actions needed to switch:  Backup the database on 6.2 (with IP/MAC
> mappings and a bit more), put in the 7.0 hard drive, boot off 7.0,
> restore DB, let it run.  Total downtime should be around 7 minutes tops.
>
> > > think), and when the hang occurs, use the console debugger (ideally
> > > hooked up to serial or firewire) to run the following debugging
> > > commands:
> > >
> > >    show pcpu
> > >    show allpcpu
> > >    trace
> > >    alltrace
> > >    show alllocks
> > >    show witness
> > >    show lockedvnods
> > >    show uma
> > >    show malloc
>
> This is where things get a bit tricky, and I need advice.
>
> As I said, my observation is that the keyboard seems to stop working
> when the lockup occurs, that is, pressing Num Lock won't toggle the
> state of the LED.  Thus I have some doubts that trying the good-old
> Control-Alt-ESC would have the desired effect (dropping me into the
> debugger).  However, I'm not that familiar with the FreeBSD
> architecture, and wouldn't be surprised if the LED toggling were in
> another thread and the machine would actually respond to the keyboard
> interrupt and drop me into ddb.  Also, judging by the lack of NumLock
> activity (it works fine when the system's up), would serial console or
> firewire be functional during the lockup?

Keyboard LEDs are broken for me on 6.3 amd64 (kbdmux).
I'd double check they work before you rely on this as a diagnostic tool.
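
The same goes for the debugger itself: it may be worth confirming ahead
of time that ddb can be reached from the serial console while the box is
still healthy.  A rough sketch, not a tested recipe, assuming the console
is on the first serial port; the speed shown is only an example value:

```
# /boot/loader.conf (assumption: console on the first serial port)
console="comconsole"
comconsole_speed="9600"		# example value, match your terminal

# kernel config, in addition to KDB and DDB
options		BREAK_TO_DEBUGGER	# a BREAK on the serial console enters ddb
options		ALT_BREAK_TO_DEBUGGER	# CR ~ ^B, for terminals that can't send BREAK

# on the running system: drop into ddb on purpose, then type "c" to continue
sysctl debug.kdb.enter=1
```

If the break sequence works on a live system, that still doesn't prove
it will work during the lockup, but it rules out a cabling or config
problem beforehand.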

>
> Also, a bit of explanation:
>
> Why I'm asking the above:  The current motherboard has a serial port
> (and it works, we've used it), but not a firewire port.  The other
> motherboard we tried has firewire, but no serial.  For a console
> workstation, I can get a few machines with serial ports, but ones with
> firewire are harder to find.  The null modem cable might be a problem
> too, depending on length.
>
> Also, since the lockup isn't easily reproducible, I'll probably need to
> spend some hours on location and if I'm going to do that, I'd like a
> degree of hope that either keyboard, serial console or firewire will
> work.  Also, firewire will require me to switch motherboards, but that
> can be done together with the hard drive swapping, during the night.
>
> After a bit of studying NOTES, I was wondering if a combination of
> serial console (or just plain console) with "options WITNESS_KDB" would
> help get a "good enough" trace.  The upside of this is that both LORs
> usually occur early (not much later than the login prompt, usually
> earlier) as opposed to after 12...18 hours, and I can either force a
> panic after each and get 2 core dumps, or run the debug commands
> suggested (either as debug LOR1 / continue / debug LOR2, or debug LOR1 /
> reboot / "continue" LOR1 / debug LOR2 - whichever is more appropriate).
>
> For the moment I have both hard drives (7.0-STABLE/amd64 and
> 7.0-RELEASE/i386) and the new motherboard (no serial, but with firewire)
> as a working computer under my desk.  I can prepare for the night-time
> switch and debug by compiling kernel and/or world and doing some
> preliminary testing here.  If I really need to test null modem console,
> I can put the hdd in my own desktop and test with another machine.
>
> > > A shot-in-the-dark guess is that something about pf's interactions with
> > > the protocol stack is involved here, but unfortunately I suspect we'll
> > > need some more information to track it down.
> > >
> > > Also, could you confirm if you're using any credential-related firewall
> > > rules with either ipfw or pf?  These would be uid/gid/jail matching
> > > rules.
>
> As I said above, I don't use any uid/gid/jail rules.  Mail with pf.conf
> and ipfw config incoming shortly after this one.
>
> > > Robert N M Watson
> > > Computer Laboratory
> > > University of Cambridge
>
> [snip]
>
> > That's quite a complex setup.  It would really be interesting to get the
> > trace for the first LOR in order to figure out which code path we are
> > looking at.  I have a feeling that it might be the dummynet entry point,
> > but w/o the trace this is only speculation.
>
> Working on it.
>
> > --
> > /"\  Best regards,                      | mlaier at freebsd.org
> > \ /  Max Laier                          | ICQ #67774661
> >  X   http://pf4freebsd.love2party.net/  | mlaier at EFnet
> > / \  ASCII Ribbon Campaign              | Against HTML Mail and News
>
> I'd like suggestions / comments about the kernel config I'm thinking
> about for debugging purposes:
>
> - take my KERNEL (GENERIC + IPFW - IPv6 and SCTP and wireless), and add:
>
> options		WITNESS
> options		WITNESS_KDB	# only if debug-on-first-warn is wanted
> options		WITNESS_SKIPSPIN
> options		KDB
> #options	KDB_TRACE	# not needed since I'll trace anyway?
> options		DDB
> #options	BREAK_TO_DEBUGGER	# would that work for my kind of lockup?
> options		MSGBUF_SIZE=409600
>
>
> Ideally I would like to hear that the manual tracing and debugging with
> a keyboard console would provide enough info.  I'll increase the kernel
> buffer size to 400k as above, so I don't lose info when I continue and
> dmesg > log.txt.
>
> Just as easily, I can try forcing a panic at the LORs and keeping the
> kernel dumps (with optional debugging in ddb like above).  The advantage
> is that this might answer supplementary questions after the deed is
> done.
>
> Both the above options should be possible this week.
>
> The serial console part may or may not happen this week, and I'm quite
> positive it will take another week before I find the time to spend 16+
> hours on location, waiting for a lockup (which might happen at a busy
> time and therefore I'll have very little time to do all the debugging).
>
> Tips / suggestions are most welcome!
>
> Thanks for the help!
> 	Alex

-- 
ian j hart