Help with determining a system hang
mahan at mahan.org
Sun Nov 28 16:55:04 UTC 2010
I am running a FreeBSD 8.0 kernel with my own code in it that does some deep
packet diving. This mostly works, but I am seeing occasional system hangs: the
console keyboard stops responding and the system stops receiving packets.
I have enabled INVARIANTS, WITNESS and WATCHDOG. The watchdog fires, though not
always after the 20-second timeout; sometimes it fires immediately. I also have
DDB and KDB enabled in the kernel.
What puzzles me is that when the watchdog fires and I get the DDB prompt, the
first thing I do is list all CPUs with 'show allpcpu'. I would expect to see
one of the CPUs doing something, but most of the time all of the CPUs are
idle. The couple of times this was not true, the CPU was in em_handle_que()
in dev/e1000/if_em.c. That code is pretty straightforward, though I could see
it blocking on reading
Can anyone suggest possible causes? At first I thought I might have a deadlock
in my code, but while WITNESS does report a few lock-order reversals, none of
them involve my code and they seem to be false positives. I next looked for
some kind of resource wait, but could not find one (or I don't know how to
find it).
'show locks' does not show any locks being held.
'show threads' shows almost every thread sitting in an idle state.
I am at a loss to explain it. I know my code is probably causing this behavior
in some way, because I have never seen the hang when my code is bypassed.
For the packet diving, my code is called directly from either ip_input() or
ip_output(). In ip_input() it is called either in the forwarding path or just
before the packet is handed to the upper-layer protocol via the protosw.
In ip_output() it is called just before ip_output() handles IP fragmentation.
This is an Intel Xeon that FreeBSD reports as 8 CPUs (dual core, 4
threads/core). However, I am more experienced with MIPS hardware than with
Intel. I have not yet dug into FreeBSD's interrupt handling on Intel, but it
is one of my suspects, since the system does not even respond to the console
keyboard.
This is going to be a learning experience for me :-)
Thanks for any and all help,
More information about the freebsd-hackers