CURRENT freezes on Latitude D520
Tai-hwa Liang
avatar at mmlab.cse.yzu.edu.tw
Sat Dec 9 19:11:42 PST 2006
On Sat, 9 Dec 2006, Robert Watson wrote:
[...]
> Right now, setting debug.mpsafenet=1 has three effects:
>
> (1) Place Giant over the network stack, creating a single lock that spans the
> entire stack, preventing parallelism, as well as acting as a "master" lock
> which implicitly prevents lock order-related deadlocks in the stack.
>
> (2) Effectively disabling preemption in the network stack, as ithreads and the
> netisr will be unable to start running until user threads exit the stack,
> regardless of priority.
>
> (3) Effectively disable direct dispatch, as non-MPSAFE netisr handlers are
> always deferred rather than executing in the ithread context.
>
> I suspect that many of the people setting debug.mpsafenet=1 and declaring the
> problem fixed are seeing the change due to (2) and (3), indirect rather than
> direct effects of (1). I would much rather people experimented with:
>
> - Disabling direct dispatch (net.isr.direct=0)
>
> - Disabling preemption (compiling out options PREEMPTION)
>
> - Running with WITNESS, which reports lock order reversals.
>
> which get a bit more to the heart of most problems. debug.mpsafenet=1 really
> exists for the purposes of supporting components which are not sufficiently
> locked to allow the stack to run MPSAFE, rather than as a means of disabling
> direct dispatch and preemption, which speak to different types of problems.
> The main reason that I haven't removed the administrator tunable to date is
> that I suspect it will be quite helpful when KAME IPSEC locking happens, but
> since that appears not to have happened yet, debug.mpsafenet as an option is
> likely causing more harm than good by being available as a stand-in sysctl
> masking other problems, causing people to not get to the point of properly
> identifying the actual cause (device driver bugs, etc).
Can the aforementioned tricks (1/2/3) be applied to RELENG_6 as well?
We are running RELENG_6 on our production server (postfix, squid,
pf firewall/NAT, FAST_IPSEC VPN, ...), which is a dual Athlon MP board
with three NICs (two fxp cards and one onboard xl, connected to three
different networks).
I haven't tried WITNESS yet; however, I'm quite sure that net.isr.direct=0
is set and that the current kernel is built without PREEMPTION. The problem
is that, with debug.mpsafenet=1, we always run into a hard freeze without
getting any kdb> prompt on the console.
While turning debug.mpsafenet off only masks the real problem, I'm still
wondering whether there is any less disruptive way to track such a problem
down in a _production_ environment.
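For reference, the three experiments suggested in the quoted message
(no direct dispatch, no preemption, WITNESS enabled) might be configured
roughly as follows. This is only a sketch under assumptions: the file
locations are the usual defaults rather than anything stated in this thread,
the kernel options take effect only after a kernel rebuild, and
WITNESS_SKIPSPIN is an optional extra that was not mentioned above.

```
# /etc/sysctl.conf (assumed location): disable direct dispatch
net.isr.direct=0

# Custom kernel configuration file (name and path are site-specific)
#options 	PREEMPTION		# left commented out: preemption compiled out
options 	WITNESS			# report lock order reversals
#options 	WITNESS_SKIPSPIN	# optional: skip spin mutexes to reduce overhead
```

WITNESS adds noticeable run-time overhead, so on a production machine it is
probably best enabled only for a limited test window.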
--
Thanks,
Tai-hwa Liang
More information about the freebsd-current
mailing list