MP watchdog (or: I have a dual-xeon with processors to burn)
Robert Watson
rwatson at FreeBSD.org
Sun Aug 15 12:48:06 PDT 2004
I've just committed a hack I've been using over the last day or two to
debug hangs. It's hardly perfect, but it is sort of neat. Basically, it
allows you to allocate a CPU on an SMP system as a watchdog to kick you
into the debugger if there's a hang, even if it's spinning in sched_lock
or the like. It can either fire an NMI at the boot processor, or invoke
the debugger directly. I've included a sample "be nasty" sysctl that
attempts to cause a nasty hang which the debugger is capable of breaking
into. Note that the current SMP hang I'm experiencing resists this
technique, but it's useful regardless, and is a decent substitute for
having an NMI button. It's also a good use for that fourth logical
processor on a dual Xeon... :-)
You can add MP_WATCHDOG to your i386 conf file, select SCHED_4BSD as the
scheduler, and use the debug.watchdog sysctl to set a debugging CPU (I'll
usually set it to 3 on my box). In ps(1) you'll see the idle thread on
that CPU renamed to a watchdog thread. Due to interrupt round-robining and
some IPIs, there will be situations where the watchdog CPU does things
other than watch, but that seems to happen rarely enough that this is
useful for a broad range of debugging. Obviously, you lose the use of
that CPU for as long as the watchdog is enabled.
Note: This does not work with sched_ule, only sched_4bsd. I'll work on
fixing that at some point, but I'm still chasing the current stability
problems.
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org Principal Research Scientist, McAfee Research
---------- Forwarded message ----------
Date: Sun, 15 Aug 2004 18:02:10 +0000 (UTC)
From: Robert Watson <rwatson at FreeBSD.org>
To: src-committers at FreeBSD.org, cvs-src at FreeBSD.org, cvs-all at FreeBSD.org
Subject: cvs commit: src/sys/conf files.i386 options.i386 src/sys/i386/i386 mp_machdep.c mp_watchdog.c src/sys/i386/include mp_watchdog.h
rwatson 2004-08-15 18:02:10 UTC
FreeBSD src repository
Modified files:
sys/conf files.i386 options.i386
sys/i386/i386 mp_machdep.c
Added files:
sys/i386/i386 mp_watchdog.c
sys/i386/include mp_watchdog.h
Log:
Add an "options MP_WATCHDOG" to i386. This option allows one of the
logical CPUs on a system to be used as a dedicated watchdog to cause a
drop to the debugger and/or generate an NMI to the boot processor if
the kernel ceases to respond. A sysctl enables the watchdog running
out of the processor's idle thread; a callout is launched to reset a
timer in the watchdog. If the callout fails to reset the timer for ten
seconds, the watchdog will fire. The sysctl allows you to select which
CPU will run the watchdog.
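The timer logic described above can be sketched in user-space C. This is
only an illustration, not the code in mp_watchdog.c: the names
struct watchdog, watchdog_reset(), watchdog_tick() and simulate_hang() are
hypothetical, and the one-second simulated ticks stand in for the idle-thread
loop and the kernel callout. Only the ten-second timeout is taken from the
log message.

```c
#include <stdbool.h>

#define WATCHDOG_TIMEOUT 10     /* seconds without a reset before firing */

/* Hypothetical watchdog state; the real kernel structure may differ. */
struct watchdog {
    int  timer;                 /* seconds since the last reset */
    bool fired;                 /* set once the timeout has expired */
};

/* Stand-in for the callout: runs only while the system is responsive. */
void
watchdog_reset(struct watchdog *wd)
{
    wd->timer = 0;
}

/*
 * Stand-in for one pass of the watchdog CPU's idle loop, called once per
 * simulated second. Returns true once the watchdog has fired; this is
 * where a real implementation would enter the debugger or send an NMI.
 */
bool
watchdog_tick(struct watchdog *wd)
{
    if (!wd->fired && ++wd->timer >= WATCHDOG_TIMEOUT)
        wd->fired = true;
    return (wd->fired);
}

/*
 * Simulate `total` seconds in which the callout stops running at second
 * `hang_at`. Returns the second at which the watchdog fires, or -1 if it
 * never fires.
 */
int
simulate_hang(int hang_at, int total)
{
    struct watchdog wd = { 0, false };

    for (int now = 0; now < total; now++) {
        if (watchdog_tick(&wd))
            return (now);
        if (now < hang_at)      /* system still healthy: callout runs */
            watchdog_reset(&wd);
    }
    return (-1);
}
```

In this sketch, simulate_hang(5, 30) has its last reset at second 4 and
fires ten seconds later, while a run in which the callout never stops does
not fire at all.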
A sample "debug.leak_schedlock" is included, which causes a sysctl to
spin holding sched_lock in order to trigger the watchdog. On my Xeons,
the watchdog is able to detect this failure mode and break into the
debugger, which cannot otherwise be done without an NMI button.
This option does not currently work with sched_ule due to ule's push
notion of scheduling, similar to machdep.hlt_logical_cpus failing to
work with that scheduler.
On face value, this might seem somewhat inefficient, but there are a
lot of dual-processor Xeons with HTT around, so using one as a watchdog
for testing is not as inefficient as one might fear.
Revision  Changes    Path
1.503     +1 -0      src/sys/conf/files.i386
1.213     +1 -0      src/sys/conf/options.i386
1.234     +9 -0      src/sys/i386/i386/mp_machdep.c
1.1       +225 -0    src/sys/i386/i386/mp_watchdog.c (new)
1.1       +34 -0     src/sys/i386/include/mp_watchdog.h (new)