watchdog: hw+sw?

Andriy Gapon avg at icyb.net.ua
Tue Apr 7 05:37:30 PDT 2009


I've been thinking about this some more. So, clearly, sw watchdog is different
from all the hw watchdogs (that I know about) in that it tries to take a debugging
action as opposed to a straightforward recovery action. As such it currently
doesn't make much sense to mix sw and hw watchdogs together, because in the case
of a problem they would fire at close times and a (typical) hw watchdog would
override sw watchdog.
This is fine as it is; maybe a small warning in the case of such a mix would be nice too.

However, I think that it should be possible to use sw watchdog as a special
"primary" watchdog and hw watchdog(s) as "failsafe" watchdogs for the primary one.
I see two general approaches at the moment:
1. hw watchdog has only a "slightly" longer timeout than the sw watchdog (by a
configurable delta), and the watchdogs get patted at the same time; if the sw wd
fires and is able to proceed, it first disables the hw watchdog(s) and then performs
its duty (panic, ddb);
2. hw watchdog has a "substantially" longer timeout than the sw watchdog (by a
configurable delta), and the watchdogs get patted at the same time; if the sw wd
fires, it has a limited amount of time to do its action before the hw wd fires too;
in this case it would also be nice to have a short ddb command for stopping the hw
watchdog.
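
To make the two approaches concrete, here is a rough user-space model in C. It is
only a sketch: every name in it (wd_pair, wd_policy, wd_sw_fire, and so on) is
hypothetical and not taken from the FreeBSD tree, and time is passed in by hand as
milliseconds instead of coming from a real clock or driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the FreeBSD sources. */
enum wd_policy {
	WD_HW_SLIGHTLY_LONGER,	/* approach 1: sw wd disarms hw wd before acting */
	WD_HW_MUCH_LONGER	/* approach 2: hw wd stays armed as a hard deadline */
};

struct wd_pair {
	enum wd_policy	policy;
	uint64_t	sw_timeout_ms;	/* timeout of the sw ("primary") watchdog */
	uint64_t	delta_ms;	/* configurable delta added for the hw wd */
	uint64_t	last_pat_ms;	/* both watchdogs are patted together */
	bool		hw_armed;
	bool		sw_fired;
};

static void
wd_pair_init(struct wd_pair *wd, enum wd_policy policy,
    uint64_t sw_timeout_ms, uint64_t delta_ms, uint64_t now_ms)
{
	assert(sw_timeout_ms > 0 && delta_ms > 0);
	wd->policy = policy;
	wd->sw_timeout_ms = sw_timeout_ms;
	wd->delta_ms = delta_ms;
	wd->last_pat_ms = now_ms;
	wd->hw_armed = true;
	wd->sw_fired = false;
}

/* Pat both watchdogs at the same time. */
static void
wd_pair_pat(struct wd_pair *wd, uint64_t now_ms)
{
	wd->last_pat_ms = now_ms;
}

static uint64_t
wd_sw_deadline(const struct wd_pair *wd)
{
	return (wd->last_pat_ms + wd->sw_timeout_ms);
}

/* The hw wd always expires delta_ms after the sw wd. */
static uint64_t
wd_hw_deadline(const struct wd_pair *wd)
{
	return (wd_sw_deadline(wd) + wd->delta_ms);
}

/* The sw wd fired and was able to proceed (panic, ddb would follow). */
static void
wd_sw_fire(struct wd_pair *wd)
{
	wd->sw_fired = true;
	if (wd->policy == WD_HW_SLIGHTLY_LONGER)
		wd->hw_armed = false;	/* "emergency stop" of the hw wd */
}

/* Would the hw wd reset the machine at now_ms? */
static bool
wd_hw_would_reset(const struct wd_pair *wd, uint64_t now_ms)
{
	return (wd->hw_armed && now_ms >= wd_hw_deadline(wd));
}
```

With WD_HW_SLIGHTLY_LONGER the hw wd can never reset the machine once wd_sw_fire()
has run; with WD_HW_MUCH_LONGER it still resets delta_ms after the sw deadline
unless something (e.g. the ddb command mentioned above) stops it.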

Each approach has its own advantages and disadvantages.
The first approach guarantees that sw wd would not be interrupted by hw wd. On the
other hand, there is no protection e.g. from a system getting stuck during a dump.
Also, hw watchdogs would have to provide a method for "emergency stop" that should
be safe from locking issues.
The second approach is more robust. Its problems: (a) it can interrupt sw wd
action too early; (b) it wastes more time if sw wd is not able to fire.
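
The trade-off can also be put in numbers. The following C sketch is illustrative
only: the struct and function names are made up for this example, and it assumes
that in the first approach the sw wd always manages to stop the hw wd within the
small delta.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types; not part of any FreeBSD interface. */
struct wd_costs {
	/* Time the sw wd action may run before the hw wd resets the box;
	 * UINT64_MAX means "unlimited", i.e. no protection if it gets stuck. */
	uint64_t sw_action_budget_ms;
	/* Time from the last pat to the hw reset when the sw wd cannot fire. */
	uint64_t recovery_if_sw_stuck_ms;
};

/* Approach 1: small delta, sw wd disables the hw wd before acting. */
static struct wd_costs
wd_costs_approach1(uint64_t sw_timeout_ms, uint64_t small_delta_ms)
{
	struct wd_costs c;

	assert(small_delta_ms <= UINT64_MAX - sw_timeout_ms);
	c.sw_action_budget_ms = UINT64_MAX;	/* hw wd is stopped */
	c.recovery_if_sw_stuck_ms = sw_timeout_ms + small_delta_ms;
	return (c);
}

/* Approach 2: large delta, hw wd keeps running during the sw wd action. */
static struct wd_costs
wd_costs_approach2(uint64_t sw_timeout_ms, uint64_t large_delta_ms)
{
	struct wd_costs c;

	assert(large_delta_ms <= UINT64_MAX - sw_timeout_ms);
	c.sw_action_budget_ms = large_delta_ms;	/* (a) action may be cut short */
	c.recovery_if_sw_stuck_ms = sw_timeout_ms + large_delta_ms; /* (b) */
	return (c);
}
```

For example, with a 1000 ms sw timeout, a 100 ms delta in the first approach and a
60000 ms delta in the second, a machine where the sw wd cannot fire is reset after
1100 ms versus 61000 ms, while a dump in the second approach has at most 60000 ms
to finish.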

Since using sw and hw watchdogs together makes more sense in unattended scenarios,
I think that approach #2 may be better. IMO, attended scenarios should use sw wd
exclusively.

-- 
Andriy Gapon

