request for preliminary review, enhanced watchdog.

Tue Feb 19 22:26:34 UTC 2013

on 13/02/2013 03:17 Alfred Perlstein said the following:
> At work we've had some issues with superfluous watchdog timeouts firing.
> 
> Since we use an ipmi/external watchdog the system is completely reset and we are
> unable to gather metrics.
> 
> I investigated the issue and then compared to what is offered by Linux and
> decided to crib from their API such that we can benefit from an enhanced watchdog.
> 
> I have a WIP at this time in a branch that I would hope people could weigh in on
> and review as well as make technical suggestions.

Alfred,

I think that this is very useful work.
Some comments below.

> The branch is located here:
>   svn+ssh://svn.freebsd.org/base/user/alfred/ewatchdog
> 
> The easy way to get changes:
>   svn log --stop-on-copy svn+ssh://svn.freebsd.org/base/user/alfred/ewatchdog
> 
> 1) Support for pre-watchdog timeout.  This means that so long as the kernel is
> somewhat functional (callouts are working) we can trigger a configurable action
> (panic,ddb,log) if the watchdog program is otherwise hung.

I see where this can be useful.
The unfortunate drawback which you mentioned is that the solution is
"semi-reliable" - it won't help much if a hang is such that the callouts no
longer fire.
But it could still be desirable to obtain something for postmortem analysis even
in that condition.
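
To make the two stages concrete, here is a minimal sketch of how a periodic
callout could classify the time since the last pat.  All names are hypothetical
and this is not code from the ewatchdog branch; it only illustrates the idea
that the pre-timeout action (panic, ddb, log) runs before the real timeout.

```c
#include <assert.h>

/* Hypothetical names throughout; not taken from the ewatchdog branch. */
enum wd_stage {
	WD_STAGE_OK,		/* patted recently enough, nothing to do */
	WD_STAGE_PRETIMEOUT,	/* run the configured action: panic, ddb or log */
	WD_STAGE_TIMEOUT	/* the real watchdog would have fired by now */
};

/*
 * Classify the seconds elapsed since the last pat.  This only works while
 * callouts still fire, which is the "semi-reliable" limitation noted above.
 */
static enum wd_stage
wd_stage_at(unsigned int since_pat, unsigned int pretimeout,
    unsigned int timeout)
{
	assert(pretimeout <= timeout);
	if (since_pat >= timeout)
		return (WD_STAGE_TIMEOUT);
	if (since_pat >= pretimeout)
		return (WD_STAGE_PRETIMEOUT);
	return (WD_STAGE_OK);
}
```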

> 2) Support for built-in software watchdog that has the same options
> (panic,ddb,log) if the watchdog times out.  This is useful for prototyping and
> was done instead of using the SW_WATCHDOG in kern_clock.c because of the ease of
> working the code into watchdog.c versus communication via the EVENTHANDLER api.

I see why you chose (or had to choose) this option, but this is kind of
unfortunate - more below.

> 3) Support for Linux-like API. (WDIOC_GETTIMELEFT,
> WDIOC_SETTIMEOUT,WDIOC_GETTIMEOUT, etc)

I haven't looked at the complete Linux API, but judging from your quote above:
what are the Linux and potential FreeBSD use cases for ioctls like GETTIMELEFT
and GETTIMEOUT?

> 4) Modifications to watchdogd(8):
>    - Warn if the watchdog program takes too long.
>    - Disable activation of the system watchdog so that one can test the
>      watchdogd script without potentially rebooting the system.
>    - Ability to log to syslog when scripts begin to timeout.
>    - When told to measure time, do not unconditionally nap for 'sleep'
>      seconds, instead adjust the naptime by the elapsed time so as not to
>      trigger the watchdog.

I don't have anything to say about the userland part.  In general these new
things sound useful.
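
For reference, the nap adjustment described in the last item amounts to
something like the following.  This is only a sketch under my reading of the
description; the function name is hypothetical and it is not the code from the
branch.

```c
#include <assert.h>

/*
 * Hypothetical helper: how long to nap after the check command has already
 * consumed 'elapsed_secs' of the 'sleep_secs' interval.
 */
static unsigned int
adjusted_nap(unsigned int sleep_secs, unsigned int elapsed_secs)
{
	assert(sleep_secs > 0);
	if (elapsed_secs >= sleep_secs)
		return (0);	/* already late, pat again immediately */
	return (sleep_secs - elapsed_secs);
}
```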

> I've not yet hooked in the optional pre-timeout code into watchdogd(8) but plan
> on doing so later in the week.
> 
> It would be really helpful if we could decide on a way of selecting which
> watchdogs to arm/fire and how to query them.  I may adopt the Linux API unless
> someone has alternative suggestions that make a strong enough case to forge our
> own API.

Again, I haven't examined the Linux API, so I can't say much about it.
The following is how I imagine our watchdog infrastructure.

I think that we should have some quality and feature flags associated with
various watchdog drivers (somewhat similarly to e.g. eventtimers), which would
describe things like:
- I am implemented in software or hardware
- I am able to generate system reset
- I am able to generate a "hard" debug event (NMI)
- (for software wd) I work via NMIs or regular interrupts
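
As a rough illustration, such flags could be a simple bit mask attached to each
driver.  The names below are made up for this sketch and do not exist in
watchdog(9) or anywhere else in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability flags; none of these names exist in FreeBSD. */
enum wd_caps {
	WD_CAP_HARDWARE   = 1 << 0,	/* hardware (clear means software) */
	WD_CAP_RESET      = 1 << 1,	/* can generate a system reset */
	WD_CAP_NMI_EVENT  = 1 << 2,	/* can generate a "hard" debug event */
	WD_CAP_NMI_DRIVEN = 1 << 3	/* software wd driven by NMIs */
};

/* Hypothetical per-driver record carrying the flags and a quality value. */
struct wd_driver {
	const char	*name;
	unsigned int	 caps;
	int		 quality;
};

/* Return nonzero if the driver offers every capability in 'wanted'. */
static int
wd_has_caps(const struct wd_driver *drv, unsigned int wanted)
{
	assert(drv != NULL);
	return ((drv->caps & wanted) == wanted);
}
```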

Then, I think that watchdogd should support at least two timeouts: one for a
debug watchdog and one for a reset watchdog.  The ioctl interface should of
course support setting timeouts per watchdog type.
This way a user should be able to specify a timeout (e.g. 10 seconds) for a
debug watchdog with an intent of dumping a core (or other debugging action) and
a different timeout (say 60 seconds) for a reset watchdog, which should make
sure in a fail-safe manner that a system doesn't get stuck in the debug/dump/etc
code.
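
A sketch of what such a per-class configuration might look like follows.  The
structure and the rule that the debug timeout must be shorter than the reset
timeout are assumptions made for illustration, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class timeouts; not an existing FreeBSD interface. */
struct wd_timeouts {
	unsigned int debug_secs;	/* debug watchdog; 0 means disabled */
	unsigned int reset_secs;	/* reset watchdog, the fail-safe */
};

/*
 * Assumed rule: the reset watchdog must be armed, and if the debug watchdog
 * is armed too it must fire first, so that the reset watchdog still guards
 * the debug/dump code.
 */
static int
wd_timeouts_valid(const struct wd_timeouts *t)
{
	assert(t != NULL);
	if (t->reset_secs == 0)
		return (0);
	return (t->debug_secs == 0 || t->debug_secs < t->reset_secs);
}
```

With the numbers from the example above, { 10, 60 } would be accepted and
{ 60, 10 } rejected.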

Then, the kernel should auto-select the best watchdog driver for each of the
watchdog classes.  But a sysctl interface should allow a user to override the
selection in case there are multiple drivers with sufficient capabilities.
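
The selection logic could be as simple as the sketch below.  Everything here is
hypothetical (the structure, the capability bits, the idea of passing the
override as a driver name); it only shows "highest quality among the capable
drivers, unless the user named a capable one".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability bits and driver record; not watchdog(9). */
#define	WDC_RESET	0x01	/* can reset the system */
#define	WDC_DEBUG	0x02	/* can raise a debug event such as an NMI */

struct wd_candidate {
	const char	*name;
	unsigned int	 caps;
	int		 quality;	/* assumed: higher is better */
};

/*
 * Pick a driver for one watchdog class.  'required' holds the capability
 * bits the class needs; 'override' is a user-chosen driver name or NULL.
 * Returns NULL if no driver is capable.
 */
static const struct wd_candidate *
wd_select(const struct wd_candidate *c, size_t n, unsigned int required,
    const char *override)
{
	const struct wd_candidate *best = NULL;
	size_t i;

	assert(c != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if ((c[i].caps & required) != required)
			continue;	/* lacks a needed capability */
		if (override != NULL && strcmp(c[i].name, override) == 0)
			return (&c[i]);	/* user's choice wins if capable */
		if (best == NULL || c[i].quality > best->quality)
			best = &c[i];
	}
	return (best);
}
```

Note that in this sketch an override naming a driver without the required
capabilities is silently ignored; reporting an error instead would be an
equally reasonable choice.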

Also, and only partially related to your WIP, I think that it is long overdue
that we get a software watchdog driven by (periodic) NMIs.  Both SW_WATCHDOG
and your callout "watchdog" are driven by regular timer interrupts.  (I put
"watchdog" in quotes only because it is not implemented as a real watchdog(9)
driver, but is blended into the infrastructure.)

My opinion is that such an infrastructure could be more powerful, flexible and
reliable than what you currently have in the branch.  We could let a multitude
of watchdog drivers co-exist and "cooperate" by ensuring that each of them does
its special part of the overall job.  Of course, it requires more work too.

-- 
Andriy Gapon