RPi - watchdogd not working anymore (since r273154+)

Ian Lepore ian at FreeBSD.org
Wed Dec 10 15:23:07 UTC 2014


On Wed, 2014-12-10 at 06:29 +0100, Andreas Schwarz wrote:
> On 09.12.14, Ian Lepore wrote:
> 
> Hi Ian,
> 
> > watchdogd requests a timeout of approximately 128 seconds.  It used to
> > pet the dog once per second, and recently it was changed to only do so
> > once every 10 seconds for efficiency.
> >
> > The rpi watchdog hardware is unable to set a timeout longer than 15
> > seconds, but a bug in the driver let the value wrap around then get
> > bitmasked such that the request for a 128 second timeout was actually
> > getting handled as a 9 second timeout.  The 9 second thing worked when
> > we were petting the dog once a second, but now fails at once every 10
> > seconds.
> >
> > I've got a fix for the wrapping problem, but all it will do is make the
> > timer not get set at all if you ask for 128 seconds (that's what the
> > interface for watchdogs requires: don't set the timer if it can't be at
> > least as long as requested).
> >
> > To fix your problem you'll need to set watchdogd_flags="-s 4 -t 8" in
> > your rc.conf.  That will make the timeout 8 seconds and pet the dog
> > every 4 seconds.  You don't have too many options for the timeout value
> > (-t) because of the goofy way the timeout is represented in the kernel.
> > The only choices that work on rpi are 1,2,4,8 seconds.  If you ask for 9
> > it gets represented as a value that translates to 17.5 seconds and rpi
> > can't do it.
> 
> Thank you for your detailed explanation. I understand the problem and
> was able to run the watchdog again. In general, it's a little bit unsatisfying
> that we have (limited) watchdog hardware which does not fit the requirements
> of the freebsd watchdog implementation (which normally should cover all
> the watchdog hardware).
> 
> best regards,
> Andreas
> 

The hardware is what it is, and the rpi isn't the only modern arm system
with a 16 second max timeout.  I think the freebsd watchdog interface
reflects its age... a timeout of 128 or even 256 seconds is probably
reasonable for some big server where you want to be extra-sure it's a
genuine lockup because a reboot is pretty drastic.  But if your
smartphone locks up, do you really want to wait 5 minutes for it to
recover?  Timeouts at the low end seem more important these days.

It does kind of suck, though, that the hardware can do 1-15 seconds and
we can only hit 4 datapoints in the bottom half of that range.  I'm
pondering ways we could extend the current interface in the kernel to do
better, without breaking existing drivers.
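
To make the "4 datapoints" concrete, here is a rough sketch of the
arithmetic. It assumes the timeout is passed around as a power-of-two
number of nanoseconds and that a request in whole seconds is rounded up
to the next such power; the function names and the 15-second limit are
just for illustration, not lifted from the actual sources.

```python
NS_PER_SEC = 1_000_000_000
HW_MAX_SECONDS = 15  # assumption: rpi limit as quoted in this thread


def pow2_exponent(seconds):
    """Smallest n with 2**n nanoseconds >= the requested whole seconds."""
    return (seconds * NS_PER_SEC - 1).bit_length()


def effective_seconds(seconds):
    """Timeout that ends up being requested, in seconds (assumed rounding)."""
    return (1 << pow2_exponent(seconds)) / NS_PER_SEC


def distinct_usable_timeouts(hw_max=HW_MAX_SECONDS):
    """Exponents of the distinct effective timeouts that fit under hw_max."""
    exps = {pow2_exponent(s) for s in range(1, hw_max + 1)}
    return sorted(n for n in exps if (1 << n) / NS_PER_SEC <= hw_max)


if __name__ == "__main__":
    for s in (1, 2, 4, 8, 9, 128):
        print(s, pow2_exponent(s), round(effective_seconds(s), 2))
    print(distinct_usable_timeouts())
```

Under those assumptions, requests of 1, 2, 4 and 8 seconds land on
exponents 30 through 33 and fit the hardware, while 9 seconds rounds up
to 2^34 ns (a bit over 17 seconds) and does not.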

On the other hand, what we've been getting by accident for a couple
years is a 9-second timeout with 1-second petting, so -s 4 -t 8 is
actually a bit better than that. :)
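
For the curious, one reading of where the accidental 9 seconds came
from, as a small sketch. It assumes the 128-second default is encoded as
2^37 ns and that only 4 bits of whole seconds survive the mask; both are
guesses that happen to match the numbers in this thread, not a copy of
the driver code.

```python
NS_PER_SEC = 1_000_000_000

# Assumption: the default request is encoded as 2**37 ns (roughly 128 s).
requested_ns = 1 << 37
whole_seconds = requested_ns // NS_PER_SEC  # truncates to whole seconds

# Assumption: only 4 bits of whole seconds survive the mask (0..15 s).
SECONDS_MASK = 0xF
wrapped_seconds = whole_seconds & SECONDS_MASK


def survives(timeout_s, pet_interval_s):
    """True if the dog gets petted before the timer would expire."""
    return pet_interval_s < timeout_s


if __name__ == "__main__":
    print(whole_seconds, wrapped_seconds)
    print(survives(wrapped_seconds, 1))   # old behaviour: pet every second
    print(survives(wrapped_seconds, 10))  # new behaviour: pet every 10 seconds
    print(survives(8, 4))                 # the suggested -s 4 -t 8
```

With a 9-second effective timeout, petting every second keeps the
system alive and petting every 10 seconds does not, which lines up with
the breakage reported here.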

-- Ian



