watchdog: hw+sw?

Alexander Leidinger Alexander at Leidinger.net
Thu Apr 2 23:46:12 PDT 2009


Quoting Doug Ambrisko <ambrisko at ambrisko.com> (from Thu, 2 Apr 2009  
16:16:34 -0700 (PDT)):

> This worked well for us so I think it is a good idea.  Also some HW
> watchdogs can be told to generate an NMI which can also produce a kernel
> dump/ddb prompt.  I've also implemented some rough code to put a
> simplified back-trace into the IPMI event log in case a disk or disk
> I/O sub-system died.

Somewhat related... I have two 32-bit systems with ZFS which lock up  
after a while. The lockup is strictly related to the disks. I can  
still ping the system just fine, and the HW watchdog either still  
works as intended or does not work at all anymore, as there is no  
automatic reset. But as soon as I want to do something which involves  
the disks (e.g. access a webpage located on the ZFS disks), I'm stuck.  
The only way to get some useful work done again is to reset manually.  
Your paragraph above implies that the WD notices that there's a  
problem with the disks.

While I know how to teach our watchdogd to detect this (the -e  
option), we do not have support for this in the base system yet. Do  
you have a patch for /etc/rc.d/watchdogd which allows specifying  
commands to run via rc.conf, or some patch which tells watchdogd to  
check a file?
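
To make the idea concrete, here is a rough sketch of the kind of check  
command I have in mind for -e. It is only an illustration, not  
something from the base system: the names probe, check and main, the  
probe file name and the 10 second timeout are all made up. It assumes  
that watchdogd pats the watchdog only when the command exits with  
status 0, which is how I understand the -e option. Python is used just  
to keep the sketch short.

```python
#!/usr/bin/env python3
"""Hypothetical disk probe intended to be run as a watchdogd -e command."""
import os
import sys
import threading


def probe(directory, payload=b"watchdog probe\n"):
    """Write, fsync, read back and remove a small file in `directory`."""
    path = os.path.join(directory, ".wd_probe.%d" % os.getpid())
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # force the data out to the disk, not just the cache
    finally:
        os.close(fd)
    try:
        with open(path, "rb") as f:
            return f.read() == payload
    finally:
        os.unlink(path)


def check(directory, timeout=10.0, probe_fn=probe):
    """Return 0 if the probe succeeds within `timeout` seconds, else 1."""
    result = []

    def worker():
        try:
            result.append(probe_fn(directory))
        except OSError:
            result.append(False)

    # Run the I/O in a daemon thread so a hung disk cannot block us forever.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    # An empty result means the worker is still stuck in I/O: report failure.
    return 0 if result and result[0] else 1


def main(argv):
    """Probe the directory given as first argument (default: /tmp)."""
    target = argv[1] if len(argv) > 1 else "/tmp"
    return check(target)
```

To use it as a real command one would add a last line such as  
os._exit(main(sys.argv)), so that a worker thread stuck in disk I/O  
cannot delay the exit, and point it at a directory on the ZFS pool.  
Two caveats: the read-back may be served from cache, so the fsync is  
the part that really exercises the disk, and the interpreter itself  
has to be loadable while the disks are dead. In that case the command  
would simply hang, which should also keep watchdogd from patting.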

Bye,
Alexander.

-- 
Whatever you want to do, you have to do something else first.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137


More information about the freebsd-hackers mailing list