IPMI hardware watchdogs Re: dell r420/r320 stable/9

Doug Ambrisko ambrisko at ambrisko.com
Tue Jul 31 19:23:35 UTC 2012


On Fri, Jul 27, 2012 at 10:51:43PM +0300, Andriy Gapon wrote:
| on 27/07/2012 17:33 Andrew Boyer said the following:
| > 
| > On Jul 26, 2012, at 8:50 PM, Sean Bruno wrote:
| > 
| >> For the time being I had to revert the following from my stable/9 tree. 
| >> Otherwise I would get a kernel panic on shutdown from ipmi(4).
| >> 
| >> http://svnweb.freebsd.org/base?view=revision&revision=237839 
| >> http://svnweb.freebsd.org/base?view=revision&revision=221121
| > 
| > On a somewhat related note: We noticed recently that you can't pet or disable
| > the IPMI hardware watchdog once SCHEDULER_STOPPED() is true.  This means it
| > can fire unexpectedly while you're dumping core or rebooting, depending on
| > how long the timeout was on the pet before the panic.  The ipmi driver will
| > need to process the command differently if the scheduler is stopped.  I
| > haven't had time to look at a fix yet.
| 
| Yeah, I noticed that unlike most (all?) other watchdog drivers where watchdog
| re-arming is a very basic operation like doing one I/O the IPMI watchdog does
| some more complex stuff which involves waiting on another thread.  I think that
| this may be a little bit too much for a reliable watchdog driver.  At least, as
| you note, this definitely won't work for the panic case where only one thread is
| left running.  I guess that the driver should check for that case and do a
| direct operation instead of enqueueing a request and waiting for another thread
| to execute it.

I have some local hacks that allow KCS mode to run in a polled mode.
We do that so we can put kernel back traces into the system event
log.  Julian had code in FreeBSD to "pat" a watchdog during a core dump.
We have local code here to disable console muting when dropping into
the kernel debugger and re-enable it on exit.  It might be useful to
tie the watchdog into this: disable it on entering the kernel
debugger and resume it on exit.
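
To make the debugger idea concrete, here is a rough sketch.  It is a
userspace model only, not ipmi(4) or kdb code: struct wd_model and the
wd_debugger_enter()/wd_debugger_exit() names are made up for
illustration, and a real driver would have to issue the actual IPMI
watchdog commands where the model just flips fields.

```c
#include <stdbool.h>

/* Hypothetical model of watchdog state; not the real ipmi(4) softc. */
struct wd_model {
	bool     armed;          /* timer is currently counting down */
	unsigned timeout_sec;    /* timeout currently programmed */
	unsigned saved_timeout;  /* timeout to restore on debugger exit */
	bool     suspended;      /* true while sitting in the debugger */
};

/* On debugger entry: remember the current timeout, then disarm. */
void
wd_debugger_enter(struct wd_model *wd)
{
	if (wd->suspended)
		return;		/* nested entry: keep the first saved state */
	wd->saved_timeout = wd->armed ? wd->timeout_sec : 0;
	wd->armed = false;
	wd->suspended = true;
}

/* On debugger exit: re-arm only if the watchdog was armed before. */
void
wd_debugger_exit(struct wd_model *wd)
{
	if (!wd->suspended)
		return;
	wd->suspended = false;
	if (wd->saved_timeout != 0) {
		wd->timeout_sec = wd->saved_timeout;
		wd->armed = true;
	}
}
```

Saving the timeout on entry means a watchdog that was never armed
stays off after the debugger exits.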

With my polling hack, I don't think I dealt with the case where a
transaction was already in progress.  SMIC could be done like KCS.
SSIF could be harder since it uses the i2c interface to talk to the
HW, which is more complicated.
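
Roughly, the dispatch could look like the sketch below.  Again this is
a userspace model under stated assumptions, not the actual driver:
struct ipmi_model, ipmi_model_submit() and the counters are
hypothetical, and the scheduler_stopped field only stands in for
SCHEDULER_STOPPED().  Returning EBUSY for a transaction already in
progress is a placeholder for the case that still needs real handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the driver state relevant to dispatch. */
struct ipmi_model {
	bool scheduler_stopped;	/* stands in for SCHEDULER_STOPPED() */
	bool xfer_in_progress;	/* worker thread is mid-transaction */
	int  queued_count;	/* requests handed to the worker thread */
	int  polled_count;	/* requests run directly by the caller */
};

enum ipmi_path { PATH_QUEUED, PATH_POLLED };

/*
 * Pick how a request is carried out.  Returns 0 and sets *taken on
 * success, or EBUSY if polling is needed but the interface is busy.
 */
int
ipmi_model_submit(struct ipmi_model *sc, enum ipmi_path *taken)
{
	assert(sc != NULL && taken != NULL);

	if (!sc->scheduler_stopped) {
		/* Normal case: enqueue and let the worker thread run it. */
		sc->queued_count++;
		*taken = PATH_QUEUED;
		return (0);
	}
	if (sc->xfer_in_progress) {
		/* Nobody is left to finish the pending transaction. */
		return (EBUSY);
	}
	/* Panic/shutdown case: drive the interface by polling. */
	sc->polled_count++;
	*taken = PATH_POLLED;
	return (0);
}
```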

Thanks,

Doug A.

