Re: ipmi0: Watchdog set returned 0xc0 (releng_13)

From: mike tancsa <mike_at_sentex.net>
Date: Wed, 15 Sep 2021 15:23:03 UTC
On 9/14/2021 9:29 PM, Alexander Motin wrote:
> Hi Mike,
>
> Could you try my 6c2d4404161a commit?  I don't know about your case, but
> it fixes 0xcc error I see on my systems for timeouts below 120 seconds.

Hi Alexander,

This is on the Supermicro X11SCH-F.  BMC firmware was version 1.73
(latest version on the website)

ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
ipmi0: KCS mode found at io 0xca2 on acpi
ipmi0: IPMI device rev. 1, firmware rev. 1.73, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

It's no longer printing the error!

If I start up watchdogd -t 30

and then do a

killall -9 watchdogd,

it does a graceful shutdown of the box !?!  That's very cool. Even better
than before, when it was a hard reset. But I guess the question is, will
it still do a hard reset if the box is actually livelocked?  I did a quick
test to confirm that it does indeed not wait around too long.  I added an
infinite loop in /usr/local/etc/rc.d/stop-shutdown.sh and it only ran for
6 seconds before the box hard reset.

It's logged in the BMC log too.

# ipmitool sel list
   1 | 09/15/2021 | 14:42:04 | Watchdog2 #0xca | Timer interrupt () | Asserted
   2 | 09/15/2021 | 14:42:22 | Watchdog2 #0xca | Power cycle () | Asserted
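
For anyone who wants to check how long the BMC waited between the timer
interrupt and the power cycle, here is a rough sketch that computes the gap
from the SEL output. It assumes the pipe-separated layout shown above and
that both entries fall on the same day. sel_gap is just a name made up for
illustration; it is not part of ipmitool.

```sh
#!/bin/sh
# sel_gap: hypothetical helper, not part of ipmitool.
# Reads "ipmitool sel list" style lines on stdin and prints the number of
# seconds between the first and the last entry (same day assumed).
sel_gap() {
    awk -F'|' '
        NF >= 4 {                      # skip blank lines and wrapped fragments
            split($3, t, ":")          # $3 is the HH:MM:SS column
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (n++ == 0) first = secs
            last = secs
        }
        END { if (n > 0) print last - first }
    '
}

# Example with the two entries from the X11SCH-F log above:
sel_gap <<'EOF'
   1 | 09/15/2021 | 14:42:04 | Watchdog2 #0xca | Timer interrupt () | Asserted
   2 | 09/15/2021 | 14:42:22 | Watchdog2 #0xca | Power cycle () | Asserted
EOF
```

This prints 18 for the two entries above. On a live box something like
"ipmitool sel list | tail -2 | sel_gap" should do the same, as long as the
last two entries are the watchdog events.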



I also tried on an X11SSL-F:

ipmi0: IPMI device rev. 1, firmware rev. 1.60, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

 # ipmitool sel list | tail -3
   6 | 08/20/2021 | 20:45:38 | Fan #0x45 | Lower Non-recoverable going low  | Asserted
   7 | 09/15/2021 | 11:15:28 | Watchdog2 #0xca | Timer interrupt () | Asserted
   8 | 09/15/2021 | 11:15:38 | Watchdog2 #0xca | Power cycle () | Asserted
#

I have a RELENG_12 box in production I will try as well later, but so
far so good.  Thanks for fixing!

    ---Mike