Re: ipmi0: Watchdog set returned 0xc0 (releng_13)
Date: Wed, 15 Sep 2021 15:23:03 UTC
On 9/14/2021 9:29 PM, Alexander Motin wrote:
> Hi Mike,
>
> Could you try my 6c2d4404161a commit? I don't know about your case, but
> it fixes 0xcc error I see on my systems for timeouts below 120 seconds.

Hi Alexander,

This is on the Supermicro X11SCH-F. BMC firmware was version 1.73 (the latest version on the website).

ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
ipmi0: KCS mode found at io 0xca2 on acpi
ipmi0: IPMI device rev. 1, firmware rev. 1.73, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

It's no longer printing the error! If I start up watchdogd -t 30 and then do a killall -9 watchdogd, it does a graceful shutdown of the box!?! That's very cool. Even better than before, when it was a hard reset.

But I wondered whether it will still do a hard reset if the box is actually live locked. I did a quick test to confirm that it does indeed not wait around too long. I added an infinite loop in /usr/local/etc/rc.d/stop-shutdown.sh and it only fired for 6 seconds before the box hard reset. It's logged in the BMC log too.

# ipmitool sel list
   1 | 09/15/2021 | 14:42:04 | Watchdog2 #0xca | Timer interrupt () | Asserted
   2 | 09/15/2021 | 14:42:22 | Watchdog2 #0xca | Power cycle () | Asserted

I also tried on an X11SSL-F:

ipmi0: IPMI device rev. 1, firmware rev. 1.60, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

# ipmitool sel list | tail -3
   6 | 08/20/2021 | 20:45:38 | Fan #0x45 | Lower Non-recoverable going low | Asserted
   7 | 09/15/2021 | 11:15:28 | Watchdog2 #0xca | Timer interrupt () | Asserted
   8 | 09/15/2021 | 11:15:38 | Watchdog2 #0xca | Power cycle () | Asserted
#

I have a RELENG_12 box in production that I will try as well later, but so far so good. Thanks for fixing!

    ---Mike