Re: ipmi0: Watchdog set returned 0xc0 (releng_13)

From: mike tancsa <mike_at_sentex.net>
Date: Mon, 16 Aug 2021 14:53:00 UTC
Hi Alexander,

    Thanks for the reply and info. Yes, you are right. I had the timer
set to -t 30, but it is actually printing every 10 seconds.  I had a
look in the BIOS. The only Watchdog setting there is described as:

Enable or disable to turn
on 5-minute watch dog
timer. Upon timeout, JWD1
jumper determines system
behavior.

I don't see any other places to tweak the hardware watchdog. If I enable
that, the box does indeed reboot after 5 minutes, even though I have
watchdogd running. I am not 100% sure, but I think this used to work on
other Supermicro boards.


I don't have any other RELENG13 boxes on Supermicro boards to test just yet.

One other thing I noticed was that if I boot up without ipmi loaded,
/dev/fido is there. Does it still see a hardware watchdog somehow, or is
that pointing to something else?

If I load the kld:

0{r}# kldload ipmi
ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
ipmi0: KCS mode found at io 0xca2 on acpi
ipmi0: IPMI device rev. 1, firmware rev. 1.23, version 2.0, device support mask 0xbf
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi0: Establishing power cycle handler

<wait 15 seconds, console still clear>
0{r}# 
0{r}# watchdogd
ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0
0{r}# ipmi0: Watchdog set returned 0xc0
ipmi0: Watchdog set returned 0xc0
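
For anyone trying to decode these messages, here is a minimal Python sketch
(not the driver's code) that pulls the completion code out of a console line
and maps it to its generic IPMI meaning. The 0xc0 text is the one Alexander
quotes from the spec; the other table entries are assumed from my reading of
the IPMI 2.0 completion code table, and the names COMPLETION_CODES and
describe_watchdog_error are made up for illustration.

```python
import re

# Generic IPMI completion codes (subset). 0xc0 is quoted from the spec in
# this thread; the other entries are assumed from the IPMI 2.0 table.
COMPLETION_CODES = {
    0x00: "Command completed normally",
    0xC0: "Node busy",
    0xC1: "Invalid command",
    0xC3: "Timeout while processing command",
    0xC9: "Parameter out of range",
    0xFF: "Unspecified error",
}

# Finds messages such as "ipmi0: Watchdog set returned 0xc0", even when a
# shell prompt was printed in front of them on the same console line.
LINE_RE = re.compile(r"(ipmi\d+): Watchdog set returned (0x[0-9a-fA-F]+)")


def describe_watchdog_error(line):
    """Return (device, code, meaning) for a matching line, else None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = int(m.group(2), 16)
    meaning = COMPLETION_CODES.get(code, "unknown completion code")
    return m.group(1), code, meaning


if __name__ == "__main__":
    for text in ("ipmi0: Watchdog set returned 0xc0",
                 "0{r}# ipmi0: Watchdog set returned 0xc0",
                 "ipmi0: Attached watchdog"):
        print(describe_watchdog_error(text))
```

This only names the code. It says nothing about why the BMC reports busy.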

I am going to look around for a BIOS update to see if there is a fix.

    ---Mike

On 8/16/2021 10:09 AM, Alexander Motin wrote:
> Hi Mike,
>
> According to IPMI specification 0xc0 means: "Node Busy. Command could
> not be processed because command processing resources are temporarily
> unavailable."  I have no idea what it means for the driver, but I
> suspect that you always have it inside, just before the mentioned commit
> it was quietly ignored.  I can't propose much other than hiding it again
> if errors like that get too widespread.  I haven't seen errors like that
> on the X11DPI-NT boards I've tested this on.  I saw 0xc9 if I set watchdog
> timeout below about a minute, for which I have no explanation either,
> but you may try to experiment with different timeouts or pat
> intervals.  The error period of 30s seems interesting, considering
> the default pat period in watchdogd of 10s.
>
> On 16.08.2021 09:26, mike tancsa wrote:
>> Hi All,
>>
>>     I updated a box from about a month ago, and noticed that the console
>> is full of
>>
>> ipmi0: Watchdog set returned 0xc0
>>
>> It fires every 30 seconds which is what I have the timer set to.  It
>> seems to be related to the ipmi watchdog as another box I have which
>> uses ichwd doesn't spew a similar message.
>>
>> The only commit seems to be
>>
>> commit b41b86b65f10ccaa8cce8cc11a030ad464b654c0
>> Author: Alexander Motin <mav@FreeBSD.org>
>> Date:   Thu Jul 29 23:39:04 2021 -0400
>>
>> Board is a Super Micro X11SCH-F. BIOS 1.5 from 11/17/2020
>>
>> My kernel is not that different from GENERIC. If I do a killall -9
>> watchdogd it reboots as expected.
>>
>>
>>> device  cxgbe
>>> device  cryptodev
>>> options         TCP_SIGNATURE
>>> options         IPSEC
>>> options         IPFIREWALL              #firewall
>>> options         IPFIREWALL_VERBOSE      #enable logging to syslogd(8)
>>> options         IPFIREWALL_VERBOSE_LIMIT=9100    #limit verbosity
>>> options         IPFIREWALL_DEFAULT_TO_ACCEPT    #allow everything by default
>>> #options         ROUTETABLES=2
>>> option FIB_ALGO
>> sysctl.conf is
>>
>> vfs.zfs.min_auto_ashift=12
>> net.inet.ip.redirect=0
>> net.inet6.ip6.redirect=0
>> kern.ipc.maxsockbuf=16777216
>> net.inet.tcp.blackhole=1
>>
>> and loader.conf
>>
>> zfs_load="YES"
>> comconsole_speed="115200"       # Set the current serial console speed
>> boot_multicons="YES"
>> boot_serial="YES"
>> console="efi"
>> ipmi_load="YES"
>> cpu_microcode_load="YES"
>> cpu_microcode_name="/boot/firmware/intel-ucode.bin"
>> comconsole_port="0x2f8"
>>
>> if_disc_load="YES"
>>
>> hw.cxgbe.toecaps_allowed="0"
>> hw.cxgbe.rdmacaps_allowed="0"
>> hw.cxgbe.iscsicaps_allowed="0"
>> hw.cxgbe.fcoecaps_allowed="0"
>> hw.cxgbe.pause_settings="0"
>> hw.cxgbe.attack_filter="1"
>> hw.cxgbe.drop_pkts_with_l3_errors="1"
>>
>> vm.pmap.pti=0
>>
>> net.inet.ip.fw.default_to_accept=1
>>
>>
>> contigmem_load="YES"
>> nic_uio_load="YES"
>> #hw.nic_uio.bdfs="2:0:0,2:0:1"
>> hw.nic_uio.bdfs="2:0:0,2:0:1,2:0:2,2:0:3"
>>
>> hw.contigmem.num_buffers=2
>> hw.contigmem.buffer_size=1073741824
>>
>> dpdk_lpm4_load="YES"
>> dpdk_lpm6_load="YES"