Re: software watchdog RELENG13 not working ? (solved)

From: mike tancsa <mike_at_sentex.net>
Date: Mon, 23 Aug 2021 21:13:07 UTC

OK, this looks to be a case of RTFM.  I didn't realize the RELENG_13
software watchdog works rather differently from the simpler
hardware-based ones I use. If I start it up with

watchdogd -t 120 --pretimeout 60 --pretimeout-action log,printf,panic

and then kill the daemon with

killall -9 watchdogd

it works as expected.
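
For reference, a minimal sketch of making the same settings persistent
across reboots, assuming the stock /etc/rc.d/watchdogd script, which reads
watchdogd_enable and watchdogd_flags from /etc/rc.conf. The flag values
are just the ones from the command above, not a tuned recommendation:

```
# /etc/rc.conf
# Start watchdogd at boot with the same options as the manual test above.
watchdogd_enable="YES"
watchdogd_flags="-t 120 --pretimeout 60 --pretimeout-action log,printf,panic"
```

After that, "service watchdogd start" should bring it up without a reboot.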

Does anyone have any good experience with the software watchdog? My
embedded device of choice is out of stock for the next year, and the next
best choice I am looking at doesn't seem to have a functional hardware
watchdog.  It's mostly for situations where the box gets livelocked,
maybe due to a DDoS or other conditions.


    ---Mike

On 8/23/2021 3:51 PM, mike tancsa wrote:
> Was trying out some new hardware that does not seem to have a hardware
> watchdog on it so was going to try the software one. However, after
> starting up watchdogd -t 20 and then doing a killall -9 watchdogd, a
> GENERIC kernel just prints the messages below and does not reboot.
>
>
> Aug 23 15:40:35 alibox4port01 kernel: interrupt                   total
> Aug 23 15:40:35 alibox4port01 kernel: cpu0:timer                        21875
> Aug 23 15:40:35 alibox4port01 kernel: cpu1:timer                        11810
> Aug 23 15:40:35 alibox4port01 kernel: cpu2:timer                        10753
> Aug 23 15:40:35 alibox4port01 kernel: cpu3:timer                        10172
> Aug 23 15:40:35 alibox4port01 kernel: irq128: hdac0                          8
> Aug 23 15:40:35 alibox4port01 kernel: irq129: ahci0                       4004
> Aug 23 15:40:35 alibox4port01 kernel: irq130: igb0:rxq0                   3202
> Aug 23 15:40:35 alibox4port01 kernel: irq131: igb0:rxq1                     19
> Aug 23 15:40:35 alibox4port01 kernel: irq132: igb0:aq                        2
> Aug 23 15:40:35 alibox4port01 kernel: irq142: xhci0                        568
> Aug 23 15:40:35 alibox4port01 kernel: Total                       62413
> Aug 23 15:40:35 alibox4port01 kernel: KDB: stack backtrace:
> Aug 23 15:40:35 alibox4port01 kernel: #0 0xffffffff80c70e05 at kdb_backtrace+0x65
> Aug 23 15:40:35 alibox4port01 kernel: #1 0xffffffff80bb4b0d at hardclock+0x1bd
> Aug 23 15:40:35 alibox4port01 kernel: #2 0xffffffff80bb5b74 at handleevents+0xc4
> Aug 23 15:40:35 alibox4port01 kernel: #3 0xffffffff80bb653c at timercb+0x25c
> Aug 23 15:40:35 alibox4port01 kernel: #4 0xffffffff8116a3db at lapic_handle_timer+0x9b
> Aug 23 15:40:35 alibox4port01 kernel: #5 0xffffffff81083191 at Xtimerint+0xb1
> Aug 23 15:40:35 alibox4port01 kernel: #6 0xffffffff804e93ff at acpi_cpu_idle+0x2ef
> Aug 23 15:40:35 alibox4port01 kernel: #7 0xffffffff8106d90e at cpu_idle_acpi+0x3e
> Aug 23 15:40:35 alibox4port01 kernel: #8 0xffffffff8106d9bf at cpu_idle+0x9f
> Aug 23 15:40:35 alibox4port01 kernel: #9 0xffffffff80c589c4 at sched_idletd+0x2e4
> Aug 23 15:40:35 alibox4port01 kernel: #10 0xffffffff80be02ea at fork_exit+0x8a
> Aug 23 15:40:35 alibox4port01 kernel: #11 0xffffffff810824ce at fork_trampoline+0xe
>
> It's just a stock GENERIC kernel, and the box auto-loaded the following klds:
>
>  # kldstat
> Id Refs Address                Size Name
>  1   44 0xffffffff80200000  1f19458 kernel
>  2    1 0xffffffff8211a000   65b120 zfs.ko
>  3    1 0xffffffff82776000     a010 cryptodev.ko
>  4    1 0xffffffff82ae5000     3250 ichsmb.ko
>  5    1 0xffffffff82ae9000     2180 smbus.ko
>  6    1 0xffffffff82aec000     2340 uhid.ko
>  7    1 0xffffffff82aef000     3380 usbhid.ko
>  8    1 0xffffffff82af3000     31f8 hidbus.ko
>  9    1 0xffffffff82af7000     3320 wmt.ko
> 10    1 0xffffffff82afb000     6360 u3g.ko
> 11    2 0xffffffff82b02000     4d90 ucom.ko
> 12    1 0xffffffff82b07000     5300 usie.ko
>
> Kernel is from
>
> 13.0-STABLE FreeBSD 13.0-STABLE #1 stable/13-8ad5619ec: Fri Aug 20
> 13:15:51 EDT 2021
>
> Chipset seems to be an Intel Gemini Lake. ICHWD doesn't attach to
> anything, unfortunately.
>
> ---<<BOOT>>---
> Copyright (c) 1992-2021 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 13.0-STABLE #1 stable/13-8ad5619ec: Fri Aug 20 13:15:51 EDT 2021
>     mdtancsa@alibox4port01.sentex.ca:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
> FreeBSD clang version 12.0.1 (git@github.com:llvm/llvm-project.git llvmorg-12.0.1-0-gfed41342a82f)
> VT(efifb): resolution 800x600
> CPU: Intel(R) Celeron(R) J4125 CPU @ 2.00GHz (1996.89-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0x706a8  Family=0x6  Model=0x7a  Stepping=8
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Features2=0x4ff8ebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,RDRAND>
>   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>   AMD Features2=0x101<LAHF,Prefetch>
>   Structured Extended Features=0x2294e287<FSGSBASE,TSCADJ,SGX,SMEP,ERMS,NFPUSG,MPX,PQE,RDSEED,SMAP,CLFLUSHOPT,PROCTRACE,SHA>
>   Structured Extended Features2=0x40400004<UMIP,RDPID,SGXLC>
>   Structured Extended Features3=0xac000400<MD_CLEAR,IBPB,STIBP,ARCH_CAP,SSBD>
>   XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
>   IA32_ARCH_CAPS=0x6b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO>
>   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
>   TSC: P-state invariant, performance statistics
> real memory  = 8589934592 (8192 MB)
> avail memory = 8075411456 (7701 MB)
> Event timer "LAPIC" quality 600
> ACPI APIC Table: <INTEL  GLK-SOC >
> WARNING: L1 data cache covers fewer APIC IDs than a core (0 < 1)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 4 core(s)
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> random: unblocking device.
> ioapic0 <Version 2.0> irqs 0-119
> Launching APs: 2 1 3
> Timecounter "TSC" frequency 1996889148 Hz quality 1000
> random: entropy device external interface
> kbd1 at kbdmux0
> mlx5en: Mellanox Ethernet driver 3.6.0 (December 2020)
> efirtc0: <EFI Realtime Clock>
> efirtc0: registered as a time-of-day clock, resolution 1.000000s
> aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256>
> acpi0: <ALASKA A M I >
> unknown: I/O range not supported
> cpu0: <ACPI CPU> on acpi0
> attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0
> Event timer "i8254" frequency 1193182 Hz quality 100
> atrtc0: <AT realtime clock> port 0x70-0x77 on acpi0
> atrtc0: Warning: Couldn't map I/O.
> atrtc0: registered as a time-of-day clock, resolution 1.000000s
> Event timer "RTC" frequency 32768 Hz quality 0
> hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 8 on
> acpi0
> Timecounter "HPET" frequency 19200000 Hz quality 950
> Event timer "HPET" frequency 19200000 Hz quality 550
> Event timer "HPET1" frequency 19200000 Hz quality 440
> Event timer "HPET2" frequency 19200000 Hz quality 440
> Event timer "HPET3" frequency 19200000 Hz quality 440
> Event timer "HPET4" frequency 19200000 Hz quality 440
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
> acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pcib0: Length mismatch for 3 range: 1 vs 1000000010000
> pci0: <ACPI PCI bus> on pcib0
> vgapci0: <VGA-compatible display> port 0xf000-0xf03f mem
> 0xa0000000-0xa0ffffff,0x90000000-0x9fffffff irq 19 at device 2.0 on pci0
> vgapci0: Boot video device
> hdac0: <Intel Gemini Lake HDA Controller> mem
> 0xa1510000-0xa1513fff,0xa1000000-0xa10fffff irq 25 at device 14.0 on pci0
> pci0: <simple comms> at device 15.0 (no driver attached)
> ahci0: <Intel Gemini Lake AHCI SATA controller> port
> 0xf090-0xf097,0xf080-0xf083,0xf060-0xf07f mem
> 0xa1514000-0xa1515fff,0xa1518000-0xa15180ff,0xa1517000-0xa15177ff irq 19
> at device 18.0 on pci0
> ahci0: AHCI v1.31 with 2 6Gbps ports, Port Multiplier supported
> ahcich0: <AHCI channel> at channel 0 on ahci0
> ahcich1: <AHCI channel> at channel 1 on ahci0
> pcib1: <ACPI PCI-PCI bridge> irq 21 at device 19.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> igb0: <Intel(R) I211 (Copper)> port 0xe000-0xe01f mem
> 0xa1400000-0xa141ffff,0xa1420000-0xa1423fff irq 22 at device 0.0 on pci1
> igb0: Using 1024 TX descriptors and 1024 RX descriptors
> igb0: Using 2 RX queues 2 TX queues
> igb0: Using MSI-X interrupts with 3 vectors
> igb0: Ethernet address: 00:f1:f3:1e:74:c6
> igb0: netmap queues/slots: TX 2/1024, RX 2/1024
> pcib2: <ACPI PCI-PCI bridge> irq 21 at device 19.1 on pci0
> pci2: <ACPI PCI bus> on pcib2
> igb1: <Intel(R) I211 (Copper)> port 0xd000-0xd01f mem
> 0xa1300000-0xa131ffff,0xa1320000-0xa1323fff irq 23 at device 0.0 on pci2
> igb1: Using 1024 TX descriptors and 1024 RX descriptors
> igb1: Using 2 RX queues 2 TX queues
> igb1: Using MSI-X interrupts with 3 vectors
> igb1: Ethernet address: 00:f1:f3:1e:74:c7
> igb1: netmap queues/slots: TX 2/1024, RX 2/1024
> pcib3: <ACPI PCI-PCI bridge> irq 21 at device 19.2 on pci0
> pci3: <ACPI PCI bus> on pcib3
> igb2: <Intel(R) I211 (Copper)> port 0xc000-0xc01f mem
> 0xa1200000-0xa121ffff,0xa1220000-0xa1223fff irq 20 at device 0.0 on pci3
> igb2: Using 1024 TX descriptors and 1024 RX descriptors
> igb2: Using 2 RX queues 2 TX queues
> igb2: Using MSI-X interrupts with 3 vectors
> igb2: Ethernet address: 00:f1:f3:1e:74:c8
> igb2: netmap queues/slots: TX 2/1024, RX 2/1024
> pcib4: <ACPI PCI-PCI bridge> irq 21 at device 19.3 on pci0
> pci4: <ACPI PCI bus> on pcib4
> igb3: <Intel(R) I211 (Copper)> port 0xb000-0xb01f mem
> 0xa1100000-0xa111ffff,0xa1120000-0xa1123fff irq 21 at device 0.0 on pci4
> igb3: Using 1024 TX descriptors and 1024 RX descriptors
> igb3: Using 2 RX queues 2 TX queues
> igb3: Using MSI-X interrupts with 3 vectors
> igb3: Ethernet address: 00:f1:f3:1e:74:c9
> igb3: netmap queues/slots: TX 2/1024, RX 2/1024
> pcib5: <ACPI PCI-PCI bridge> irq 22 at device 20.0 on pci0
> pci5: <ACPI PCI bus> on pcib5
> pcib6: <ACPI PCI-PCI bridge> irq 22 at device 20.1 on pci0
> pci6: <ACPI PCI bus> on pcib6
> xhci0: <Intel Gemini Lake USB 3.0 controller> mem 0xa1500000-0xa150ffff
> irq 17 at device 21.0 on pci0
> xhci0: 32 bytes context size, 64-bit DMA
> usbus0 on xhci0
> usbus0: 5.0Gbps Super Speed USB v3.0
> isab0: <PCI-ISA bridge> at device 31.0 on pci0
> isa0: <ISA bus> on isab0
> acpi_button0: <Power Button> on acpi0
> acpi_tz0: <Thermal Zone> on acpi0
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> est0: <Enhanced SpeedStep Frequency Control> on cpu0
> Timecounters tick every 1.000 msec
> ZFS filesystem version: 5
> ZFS storage pool version: features support (5000)
> hdacc0: <Intel Gemini Lake HDA CODEC> at cad 2 on hdac0
> hdaa0: <Intel Gemini Lake Audio Function Group> at nid 1 on hdacc0
> pcm0: <Intel Gemini Lake (HDMI/DP 8ch)> at nid 3 on hdaa0
> Trying to mount root from zfs:ali4root/ROOT/default []...
> Root mount waiting for: CAM usbus0
> ugen0.1: <0x8086 XHCI root HUB> at usbus0
> uhub0 on usbus0
> uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: <ShiJi 128GB U0104A0> ACS-2 ATA SATA 3.x device
> ada0: Serial Number KC20210728170
> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
> ada0: Command Queueing enabled
> ada0: 122104MB (250069680 512 byte sectors)
> uhub0: 16 ports with 16 removable, self powered
> Root mount waiting for: usbus0
> ugen0.2: <USB USB Keykoard> at usbus0
> ukbd0 on uhub0
> ukbd0: <USB USB Keykoard, class 0/0, rev 1.10/1.10, addr 1> on usbus0
> kbd2 at ukbd0
> ugen0.3: <Sierra Wireless, Incorporated MC7700> at usbus0
> ichsmb0: <Intel Gemini Lake SMBus controller> port 0xf040-0xf05f mem
> 0xa1516000-0xa15160ff irq 20 at device 31.1 on pci0
> smbus0: <System Management Bus> on ichsmb0
> lo0: link state changed to UP
> igb0: link state changed to UP
>
> isab0@pci0:0:31:0:      class=0x060100 rev=0x06 hdr=0x00 vendor=0x8086
> device=0x31e8 subvendor=0x8086 subdevice=0x7270
>     vendor     = 'Intel Corporation'
>     device     = 'Celeron/Pentium Silver Processor LPC Controller'
>     class      = bridge
>     subclass   = PCI-ISA
> ichsmb0@pci0:0:31:1:    class=0x0c0500 rev=0x06 hdr=0x00 vendor=0x8086
> device=0x31d4 subvendor=0x8086 subdevice=0x7270
>     vendor     = 'Intel Corporation'
>     device     = 'Celeron/Pentium Silver Processor Gaussian Mixture Model'
>     class      = serial bus
>     subclass   = SMBus
>
> none0@pci0:0:15:0:      class=0x078000 rev=0x06 hdr=0x00 vendor=0x8086
> device=0x319a subvendor=0x8086 subdevice=0x7270
>     vendor     = 'Intel Corporation'
>     device     = 'Celeron/Pentium Silver Processor Trusted Execution
> Engine Interface'
>     class      = simple comms
>