ichwd watchdog vs intel smm code

Andriy Gapon avg at icyb.net.ua
Mon Jan 19 04:11:57 PST 2009

Some time ago I reported that the ichwd driver has no effect on my
system with an Intel DG33TL motherboard when the watchdog timer is
allowed to expire.
I did some investigation and also discussed the issue with the
maintainer of the Linux iTCO_wdt driver, Wim Van Sebroeck. He told me
that the issue is present on most Intel motherboards with the ICH9
chipset.

All evidence seems to point in one direction: Intel SMM code.
It seems that the BIOS always configures ICH9 to produce an SMI when
the watchdog timer expires. Also, the ICH watchdog has logic to cause a
reboot if its timer expires twice in a row; this is to avoid a potential
hang in the SMI handler.
ICH9 allows SMI to be globally enabled and disabled. The BIOS configures
it to be enabled. Sometimes this bit is locked down (by the BIOS), so
it's not possible to change it, but sometimes it is not locked.
So, if we globally disable SMI, then the watchdog in ICH9 does indeed
cause a reboot. Evidently this happens after the timer expires twice in
a row.

My conclusion is that the SMI handler installed by the BIOS somehow
acknowledges the watchdog SMI (e.g. clears a certain status bit, or
reloads the watchdog timer), but it doesn't take any "corrective" action.
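
To make the reasoning concrete, here is a toy model of the behaviour
described above. It is only a sketch and does not touch hardware: the
function name, its parameters and the single "timeout pending" flag are
my own simplifications, not something taken from the ICH9 datasheet or
from the ichwd driver.

```python
def run_watchdog(expirations, smi_enabled, handler_acks=True):
    """Return the expiry number at which a reboot occurs, or None.

    Hypothetical model: one flag remembers that a timeout happened and
    was not acknowledged; a second timeout with the flag set reboots.
    """
    timeout_pending = False
    for n in range(1, expirations + 1):
        if timeout_pending:
            return n  # second expiry in a row: watchdog forces a reboot
        timeout_pending = True
        if smi_enabled and handler_acks:
            # The SMI handler acknowledges the event, so the watchdog
            # never sees two expirations in a row.
            timeout_pending = False
    return None


print("SMI enabled, handler acks:", run_watchdog(5, smi_enabled=True))
print("SMI globally disabled:    ", run_watchdog(5, smi_enabled=False))
```

In this model the first case never reboots and the second reboots on
the second expiry, which matches what I see on the DG33TL.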

My guess is that there might be some configuration (or some other form
of massaging) that needs to be done in order to convince the SMM code to
perform a reboot upon a watchdog SMI.

It would be nice if we had some Intel insiders who could provide a
glimpse into Intel SMM code logic with respect to the watchdog.

Finally, some technical details:

NMI2SMI_EN and TCO_LOCK bits of TCO1_CNT are set to 1:
pmbase+0x0068: 0x1200     (TCO1_CNT)

In the SMI_EN register the TCO_EN bit is 1, GBL_SMI_EN is 1, the "End of
SMI (EOS)" bit is 1 and (not sure if this matters) LEGACY_USB_EN is 1:
pmbase+0x0030: 0x0000203b (SMI_EN)

"No Reboot (NR)" bit of GCS register is zero:
rcba+0x3410:   0x00c01444 (GCS)
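
For anyone who wants to double-check the dumps above, here is a small
sketch that decodes them. The register values are the ones quoted in
this message; the bit positions are the ones I believe the ICH9
datasheet gives, so treat them as assumptions and verify them against
the datasheet. The decode() helper is just an illustration.

```python
# Bit positions assumed from the ICH9 datasheet; verify before relying on them.
TCO1_CNT_BITS = {"NMI2SMI_EN": 9, "TCO_TMR_HLT": 11, "TCO_LOCK": 12}
SMI_EN_BITS = {"GBL_SMI_EN": 0, "EOS": 1, "LEGACY_USB_EN": 3, "TCO_EN": 13}
GCS_BITS = {"NR": 5}


def decode(value, bits):
    """Return {name: 0 or 1} for each named bit of a register value."""
    return {name: (value >> pos) & 1 for name, pos in bits.items()}


# Register values as reported in this message.
tco1_cnt = decode(0x1200, TCO1_CNT_BITS)
smi_en = decode(0x0000203B, SMI_EN_BITS)
gcs = decode(0x00C01444, GCS_BITS)

for reg, fields in (("TCO1_CNT", tco1_cnt), ("SMI_EN", smi_en), ("GCS", gcs)):
    print(reg, fields)
```

Under those assumed bit positions the timer is not halted
(TCO_TMR_HLT is 0), TCO SMIs are enabled and the NR bit is clear, so
nothing in these three registers by itself explains the missing reboot.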

Andriy Gapon

