Re: iwlwifi stuck occasionally

From: Bjoern A. Zeeb <bz_at_FreeBSD.org>
Date: Mon, 29 Aug 2022 20:44:59 UTC
On Sun, 14 Aug 2022, Bjoern A. Zeeb wrote:

> On Sun, 14 Aug 2022, Takashi Inoue wrote:
>
> Hi,
>
> thanks for the report.
>
>> I'am using iwlwifi in the 13.1R on my ThinkPad X1 Nano with AX201.
>> It works well. I'm happy. I'd like to thank you here.
>
> Thanks should go to the FreeBSD Foundation.
>
>
>> But, it stuck occasionally.
>> I could not find the cause. Perhaps, high load?
>
> https://wiki.freebsd.org/WiFi/Iwlwifi#Known_Issues
>
>> iwlwifi0: Queue 1 is stuck 120 199
>
> It means a timer fired as the TX queue didn't move forward.
> The code printing the above line is followed by:
>   /* TODO: access new SCD registers and dump them */
> Seems Intel hasn't implemetned that yet.
>
> There are one or two more fields in the txq we could dump in the meantime to
> see if they give us a hint (things which shouldn't happen but do anyway).
>
> I might go and have a look at that but it'd only show up in HEAD or stbale/13
> not in 13.1-R anymore.
>
> If others are seeing this (regularly) as well (still), then please let me
> know as it'd mean it'd be worthwhile to spend time on now if we can gather
> more information.

That said, coming back to my own test bed with an AX210 I found it like this
(with some other debugging on due to the other one odd card):

wlan30: link state changed to DOWN 
XXX-BZ lkpi_sta_scan_to_auth:999 lkpi_80211_mo_add_chanctx(hw 0xfffffe00464c02c0 conf 0xfffff8000183fe80)
XXX-BZ lkpi_sta_scan_to_auth:1012 lkpi_80211_mo_assign_vif_chanctx(hw 0xfffffe00464c02c0 vif 0xfffffe0046573cc0 conf 0xfffff8000183fe80)
XXX-BZ vif->bss_conf.bssid 0xfffffe008580826f 00:00:00:12:34:56 = ni 0xfffffe0085ac5000 ni_bssid 0xfffffe0085ac526f 00:00:00:12:34:56
wlan30: link state changed to UP 
iwlwifi0: Queue 1 is stuck 2 12 
iwlwifi0:   need_update 0 frozen 0 ampdu 0 now 18446744071567327796 stuck_timer.expires 18446744071567327796 frozen_expiry_remainder 0 wd_timeout 1000

So the timer expired and the TXq simply didn't move despite outstanding items.
And that means the driver triggers a "NMI" and we get the firmware report and would have to do the restart.

Given wd_timeout is set, that's the only place we use the dma_pool_alloc();  I'll go and futher investigate that we are not seeing a LinuxKPI issue there.

Would anyone be willing to add some more debugging if provided (especially if the issue is seen regularly)?


iwlwifi0: Microcode SW error detected. Restarting 0x0.
iwlwifi0: Start IWL Error Log Dump: 
iwlwifi0: Transport status: 0x0000004A, valid: 6 
iwlwifi0: Loaded firmware version: 73.35c0a2c6.0 ty-a0-gf-a0-73.ucode
iwlwifi0: 0x00000084 | NMI_INTERRUPT_UNKNOWN 
iwlwifi0: 0x00A002F0 | trm_hw_status0
iwlwifi0: 0x00000000 | trm_hw_status1
iwlwifi0: 0x004DB676 | branchlink2 
iwlwifi0: 0x004D1896 | interruptlink1
iwlwifi0: 0x004D1896 | interruptlink2
iwlwifi0: 0x004D48DE | data1
iwlwifi0: 0x01000000 | data2
iwlwifi0: 0x00000000 | data3
iwlwifi0: 0x43C11AF6 | beacon time
iwlwifi0: 0xB1905CD1 | tsf low
iwlwifi0: 0x0000005D | tsf hi
iwlwifi0: 0x00000000 | time gp1
iwlwifi0: 0x56A15679 | time gp2
iwlwifi0: 0x00000001 | uCode revision type
iwlwifi0: 0x00000049 | uCode version major
iwlwifi0: 0x35C0A2C6 | uCode version minor
iwlwifi0: 0x00000420 | hw version
iwlwifi0: 0x00C89002 | board version
iwlwifi0: 0x80FBFD26 | hcmd
iwlwifi0: 0x00020000 | isr0
iwlwifi0: 0x01000000 | isr1
iwlwifi0: 0x48F00002 | isr2
iwlwifi0: 0x00C30088 | isr3
iwlwifi0: 0x00200000 | isr4
iwlwifi0: 0x0101001C | last cmd Id
iwlwifi0: 0x004D48DE | wait_event
iwlwifi0: 0x00000080 | l2p_control
iwlwifi0: 0x00000000 | l2p_duration
iwlwifi0: 0x0000003F | l2p_mhvalid
iwlwifi0: 0x00001800 | l2p_addr_match
iwlwifi0: 0x00000009 | lmpm_pmg_sel
iwlwifi0: 0x00000000 | timestamp
iwlwifi0: 0x0000A0C8 | flow_handler
iwlwifi0: Start IWL Error Log Dump:
iwlwifi0: Transport status: 0x0000004A, valid: 7
iwlwifi0: 0x20000066 | NMI_INTERRUPT_HOST
iwlwifi0: 0x00000000 | umac branchlink1
iwlwifi0: 0x8045F174 | umac branchlink2
iwlwifi0: 0x8047EA26 | umac interruptlink1
iwlwifi0: 0x8046ADBE | umac interruptlink2
iwlwifi0: 0x01000000 | umac data1
iwlwifi0: 0x8046ADBE | umac data2
iwlwifi0: 0x00000000 | umac data3
iwlwifi0: 0x00000049 | umac major
iwlwifi0: 0x35C0A2C6 | umac minor
iwlwifi0: 0x56A15793 | frame pointer
iwlwifi0: 0xC0886228 | stack pointer
iwlwifi0: 0x00C30128 | last host cmd
iwlwifi0: 0x00000400 | isr status reg
iwlwifi0: IML/ROM dump:
iwlwifi0: 0x00000B03 | IML/ROM error/state
iwlwifi0: 0x00007F35 | IML/ROM data1
iwlwifi0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
iwlwifi0: Fseq Registers:
iwlwifi0: 0x60000000 | FSEQ_ERROR_CODE
iwlwifi0: 0x80440005 | FSEQ_TOP_INIT_VERSION
iwlwifi0: 0x00080009 | FSEQ_CNVIO_INIT_VERSION
iwlwifi0: 0x0000A652 | FSEQ_OTP_VERSION
iwlwifi0: 0x00000002 | FSEQ_TOP_CONTENT_VERSION
iwlwifi0: 0x4552414E | FSEQ_ALIVE_TOKEN
iwlwifi0: 0x00400410 | FSEQ_CNVI_ID
iwlwifi0: 0x00400410 | FSEQ_CNVR_ID
iwlwifi0: 0x00400410 | CNVI_AUX_MISC_CHIP
iwlwifi0: 0x00400410 | CNVR_AUX_MISC_CHIP
iwlwifi0: 0x00009061 | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
iwlwifi0: 0x00000061 | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
iwlwifi0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).


-- 
Bjoern A. Zeeb                                                     r15:7