Re: Panic when starting wireless on head n271247-a527b9cb721a

From: Bjoern A. Zeeb <bz_at_FreeBSD.org>
Date: Thu, 25 Jul 2024 16:09:44 UTC
On Thu, 18 Jul 2024, Kevin Oberman wrote:

> I attempted to update my development system to today's head. After
> installing the kernel, etcupdate -p, reboot, installworld, etcupdate,
> check-old, delete-old, reboot, the system panicked when it tried to
> start the network.
>
> System is a T16-Gen1 with the Alder Lake wifi. When starting the network,
> it panics with:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 5; apic id = 12
> fault virtual address   = 0xc
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8359afd3
> stack pointer           = 0x28:0xfffffe00f1341c80
> frame pointer           = 0x28:0xfffffe00f1341d00
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (linuxkpi_short_wq_1)
> rdi: fffffe016695c4f8 rsi: fffffe00f1341c48 rdx: ffffffff8118971b
> rcx: 0000000000000000  r8: 0000000000000001  r9: ffffffffffffffff
> rax: 0000000000000000 rbx: fffffe0167886e80 rbp: fffffe00f1341d00
> r10: 0000000000010000 r11: 0000000000000001 r12: fffffe0167887478
> r13: 0000000000000000 r14: fffffe016695c4c8 r15: fffff80003bec540
>
> I can supply the full core.txt file. The backtrace shows the following
> items:
> iwl_mvm_bt_notif_iterator() at iwl_mvm_bt_notif_iterator+0xf3/frame 0xfffffe00f1341d00
> linuxkpi_ieee80211_iterate_interfaces() at linuxkpi_ieee80211_iterate_interfaces+0x84/frame 0xfffffe00f1341d40
> iwl_mvm_bt_coex_notif_handle() at iwl_mvm_bt_coex_notif_handle+0x7c/frame 0xfffffe00f1341da0
> iwl_mvm_async_handlers_wk() at iwl_mvm_async_handlers_wk+0x110/frame 0xfffffe00f1341df0
>
> Should I open a ticket or add to an existing one? I didn't see one with a
> quick look.

No, I am not aware of an existing report either; have you hit this more than once?

This is a NULL dereference somewhere in iwl_mvm_bt_notif_per_link() (presumably inlined into iwl_mvm_bt_notif_iterator()), if my lldb resolves the address the same way:


        link_conf = rcu_dereference(vif->link_conf[link_id]);
        /* This can happen due to races: if we receive the notification
         * and have the mutex held, while mac80211 is stuck on our mutex
         * in the middle of removing the link.
         */
        if (!link_conf)
                return;

        chanctx_conf = rcu_dereference(link_conf->chanctx_conf);

        /* If channel context is invalid or not on 2.4GHz .. */
        if ((!chanctx_conf ||
             chanctx_conf->def.chan->band != NL80211_BAND_2GHZ)) {
...

It seems chanctx_conf->def.chan was NULL, as the fault virtual address 0xc is the offset of band in the chanctx_conf->def.chan->band read.

That means this most likely happened right before the first SCAN->AUTH transition.
It seems we need to initialize def.chan at vif creation time as well.

For tracking purposes, yes, please file a PR; feel free to simply link
to this mail in the archives.

/bz

-- 
Bjoern A. Zeeb                                                     r15:7