Re: Panic when starting wireless on head n271247-a527b9cb721a

From: Kevin Oberman <rkoberman_at_gmail.com>
Date: Thu, 01 Aug 2024 02:10:24 UTC
Ticket opened as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280546

On Thu, Jul 25, 2024 at 9:09 AM Bjoern A. Zeeb <bz@freebsd.org> wrote:

> On Thu, 18 Jul 2024, Kevin Oberman wrote:
>
> > I attempted to update my development system to today's head. After
> > installing the kernel, etcupdate -p, reboot, installworld, etcupdate,
> > check-old, delete-old, and a reboot, the system panicked while starting
> > the network.
> >
> > System is a T16-Gen1 with the Alder Lake wifi. When starting the network,
> > it panics with:
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 5; apic id = 12
> > fault virtual address   = 0xc
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff8359afd3
> > stack pointer           = 0x28:0xfffffe00f1341c80
> > frame pointer           = 0x28:0xfffffe00f1341d00
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                        = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 0 (linuxkpi_short_wq_1)
> > rdi: fffffe016695c4f8 rsi: fffffe00f1341c48 rdx: ffffffff8118971b
> > rcx: 0000000000000000  r8: 0000000000000001  r9: ffffffffffffffff
> > rax: 0000000000000000 rbx: fffffe0167886e80 rbp: fffffe00f1341d00
> > r10: 0000000000010000 r11: 0000000000000001 r12: fffffe0167887478
> > r13: 0000000000000000 r14: fffffe016695c4c8 r15: fffff80003bec540
> >
> > I can supply the full core.txt file. The backtrace shows the following
> > items:
> > iwl_mvm_bt_notif_iterator() at iwl_mvm_bt_notif_iterator+0xf3/frame
> > 0xfffffe00f1341d00
> > linuxkpi_ieee80211_iterate_interfaces() at
> > linuxkpi_ieee80211_iterate_interfaces+0x84/frame 0xfffffe00f1341d40
> > iwl_mvm_bt_coex_notif_handle() at iwl_mvm_bt_coex_notif_handle+0x7c/frame
> > 0xfffffe00f1341da0
> > iwl_mvm_async_handlers_wk() at iwl_mvm_async_handlers_wk+0x110/frame
> > 0xfffffe00f1341df0
> >
> > Should I open a ticket or add to an existing one? I didn't see one with a
> > quick look.
>
> No, I am not aware of any of it either;  have you hit this more than once?
>
> This is a NULL deref somewhere in iwl_mvm_bt_notif_per_link() if my lldb
> thinks the same...
>
>
>         link_conf = rcu_dereference(vif->link_conf[link_id]);
>         /* This can happen due to races: if we receive the notification
>          * and have the mutex held, while mac80211 is stuck on our mutex
>          * in the middle of removing the link.
>          */
>         if (!link_conf)
>                 return;
>
>         chanctx_conf = rcu_dereference(link_conf->chanctx_conf);
>
>         /* If channel context is invalid or not on 2.4GHz .. */
>         if ((!chanctx_conf ||
>              chanctx_conf->def.chan->band != NL80211_BAND_2GHZ)) {
> ...
>
> Seems chanctx_conf->def.chan was NULL, as the fault address 0xc is the
> offset of the band field.
>
> That means this likely happened right before the first SCAN->AUTH happened.
> It seems we need to initialize the def.chan on vif creation as well.
>
> For tracking purposes, yes, please file a PR;  for simplicity feel free
> to simply link to this mail in the archives.
>
> /bz
>
> --
> Bjoern A. Zeeb                                                     r15:7
>
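
To illustrate the diagnosis quoted above, here is a minimal sketch of a
band check that tolerates a NULL def.chan. It is not the driver code and
not the fix proposed above (which is to initialize def.chan on vif
creation). The struct definitions and the names chanctx_is_2ghz() and
chanctx_self_check() are hypothetical, simplified stand-ins for the
LinuxKPI/mac80211 types, and the field offsets do not match the real
layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real 802.11 types. */
enum band { BAND_2GHZ, BAND_5GHZ };

struct channel { enum band band; };
struct chan_def { struct channel *chan; };
struct chanctx_conf { struct chan_def def; };

/*
 * Return true only when a channel is set and it is on 2.4GHz.
 * Both the context pointer and def.chan are checked before
 * def.chan->band is read, so neither can be dereferenced as NULL.
 */
bool
chanctx_is_2ghz(const struct chanctx_conf *conf)
{
	if (conf == NULL || conf->def.chan == NULL)
		return (false);
	return (conf->def.chan->band == BAND_2GHZ);
}

/* Exercise the three interesting cases: no context, no channel, channel set. */
void
chanctx_self_check(void)
{
	struct channel ch2 = { .band = BAND_2GHZ };
	struct channel ch5 = { .band = BAND_5GHZ };
	struct chanctx_conf unset = { .def = { .chan = NULL } };
	struct chanctx_conf on2 = { .def = { .chan = &ch2 } };
	struct chanctx_conf on5 = { .def = { .chan = &ch5 } };

	assert(!chanctx_is_2ghz(NULL));
	assert(!chanctx_is_2ghz(&unset));	/* the case that panicked */
	assert(chanctx_is_2ghz(&on2));
	assert(!chanctx_is_2ghz(&on5));
}
```

The "unset" case corresponds to a vif whose channel context exists but
whose def.chan has not been filled in yet, which is the state suspected
before the first SCAN->AUTH transition.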


-- 
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkoberman@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683