Re: IPv6 panic (NULL * deref?) in nd6_ifnet_link_event

From: Kristof Provost <kp_at_freebsd.org>
Date: Sat, 10 May 2025 20:41:33 UTC

> On 10 May 2025, at 21:50, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
> 
> On Sat, 10 May 2025, Kristof Provost wrote:
> 
>> 
>> 
>>> On 10 May 2025, at 21:32, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>>> 
>>> Hi,
>>> 
>>> This is on main from the last few days.
>>> 
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 2; apic id = 02
>>> fault virtual address   = 0x10
>>> fault code              = supervisor read data, page not present
>>> instruction pointer     = 0x20:0xffffffff80dbd769
>>> stack pointer           = 0x28:0xfffffe0106296d60
>>> frame pointer           = 0x28:0xfffffe0106296d70
>>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>                       = DPL 0, pres 1, long 1, def32 0, gran 1
>>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>> current process         = 12 (swi6: task queue)
>>> rdi: fffff8002f997800 rsi: 000000000000001c rdx: 0000000000000000
>>> rcx: 0000000000010000  r8: 0000000000000001  r9: ffffffffffffffff
>>> rax: 0000000000000000 rbx: fffff8002f997a18 rbp: fffffe0106296d70
>>> r10: ffffffff81c4a1e8 r11: 0000000000000001 r12: fffff80001210700
>>> r13: fffff80001210728 r14: fffff8002f997800 r15: 0000000000000001
>>> trap number             = 12
>>> panic: page fault
>>> cpuid = 2
>>> time = 1746903751
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0106296a90
>>> vpanic() at vpanic+0x136/frame 0xfffffe0106296bc0
>>> panic() at panic+0x43/frame 0xfffffe0106296c20
>>> trap_pfault() at trap_pfault+0x48d/frame 0xfffffe0106296c90
>>> calltrap() at calltrap+0x8/frame 0xfffffe0106296c90
>>> --- trap 0xc, rip = 0xffffffff80dbd769, rsp = 0xfffffe0106296d60, rbp = 0xfffffe0106296d70 ---
>>> nd6_ifnet_link_event() at nd6_ifnet_link_event+0x39/frame 0xfffffe0106296d70
>>> do_link_state_change() at do_link_state_change+0x1b1/frame 0xfffffe0106296dc0
>>> taskqueue_run_locked() at taskqueue_run_locked+0x1c2/frame 0xfffffe0106296e40
>>> taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe0106296e60
>>> ithread_loop() at ithread_loop+0x266/frame 0xfffffe0106296ef0
>>> fork_exit() at fork_exit+0x82/frame 0xfffffe0106296f30
>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106296f30
>>> --- trap 0x25b01e6e, rip = 0x52db004fa566ef34, rsp = 0xcadb9a4f3d667734, rbp = 0xde5a00adbd42c69c ---
>>> KDB: enter: panic
>>> 
>>> 
>>> (gdb) l * nd6_ifnet_link_event+0x39
>>> 0xffffffff80dbd769 is in nd6_ifnet_link_event (sys/netinet6/nd6_rtr.c:327).
>>> 322     static void
>>> 323     defrtr_ipv6_only_ipf_down(struct ifnet *ifp)
>>> 324     {
>>> 325
>>> 326             IF_AFDATA_WLOCK(ifp);
>>> 327             ND_IFINFO(ifp)->flags &= ~ND6_IFF_IPV6_ONLY;
>>> 328             IF_AFDATA_WUNLOCK(ifp);
>>> 329     }
>>> 330     #endif  /* EXPERIMENTAL */
>>> 331
>>> 
>> That may be a known issue. There’s something odd with teardown where we sometimes clean up af_data for INET6 and still try to send v6 traffic. I know of panics where there’s a fib6_lookup() that returns a route with no v6 af_data.
>> I put a hack in the pfsense tree to make the panic less likely, but I don’t know what the root cause is.
> 
> This one likely came after the ifp was gone, or at least when
> ND_IFINFO(ifp) was already NULL.  The first would be a contract
> violation; the second is likely a bad ordering or a race against queuing.

Yeah, that’s the problem. 

> But in both cases we could avoid the panic with
> NULL checks (plus a warning, maybe, so we can find the root cause)?

I believe there are a lot of places that are potentially affected. I don’t know how realistic it is to add guards to all of them. 
And it’s rare enough that it’ll be hard to be sure we got them all. 
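
For illustration only, a guard of the kind suggested above might look roughly like the userland sketch below. Every name in it (the mock_* structs and functions, MOCK_IFF_IPV6_ONLY and its value) is a made-up stand-in, not the actual nd6 code, and it leaves out locking entirely. A bare NULL check like this only narrows the window; it does not fix the ordering problem underneath.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* All names below are hypothetical stand-ins, not FreeBSD kernel symbols. */
#define MOCK_AF_INET6		28	/* AF_INET6 is 28 on FreeBSD */
#define MOCK_AF_MAX		64
#define MOCK_IFF_IPV6_ONLY	0x200u	/* illustrative flag bit */

struct mock_nd_ifinfo {
	unsigned int flags;
};

struct mock_in6_ifextra {
	struct mock_nd_ifinfo *nd_ifinfo;
};

struct mock_ifnet {
	const char *name;
	void *afdata[MOCK_AF_MAX];	/* per-address-family data */
};

/* Return the per-interface ND info, or NULL if the INET6 af_data is gone. */
static struct mock_nd_ifinfo *
mock_nd_ifinfo_get(struct mock_ifnet *ifp)
{
	struct mock_in6_ifextra *ext;

	assert(ifp != NULL);	/* a missing ifp is a caller bug, not a race */
	ext = ifp->afdata[MOCK_AF_INET6];
	if (ext == NULL)
		return (NULL);
	return (ext->nd_ifinfo);
}

/* Clear the flag if possible; warn and return -1 instead of faulting. */
static int
mock_clear_ipv6_only(struct mock_ifnet *ifp)
{
	struct mock_nd_ifinfo *ndi;

	ndi = mock_nd_ifinfo_get(ifp);
	if (ndi == NULL) {
		fprintf(stderr, "%s: %s has no INET6 af_data, skipping\n",
		    __func__, ifp->name);
		return (-1);
	}
	ndi->flags &= ~MOCK_IFF_IPV6_ONLY;
	return (0);
}
```

Calling mock_clear_ipv6_only() on an interface whose INET6 slot is NULL logs a warning and returns -1 rather than dereferencing it; once the slot is populated, it clears the bit and returns 0. The hard part remains what I said above: every caller that touches the v6 af_data would need the same treatment.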

--
Kristof