Re: ifp gone in ip6_output() -> panic

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Thu, 23 May 2024 22:12:20 UTC
On Wed, 22 May 2024, Zhenlei Huang wrote:

>
>
>> On May 22, 2024, at 12:17 PM, Bjoern A. Zeeb <bzeeb-lists@lists.zabbadoz.net> wrote:
>>
>> Hi,
>>
>> sorry, I cannot dump; this is a diskless and netdump does not do IPv6;
>> needless to say that would be funny in this case anyway; unfortunately
>> I have also already re-compiled the kernel so I can only look things up approx.
>>
>> FreeBSD main from May 13 (f3eeeb959c9b00c89a2e1ff009c78162eb398656).
>>
>> I assume we lost the ifp from a destroy of a cloned interface in ip6_output()
>> between lines 806 and 811?
>>
>>
>> Kernel page fault with the following non-sleepable locks held:
>> exclusive rw rawinp (rawinp) r = 0 (0xfffff80002a6e1a0) locked @ /usr/src/sys/netinet6/raw_ip6.c:393
>> stack backtrace:
>> #0 0xffffffff80bb679c at witness_debugger+0x6c
>> #1 0xffffffff80bb7979 at witness_warn+0x3e9
>> #2 0xffffffff81061d10 at trap_pfault+0x80
>> #3 0xffffffff81033878 at calltrap+0x8
>> #4 0xffffffff80d99228 at rip6_send+0x5a8
>> #5 0xffffffff80bf570e at sosend_generic+0x5ee
>> #6 0xffffffff80bf5c49 at sousrsend+0x79
>> #7 0xffffffff80bfbd5c at kern_sendit+0x1bc
>> #8 0xffffffff80bfc073 at sendit+0x1b3
>> #9 0xffffffff80bfc1ab at sys_sendmsg+0x5b
>> #10 0xffffffff81062638 at amd64_syscall+0x158
>> #11 0xffffffff8103418b at fast_syscall_common+0xf8
>> Created wlan(4) interfaces: wlan
>
> Note the creation of wlan, and a following ICMP6 (ping6) packet.

Yes, I think it was running `netif restart wlan0` in a loop.


[...]
>
> I'm not quite sure, but it seems the `ifp` is not fully constructed. See https://cgit.freebsd.org/src/tree/sys/net/if.c#n950
>
> If I read the code correctly, the clone-created interface is made visible via `if_link_ifnet(ifp);`, and at that time
> `ifp->if_afdata[AF_INET6]` is NULL and is not yet initialized by `if_attachdomain1()`, which will call `in6_domifattach()`
> to allocate the required data.
>
> So I guess there is a race condition. I bet this can be repeated easily.
>
> I have not tested this yet, and not sure if it is the right fix, but you can give it a try.

I'll do that; I haven't seen the error since on other test machines,
so I am not sure about repeatability.

I am also not entirely sure this was not a `ping6 ff02::1%wlan0` where
the ifp was destroyed by netif restart while the packet was still on
its way out?

If it was during create, the wlan(4) interface would not yet be
associated and UP at that point in if_attach_internal(), and
`ifconfig inet6 -ifdisabled` would not have been run yet, so that
packet could not have been sent in the first place?

Otherwise the packet would have had to "survive" the clone destroy and
clone create cycle somewhere ...?


> diff --git a/sys/net/if.c b/sys/net/if.c
> index c3c27fbf678f..16ee5667e7bb 100644
> --- a/sys/net/if.c
> +++ b/sys/net/if.c
> @@ -947,11 +947,11 @@ if_attach_internal(struct ifnet *ifp, bool vmove)
>        }
> #endif
>
> -       if_link_ifnet(ifp);
> -
>        if (domain_init_status >= 2)
>                if_attachdomain1(ifp);
>
> +       if_link_ifnet(ifp);
> +
>        EVENTHANDLER_INVOKE(ifnet_arrival_event, ifp);
>        if (IS_DEFAULT_VNET(curvnet))
>                devctl_notify("IFNET", ifp->if_xname, "ATTACH", NULL);
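
For illustration only, here is a minimal userspace sketch of the ordering
the diff above aims for: fill in the per-address-family data first, then
make the interface visible. This is not kernel code. `struct fake_ifnet`,
`fake_attachdomain()`, `fake_link()`, `fake_attach()` and
`fake_output_check()` are hypothetical stand-ins for the ifnet,
`if_attachdomain1()`, `if_link_ifnet()`, `if_attach_internal()` and a
reader such as the output path. It only models the create-side race and
says nothing about the destroy case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ifnet; not the kernel's real type. */
struct fake_ifnet {
	void *afdata_inet6;	/* plays the role of ifp->if_afdata[AF_INET6] */
};

/* Plays the role of the global interface list: a single published slot. */
static _Atomic(struct fake_ifnet *) published_ifp;

/* Stand-in for if_attachdomain1(): allocate the per-AF data. */
static int
fake_attachdomain(struct fake_ifnet *ifp)
{
	ifp->afdata_inet6 = calloc(1, 64);
	return (ifp->afdata_inet6 != NULL ? 0 : -1);
}

/* Stand-in for if_link_ifnet(): make the interface visible to readers. */
static void
fake_link(struct fake_ifnet *ifp)
{
	assert(ifp != NULL);
	/* Release store: earlier writes to *ifp are visible to acquirers. */
	atomic_store_explicit(&published_ifp, ifp, memory_order_release);
}

/* Attach in the order of the proposed diff: initialize, then publish. */
static int
fake_attach(struct fake_ifnet *ifp)
{
	if (fake_attachdomain(ifp) != 0)
		return (-1);
	fake_link(ifp);
	return (0);
}

/*
 * Reader side, e.g. an output path looking up the interface.
 * Returns 0 if nothing is visible, 1 if the visible interface is usable,
 * and -1 if it is visible but its AF data is still NULL (the suspected bug).
 */
static int
fake_output_check(void)
{
	struct fake_ifnet *ifp;

	ifp = atomic_load_explicit(&published_ifp, memory_order_acquire);
	if (ifp == NULL)
		return (0);
	return (ifp->afdata_inet6 != NULL ? 1 : -1);
}
```

Calling `fake_link()` on a zeroed `struct fake_ifnet` before
`fake_attachdomain()` makes `fake_output_check()` return -1, which is the
window suspected above; going through `fake_attach()` never exposes it.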

-- 
Bjoern A. Zeeb                                                     r15:7