Re: kernel epoch crash in IPv4 multicast code

From: Mike Karels <mike_at_karels.net>
Date: Mon, 21 Mar 2022 15:11:44 UTC

Kristof wrote:
> On 18 Mar 2022, at 19:02, Mike Karels wrote:
> > It looks like the IPv4 multicast code has not been fully converted to
> > use epochs.  I installed this week's snapshot of -current, configured
> > and started mrouted, and started rwhod -m.  The system crashed shortly
> > thereafter with this:
> >
> > panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_output.c:343
> > cpuid = 15
> > time = 1647609865
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01b51a39d0
> > vpanic() at vpanic+0x17f/frame 0xfffffe01b51a3a20
> > panic() at panic+0x43/frame 0xfffffe01b51a3a80
> > ip_output() at ip_output+0x15f9/frame 0xfffffe01b51a3b80
> > phyint_send() at phyint_send+0x107/frame 0xfffffe01b51a3be0
> > ip_mdq() at ip_mdq+0x259/frame 0xfffffe01b51a3c60
> > X_ip_mrouter_set() at X_ip_mrouter_set+0x9e4/frame 0xfffffe01b51a3d30
> > sosetopt() at sosetopt+0xee/frame 0xfffffe01b51a3d80
> > kern_setsockopt() at kern_setsockopt+0xad/frame 0xfffffe01b51a3de0
> > sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe01b51a3e00
> > amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe01b51a3f30
> > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01b51a3f30
> > --- syscall (105, FreeBSD ELF64, sys_setsockopt), rip = 0x821b72dda, rsp = 0x8204c06f8, rbp = 0x8204c0750 ---
> > KDB: enter: panic
> >
> > The kgdb backtrace is appended.
> >
> > It looks like ip_mroute is protected in the forwarding path (it's called
> > from ip_input) and the output path, but not in the setup path from
> > setsockopt().  At least the MRT_ADD_MFC call needs to enter an epoch.
> > I tried adding epoch handling in add_mfc(), and that seems to work.
> > The alternative would be to do it in X_ip_mrouter_set() so it would cover
> > all the calls.  Any opinions?
> >
> Your analysis looks reasonable.
> I think I'd suggest adding the NET_EPOCH_ENTER() calls in add_mfc(). We
> already do that in add_vif(), so we'd be following existing choices.
>
> I'd also suggest adding NET_EPOCH_ASSERT() to everything which directly
> or indirectly calls ip_output(). That should help us catch other
> potential issues like this one.

Thanks.  I had already added one assert; I added one in send_packet() as
well.

For anyone interested, this is now in review:
https://reviews.freebsd.org/D34624.
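
For readers unfamiliar with the pattern, here is a minimal userland
sketch of the idea discussed above: enter the net epoch in the setup
path, and assert it in everything that reaches ip_output().  This is
not the patch in the review.  The macros below are simplified stand-ins
that only mimic the names and call shape of the kernel's
NET_EPOCH_ENTER(), NET_EPOCH_EXIT() and NET_EPOCH_ASSERT(); the depth
counter and all sketch_* functions are hypothetical and exist only for
illustration.

```c
#include <assert.h>

/* Stand-in for the kernel's struct epoch_tracker (assumption: simplified). */
struct epoch_tracker {
	int entered;
};

/* Hypothetical bookkeeping; the kernel tracks epoch state per thread. */
static int net_epoch_depth;
static int packets_sent;

/* Userland mock-ups of the kernel macros, same names and call shape. */
#define NET_EPOCH_ENTER(et) do { (et).entered = 1; net_epoch_depth++; } while (0)
#define NET_EPOCH_EXIT(et)  do { (et).entered = 0; net_epoch_depth--; } while (0)
#define NET_EPOCH_ASSERT()  assert(net_epoch_depth > 0)

/* Plays the role of ip_output(): it must only run inside the epoch. */
static void
sketch_ip_output(void)
{
	NET_EPOCH_ASSERT();
	packets_sent++;
}

/* Plays the role of send_packet(): asserting here catches a bad caller early. */
static void
sketch_send_packet(void)
{
	NET_EPOCH_ASSERT();
	sketch_ip_output();
}

/*
 * Plays the role of add_mfc() as reached from setsockopt(): the caller is
 * not in the epoch, so enter it around the code that may send packets.
 */
int
sketch_add_mfc(void)
{
	struct epoch_tracker et;

	NET_EPOCH_ENTER(et);
	sketch_send_packet();
	NET_EPOCH_EXIT(et);
	return (packets_sent);
}
```

Calling sketch_add_mfc() sends one mock packet and leaves the depth
counter back at zero.  Calling sketch_send_packet() directly, without
the enter/exit pair, trips the assertion, which is the userland analogue
of the panic above.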

		Mike
> Br,
> Kristof