Issue with epoch_drain_callbacks and unloading iavf(4) [using iflib]

Eric Joyner erj at freebsd.org
Mon Apr 6 21:20:05 UTC 2020


On Tue, Mar 31, 2020 at 12:28 PM Mark Johnston <markj at freebsd.org> wrote:

> On Tue, Mar 31, 2020 at 12:14:20PM -0700, Eric Joyner wrote:
> > Mark,
> >
> > I tried out a kernel with the tip of CURRENT with both D24214 and D24215
> > applied, and I still see the problem. As well, after doing a "sysctl
> > debug.kdb.enter=1" and viewing the stack trace there for kldunload, it
> > appears to be similar to the one I posted in my last post.
>
> Can you show it?  I don't see how it could be the same, since with the
> patch we are no longer calling sched_bind() from the epoch scan call
> back.
>
> >
> > - Eric
> >
> > On Mon, Mar 30, 2020 at 1:19 PM Eric Joyner <erj at freebsd.org> wrote:
> >
> > > On Sat, Mar 28, 2020 at 3:52 PM Mark Johnston <markj at freebsd.org> wrote:
> > >
> > >> On Wed, Mar 11, 2020 at 04:32:40PM -0700, Eric Joyner wrote:
> > >> > Mark,
> > >> >
> > >> > I did get some time to get back and retry this; however your second
> > >> > patch still doesn't solve the problem. Looking into it a bit, it looks
> > >> > like the kldunload process isn't hitting the code you've changed; it's
> > >> > hanging in epoch_wait_preempt() in if_detach_internal(), which is
> > >> > immediately before epoch_drain_callbacks().
> > >> >
> > >> > I did a kernel dump while it was hanging, and this is the backtrace for
> > >> > the kldunload process:
> > >>
> > >> I see.  I think the callback can be made much simpler and avoid the
> > >> problematic sched_bind() calls.  I wrote a patch that allows waiting
> > >> threads to lend scheduling priority to a preempted thread blocked in an
> > >> epoch section, based on some code I wrote to implement preemptible SMR
> > >> sections.  If waiting for a running thread, the callback just spins.
> > >>
> > >> This might be enough to solve your problem, I posted the two lightly
> > >> tested patches here:
> > >> https://reviews.freebsd.org/D24214
> > >> https://reviews.freebsd.org/D24215
> > >>
> > >> If we hit a situation where a reader is preempted and then its CPU is
> > >> hogged by a high-priority kernel thread, this still won't be enough, but
> > >> I suspect it'll solve your case.  Would you be able to test?
> > >>
> > >
> > > Yeah, I'll try them out.
> > >
> > >  - Eric
> > >

Mark,

I think I was mistaken about the backtrace looking the same. I was looking
at it from within ddb, and I think I focused on the
epoch_block_handler_preempt line and didn't notice that it only stopped
there this time. Here's the new one I've got from kgdb:

#0  cpustop_handler () at /usr/src/sys/x86/x86/mp_x86.c:1448
#1  0xffffffff80ff2f79 in ipi_nmi_handler () at /usr/src/sys/x86/x86/mp_x86.c:1405
#2  0xffffffff810294a6 in trap (frame=0xfffffe003b9b6f30) at /usr/src/sys/amd64/amd64/trap.c:201
#3  <signal handler called>
#4  epoch_block_handler_preempt (global=0xfffff80003de0100, cr=0xfffffe00dee85900, arg=0x0) at /usr/src/sys/kern/subr_epoch.c:507
#5  0xffffffff803b576d in epoch_block (global=0xfffff80003de0100, cr=0xfffffe00dee85900, cb=0xffffffff80bcf190 <epoch_block_handler_preempt>, ct=0x0) at /usr/src/sys/contrib/ck/src/ck_epoch.c:416
#6  ck_epoch_synchronize_wait (global=0xfffff80003de0100, cb=<optimized out>, ct=<optimized out>) at /usr/src/sys/contrib/ck/src/ck_epoch.c:465
#7  0xffffffff80bcf03c in epoch_wait_preempt (epoch=0xfffff80003de0100) at /usr/src/sys/kern/subr_epoch.c:529
#8  0xffffffff80c9410a in if_detach_internal (ifp=0xfffff80067ed4000, vmove=0, ifcp=0x0) at /usr/src/sys/net/if.c:1123
#9  0xffffffff80c93ebd in if_detach (ifp=0xfffff80003de0100) at /usr/src/sys/net/if.c:1063
#10 0xffffffff80cafa56 in iflib_device_deregister (ctx=0xfffff80088c91800) at /usr/src/sys/net/iflib.c:5104
#11 0xffffffff80bc1e2e in DEVICE_DETACH (dev=0xfffff80004706a00) at ./device_if.h:234
#12 device_detach (dev=0xfffff80004706a00) at /usr/src/sys/kern/subr_bus.c:3049
#13 0xffffffff80bc13fd in devclass_driver_deleted (busclass=0xfffff80004352900, dc=0xfffff80004385a00, driver=0xffffffff823329e0 <i40e_read_nvm_buffer_aq+352>) at /usr/src/sys/kern/subr_bus.c:1235
#14 0xffffffff80bc12ef in devclass_delete_driver (busclass=0xfffff80004352900, driver=0xffffffff823329e0 <i40e_read_nvm_buffer_aq+352>) at /usr/src/sys/kern/subr_bus.c:1310
#15 0xffffffff80bc721c in driver_module_handler (mod=0xfffff80015cd8680, what=1, arg=0xffffffff823329b0 <i40e_read_nvm_buffer_aq+304>) at /usr/src/sys/kern/subr_bus.c:5229
#16 0xffffffff80b67b82 in module_unload (mod=0xfffff80015cd8680) at /usr/src/sys/kern/kern_module.c:261
#17 0xffffffff80b5895b in linker_file_unload (file=0xfffff8016da69a00, flags=0) at /usr/src/sys/kern/kern_linker.c:700
#18 0xffffffff80b59dad in kern_kldunload (td=<optimized out>, fileid=5, flags=0) at /usr/src/sys/kern/kern_linker.c:1153
#19 0xffffffff8102aa40 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:162
#20 amd64_syscall (td=0xfffffe00e839f100, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1161
#21 <signal handler called>
#22 0x00000008002ddcba in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffe188

- Eric

