draining high-frequency callouts

Mon Mar 13 13:39:22 UTC 2017

On Mon, Mar 13, 2017 at 09:27:59AM +0100, Hans Petter Selasky wrote:
> On 03/13/17 09:21, Peter Holm wrote:
> > On Tue, Jan 10, 2017 at 12:57:12PM -0800, Mark Johnston wrote:
> >> I'm occasionally seeing an assertion failure in softclock_call_cc() when
> >> running DTrace tests on a system with hz=10000. The assertion
> >> (c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
> >> concurrently draining the callout, which runs at a high frequency. At
> >> the time of the panic, that thread is spinning on the per-CPU callout
> >> lock after having been awoken from "codrain", and CALLOUT_PENDING is
> >> set on the callout. The callout is direct, i.e., it is executed in hard
> >> interrupt context.
> >>
> >> I think this is what's happening:
> >> - callout_drain() is called while the callout is executing but after the
> >>   callout has rescheduled itself, and goes to sleep after having cleared
> >>   CALLOUT_ACTIVE.
> >> - softclock_call_cc() wakes up the callout_drain() caller, but the
> >>   callout fires again before the caller is scheduled.
> >> - the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
> >>   cleared and panics.
> >>
> >> Is there anything that prevents this scenario? Is it really correct to
> >> leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
> >> dropped in order to acquire a sleepqueue lock?
> >>
> >
> > Is this the same problem?
> >
> > panic: softclock_call_cc: act 0xfffff8000de64800 0
> > cpuid = 10
> > time = 1489389893
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0f984c9660
> > vpanic() at vpanic+0x19c/frame 0xfffffe0f984c96e0
> > kassert_panic() at kassert_panic+0x126/frame 0xfffffe0f984c9750
> > softclock_call_cc() at softclock_call_cc+0xae/frame 0xfffffe0f984c98f0
> > softclock() at softclock+0x12c/frame 0xfffffe0f984c9930
> > intr_event_execute_handlers() at intr_event_execute_handlers+0x248/frame 0xfffffe0f984c9980
> > ithread_execute_handlers() at ithread_execute_handlers+0x47/frame 0xfffffe0f984c99b0
> > ithread_loop() at ithread_loop+0xfc/frame 0xfffffe0f984c9a30
> > fork_exit() at fork_exit+0x13b/frame 0xfffffe0f984c9ab0
> >
> > https://people.freebsd.org/~pho/stress/log/kevent7-2.txt
> >
> > I first spotted this @ 12.0-CURRENT #2 r305652M: Fri Sep  9 13:09:03 CEST 2016
> > It's quite easy to reproduce.
> >
> 
> Hi,
> 
> It depends on the trigger. The panic above just means the callout 
> structure was referred after being freed. Either a drain call is missing 
> or the callout escaped drain.
> 
> Can you run this test with a kernel built from projects/hps_head aswell?
> 

No problem there. I ran the test for 3 hours without any problems.

- Peter