callout(9) really this complicated?

Thu Jul 10 06:19:57 UTC 2014

John Baldwin wrote this message on Wed, Jul 09, 2014 at 12:11 -0400:
> On Friday, July 04, 2014 11:57:52 am John-Mark Gurney wrote:
> > Hans Petter Selasky wrote this message on Fri, Jul 04, 2014 at 08:15 +0200:
> > > On 07/04/14 06:15, John-Mark Gurney wrote:
> > > >So, I was going to look at using callout(9) for some time delayed
> > > >actions...  But upon reading the docs, a) the docs are inconsistent,
> > > >and b) the docs only talk about requirements in other section...
> > > >
> > > >Is there a better interface?  If so, can we mark callout(9) deprecated?
> > > >If not, I have some questions...
> > > >
> > > >If you want callout_drain to work properly, you have to add extra code
> > > >to both your callout, and around the usage of it...
> > > >
> > > >callout_drain does not drain the callout:
> > > >      However, the callout subsystem does guarantee that the callout will 
> > > >      be
> > > >      fully stopped before callout_drain() returns.
> > > >
> > > >Yet other parse of the docs say that you can depend upon the callout
> > > >being fully stopped..  I've sent email to Ian (iedowse) about why he
> > > >added this statement...
> > > >
> > > >Second, the amount of work you have to do to make sure you drain
> > > >seems pretty crazy as documented in Avoiding Race Conditions...
> > > >
> > > >It seems like if I have created a callout w/ callout_init_mtx,
> > > >that I shouldn't have to do extra work to make it work correctly...
> > > >
> > > >When calling _callout_stop_safe as callout_drain, there is no assert
> > > >that the lock is held, though it is documented as requiring it by:
> > > >      The function callout_drain() is identical to callout_stop() except 
> > > >      that
> > > >      it will wait for the callout to be completed if it is already in
> > > >      progress.
> > > >
> > > >Maybe we need to add an additional statement here?  and assert that it
> > > >isn't locked?
> > > >
> > > >Also, I have tried to follow the code, but it's complicated, so I'm
> > > >hoping that I can get some help here.
> > 
> > 
> > > Probably the documentation needs an update. The implementation is fine.
> > 
> > Probably...  But w/ bad docs, it makes you wonder about the
> > implementation...
> 
> If you use callout_init_mtx(), then you can use callout_reset() and 
> callout_stop() while holding the mutex and everything will work as expected.  
> You typically only need to use callout_drain() during a device detach
> routine to "destroy" the callout().

So, you're say that we should just remove this text:
     However, the callout subsystem does guarantee that the callout will be
     fully stopped before callout_drain() returns.

> (A typical "detach" routine looks something like this:
> 
>  - detach external connections (ifnet, cdev, etc.)
>  - stop the hardware itself (foo_stop in various NIC drivers)
>    - this step typically does a callout_stop() with the relevant lock held
>  - drain any pending async events
>    - bus_teardown_intr() will block until the interrupt handler is
>      idle
>    - callout_drain() will block if the callout is pending because softclock
>      had already dequeued it and it was waiting for the driver lock when
>      you called callout_stop() earlier
>    - taskqueue_drain() (note that tasks do not have implicit locking ala
>      callout_init_mtx(), so you need to set some sort of flag that your
>      task function checks and breaks out of any loops for in the
>      "stop the hardware" step)
>   - free resources and destroy locks, etc.
> 
> > > drain is always called unlocked.
> > 
> > Then why the whole long section about avoiding races?   Point #3 is
> > the main one that I'm talking about...  It seems like it'd be easier
> > for callout to properly maintain the active flag (maybe set a flag to
> > tell callout to do so), or not even maintain it since it really isn't
> > used for anything important if you can munge it all you want...  There
> > aren't any statements about bad things happening if you call _deactivate
> > before the callout is called...
> 
> The language is unclear, but you only need to use one of the 3 options to
> work around the races.  Note that if you use callout_init_mtx() you fall
> into case #1 and don't need to worry about the others.  If you want to

It isn't clear that point #1 only applies to callout_init_mtx, and that
the other points don't apply...

And I would say that the text:
     The callout subsystem provides a number of mechanisms to address these
     synchronization concerns:

Is wrong, it only provides a mechanism for ONE case, the _init_mtx case,
in all other cases you have to provide the mechanism to avoid the race..

> use callout_init(.., CALLOUT_MPSAFE) and manage all the locking in your
> callout handler directly, then you need to handle the assorted races.
> However, you can generally get by with something far simpler than it suggests.
> You can just do what I described above for taskqueue_drain(), i.e. have
> your timer function do:
> 
> foo(void *arg)
> {
>    struct foo_softc *sc;
> 
>    sc = arg;
>    FOO_LOCK(sc);
>    if (sc->is_dead) {
>        FOO_UNLOCK(sc);
>        return;
>    }
>    ....
>    callout_reset(...);
>    FOO_UNLOCK(sc);
> }
> 
> In this case, simply setting 'is_dead' in the "stop the hardware" step and
> using callout_drain() will work fine.

Ok, unless someone commits a patch soon, I'll create one..

Thanks for the clarification...

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."