svn commit: r286880 - head/sys/kern

Julien Charbon jch at freebsd.org
Wed Aug 26 18:14:26 UTC 2015


 Hi Hans,

On 26/08/15 10:12, Hans Petter Selasky wrote:
> On 08/26/15 09:25, Hans Petter Selasky wrote:
>> On 08/18/15 12:15, Julien Charbon wrote:
>>> Author: jch
>>> Date: Tue Aug 18 10:15:09 2015
>>> New Revision: 286880
>>> URL: https://svnweb.freebsd.org/changeset/base/286880
>>>
>>> Log:
>>>    callout_stop() should return 0 (fail) when the callout is currently
>>>    being serviced and indeed unstoppable.
>>>
>>>    A scenario to reproduce this case is:
>>>
>>>    - the callout is being serviced and at same time,
>>>    - callout_reset() is called on this callout that sets
>>>      the CALLOUT_PENDING flag and at same time,
>>>    - callout_stop() is called on this callout and returns 1 (success)
>>>      even if the callout is indeed currently running and unstoppable.
>>>
>>>    This issue was caught up while making r284245 (D2763) workaround, and
>>>    was discussed at BSDCan 2015.  Once applied the r284245 workaround
>>>    is not needed anymore and will be reverted.
>>>
>>>    Differential Revision:    https://reviews.freebsd.org/D3078
>>>    Reviewed by:        jhb
>>>    Sponsored by:        Verisign, Inc.
>>>
>>> Modified:
>>>    head/sys/kern/kern_timeout.c
>>>
>>> Modified: head/sys/kern/kern_timeout.c
>>> ==============================================================================
>>>
>>>
>>> --- head/sys/kern/kern_timeout.c    Tue Aug 18 10:07:03 2015   
>>> (r286879)
>>> +++ head/sys/kern/kern_timeout.c    Tue Aug 18 10:15:09 2015   
>>> (r286880)
>>> @@ -1150,7 +1150,7 @@ _callout_stop_safe(struct callout *c, in
>>>       struct callout_cpu *cc, *old_cc;
>>>       struct lock_class *class;
>>>       int direct, sq_locked, use_lock;
>>> -    int not_on_a_list;
>>> +    int not_on_a_list, not_running;
>>>
>>>       if (safe)
>>>           WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, c->c_lock,
>>> @@ -1378,8 +1378,15 @@ again:
>>>           }
>>>       }
>>>       callout_cc_del(c, cc);
>>> +
>>> +    /*
>>> +     * If we are asked to stop a callout which is currently in progress
>>> +     * and indeed impossible to stop then return 0.
>>> +     */
>>> +    not_running = !(cc_exec_curr(cc, direct) == c);
>>> +
>>>       CC_UNLOCK(cc);
>>> -    return (1);
>>> +    return (not_running);
>>>   }
>>>
>>>   void
> 
> I'm sorry to say, but I think this change should be backed out as it
> changes the behaviour of the callout API. The old code was correct. The
> issues should be fixed in the TCP stack instead, like suggested by D1563
> and also r284245.
> 
> This patch introduces new behaviour to the callout_stop() function,
> which is contradicting "man 9 callout" and _not_ documented anywhere!
> 
> Reading "man 9 callout" in 11-current:
> 
>>      The callout_reset() and callout_schedule() function families
>> schedule a
>>      future function invocation for callout c.  If c already has a
>> pending
>>      callout, it is cancelled before the new invocation is scheduled.
> 
> callout_reset() causes a _new_ callout invocation and it is logical that
> the return value of callout_stop() reflect operations on the _new_
> callout invocation and not the previous one.
> 
>>      The function callout_stop() cancels a callout c if it is
>> currently pend-
>>      ing.  If the callout is pending, then callout_stop() returns a
>> non-zero
>>      value.  If the callout is not set, has already been serviced, or
>> is cur-
>>      rently being serviced, then zero will be returned.  If the
>> callout has an
>>      associated lock, then that lock must be held when this function is
>>      called.
> 
> If there are two callout invocations that callout_stop() needs to
> handle, it should be called twice.
> 
> The correct way is:
> 
> callout_reset();
> callout_stop();
> callout_drain();
> 
> For the TCP stack's matter, it should be:
> 
> callout_reset();
> callout_stop(); /* cancel _new_ callout invocation */
> callout_stop(); /* check for any pending callout invokation */

 First thank for your time.  I indeed agree with your analysis and I am
not opposed to back out this change.  The border between bug or feature
can indeed be thin;  below why I was more on "bug" side:

o Pro "It is a feature":

 - This behavior is here since the beginning and nobody ever complains
(Big one)

o Pro "It is a bug":

 - Having callout_stop() returning 1 (success) while having the callout
currently being serviced is counter intuitive.

 - You cannot call callout_stop() twice to address this case:

callout_reset();
callout_stop(); /* cancel _new_ callout invocation */
callout_stop(); /* check for any pending callout invokation */

 Because the second callout_stop() will always return 0 (Fail).  In
details:  If the callout is currently being serviced (without r286880
i.e. the classical behavior):

callout_reset() returns 0 (Fail)        (PENDING flag set)
callout_stop() returns 1  (Success)     (PENDING flag removed)
callout_stop() returns 0  (Always fail because not PENDING flag)

 In mpsafe mode, the only way a callout_stop() can return 1 (Success) is
with the PENDING flags set.  The way I found to address this case is with:

fail_reset = !callout_reset();
fail_stop = !callout_stop();

if (fail_reset || fail_stop) {
  /* Callout not stopped */
}

 This is what I did in r284245:

https://svnweb.freebsd.org/base/head/sys/netinet/tcp_timer.c?r1=284245&r2=284244&pathrev=284245

 And it felt wrong to me because callout_reset() and callout_stop()
calls can be far away.

o Pro "it is 'neither'" (i.e. API unclear by design):

 In this case the same callout is indeed _both_ pending and currently
being serviced.  According to man page callout(9):  (As you already noticed)

 If the callout is pending, then callout_stop() returns a non-zero
        ^^^^^^^^^^^^^^^^^^
value.

 Conclusion: It is a feature!

 And just after:

 If the callout is not set, has already been serviced, or is
currently being serviced, then zero will be returned.
^^^^^^^^^^^^^^^^^^^^^^^^

 Conclusion: It is a bug!

 Obviously, no indication about the case where the callout is indeed
_both_ pending and currently being serviced.

> Remember there are other OS'es out there using our TCP/IP stack which
> do not have an identical callout subsystem.

 I was not aware of that.

 As I said, I am not opposed to back out this change, callout(9) API in
mpsafe mode is a already complex/subtle API, it won't change too much
the current complexity.

 Let say that if nobody screams until Friday 8/28, I will put back
r284245 and revert this change _and_ I will make this case clear in the
man page.

--
Julien

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 496 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freebsd.org/pipermail/svn-src-head/attachments/20150826/2107252d/attachment.bin>


More information about the svn-src-head mailing list