New "timeout" api, to replace callout

Andre Oppermann andre at
Sun Dec 2 04:06:32 PST 2007

Poul-Henning Kamp wrote:
> In message <20071202103833.N74097 at>, Robert Watson writes:
>> On Sun, 2 Dec 2007, Poul-Henning Kamp wrote:
>>> I have no idea what the answer to your question is, I'm focusing on 
>>> providing the ability, how we subsequently decide to use it is up to others.
>> Well, I think there is an important question to be discussed regarding 
>> combinatorics, context switching, and the ability to provide multiple callout 
>> threads.
> I still have no way to answer those questions.
> My aim here is to provide and implement a client API that will let
> us play with all those things.
> There are 444 .c or .h files in my src/sys which contain the word
> "callout".
> Obviously, getting the API right, so that we will not have to walk
> all these files once again is a very important point, and the only
> one I am trying to focus on right now.

For TCP the following features/properties would make the implementation
much easier:

  o TCP maintains a number of concurrent, but hierarchical timers for
    each session.  The predominant operation is rescheduling an
    existing timer: it was not close to firing and is pushed out
    again.  This happens for every incoming segment.

     -> The timer facility should make it simple and efficient to move
        the deadline into the future.

  o TCP puts the timer into an allocated structure, and upon close of
    the session that structure has to be deallocated, which includes
    stopping all currently running timers.  At the moment this is not
    really possible as callout_stop() is not atomic and the callout
    may already be waiting on a lock to be run.  At the moment we just
    live with this race condition, apply some bandages and pray.
    Since this only happens on close and deallocation, the operation
    may be more expensive than a normal timer stop call.  Race
    conditions on normal timeout stops, like stopping the delack
    timer, are acceptable and can easily be handled within TCP: if the
    callback shows up after the timer was stopped, we see it and just
    return.

     -> The timer facility should provide an atomic stop/remove call
        that prevents any further callbacks upon return.  It should not
        do a 'drain' where the callback may be run anyway.
        Note: We hold the lock the callback would have to obtain.

  o TCP has hot and cold CPU/cache affinity.  For certain timers we
    want to stay on the same CPU as it is very likely to still have
    the tcp control block in cache.  The delayed ACK timer is the
    prime example, running on a deadline of some 100ms.  On the other
    hand timeouts farther away, like the keepalive timer, do not matter
    as there is almost zero chance that any CPU still has the control
    block in cache.
    Note: When we get NIC->CPU affinity we may want to keep all
    timeouts of a particular session always on the same CPU.

     -> The timer facility should provide strong, weak and "don't care"
        CPU affinity.  The affinity should be selected for a timer as
        a whole, not upon each call.

  o TCP's data structure is exported to userspace and contains the
    timeout data structures.  This complicates timeout handling as
    the timeout structure itself is not known to userland and we have
    to do some hacks to prevent exposure.

     -> The timer facility should provide an opaque userland compat
        header definition.
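
To make the first two points more concrete, here is a minimal userland
sketch of the semantics I have in mind.  It is only an illustration, not
a proposal for the actual interface: every name in it (sk_timer,
sk_timer_arm, sk_timer_expire, sk_timer_stop, sk_enqueue, sk_count_fire)
is made up, sk_enqueue() merely stands in for the expensive insertion
into the timer wheel, and all functions assume the caller already holds
the lock the callback would need (the session lock in the TCP case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the callout(9) API. */
struct sk_timer {
    uint64_t deadline;    /* tick at which the owner wants the callback */
    uint64_t queued_for;  /* tick at which the wheel will look at us */
    bool     armed;       /* cleared by stop; read under the owner's lock */
    void   (*func)(void *);
    void    *arg;
    unsigned requeues;    /* counts trips through the expensive path */
};

/* Stand-in for the expensive part: (re)insertion into the wheel. */
static void sk_enqueue(struct sk_timer *t, uint64_t when)
{
    t->queued_for = when;
    t->requeues++;
}

/*
 * Arm or rearm.  Moving the deadline into the future only stores the
 * new value; the wheel is touched only if the timer was idle or the
 * deadline moves earlier than the slot it is queued in.
 */
static void sk_timer_arm(struct sk_timer *t, uint64_t when)
{
    t->deadline = when;
    if (!t->armed || when < t->queued_for) {
        t->armed = true;
        sk_enqueue(t, when);
    }
}

/*
 * Called by the wheel, with the owner's lock held, once 'now' has
 * reached queued_for.  Returns true if the callback was run.
 */
static bool sk_timer_expire(struct sk_timer *t, uint64_t now)
{
    assert(t->func != NULL);
    if (!t->armed || now < t->queued_for)
        return false;               /* stopped, or not due yet */
    if (t->deadline > now) {
        sk_enqueue(t, t->deadline); /* deadline was pushed out meanwhile */
        return false;
    }
    t->armed = false;
    t->func(t->arg);
    return true;
}

/*
 * Stop with the owner's lock held.  Because expire checks 'armed'
 * under the same lock, no callback runs after this returns.
 * Returns true if the timer was armed.
 */
static bool sk_timer_stop(struct sk_timer *t)
{
    bool was_armed = t->armed;

    t->armed = false;
    return was_armed;
}

/* Sample callback: counts how often it fired. */
static void sk_count_fire(void *arg)
{
    ++*(int *)arg;
}
```

With this shape, rearming from every incoming segment costs one store as
long as the deadline only moves forward, and the wheel is revisited once
per expiry of the stale slot rather than once per segment.  What the
sketch does not solve is the stale wheel entry that may still point at
the structure after a stop; a real implementation would have to unlink
that entry before the memory can be freed.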


More information about the freebsd-arch mailing list