New "timeout" api, to replace callout
andre at freebsd.org
Sun Dec 2 04:06:32 PST 2007
Poul-Henning Kamp wrote:
> In message <20071202103833.N74097 at fledge.watson.org>, Robert Watson writes:
>> On Sun, 2 Dec 2007, Poul-Henning Kamp wrote:
>>> I have no idea what the answer to your question is, I'm focusing on
>>> providing the ability, how we subsequently decide to use it is up to others.
>> Well, I think there is an important question to be discussed regarding
>> combinatorics, context switching, and the ability to provide multiple callout
> I still have no way to answer those questions.
> My aim here is to provide and implement a client API that will let
> us play with all those things.
> There are 444 .c or .h files in my src/sys which contains the word
> Obviously, getting the API right, so that we will not have to walk
> all these files once again is a very important point, and the only
> one I am trying to focus on right now.
For TCP the following features/properties would make the implementation
o TCP maintains a number of concurrent, but hierarchical timers for
each session. The predominant operation is a reschedule of an
existing timer: it wasn't close to firing and is simply moved
out again. This happens for every incoming segment.
-> The timer facility should make it simple and efficient to move
the deadline into the future.
o TCP puts the timer into an allocated structure. Upon close of the
session that structure has to be deallocated, which includes stopping
all currently running timers. At the moment this is not really
possible as callout_stop() is not atomic and the callout may already
be waiting on a lock to be run. At the moment we just live with this
race condition, apply some bandages and pray. Since this only happens
on close and deallocation, the operation may be more expensive than
a normal timer stop call. Race conditions on normal timeout stops,
like stopping the delack timer, are acceptable and can easily be
handled within TCP: if the timer shows up after it was stopped, we
see it and just return.
-> The timer facility should provide an atomic stop/remove call
that prevents any further callbacks upon return. It should not
do a 'drain' where the callback may be run anyway.
Note: We hold the lock the callback would have to obtain.
o TCP has hot and cold CPU/cache affinity. For certain timers we
want to stay on the same CPU as it is very likely to still have
the TCP control block in cache. The delayed ACK timer, running on
a deadline of some 100ms, is the prime example. On the other hand,
for timeouts farther away, like the keepalive timer, it does not
matter as there is almost zero chance that any CPU still has the
control block around.
Note: When we get NIC->CPU affinity we may want to keep all
timeouts of a particular session always on the same CPU.
-> The timer facility should provide strong, weak and "don't care"
CPU affinity. The affinity should be selected for a timer as a
whole, not upon each call.
o TCP's data structure is exported to userspace and contains the
timeout data structures. This complicates timeout handling as
the timeout data structure is not known to userland and we have
to do some hacks to prevent exposure.
-> The timer facility should provide an opaque userland compat
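
To make the first three requirements above concrete, here is a rough
userspace sketch of the semantics I have in mind. It is not the proposed
API: every name in it (struct tmo, tmo_reschedule(), tmo_stop_final(),
the TMO_CPU_* values) is made up for illustration, the timer wheel is
reduced to a tmo_fire() call, and locking is not modelled at all. The
sketch simply assumes that the caller already holds the lock the
callback would have to obtain.

```c
/* Userspace model only; all names are hypothetical, not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Affinity is selected once, for the timer as a whole. */
enum tmo_affinity { TMO_CPU_ANY, TMO_CPU_WEAK, TMO_CPU_STRONG };

struct tmo {
	uint64_t		deadline;	/* absolute time, in ticks */
	bool			armed;		/* scheduled and waiting to fire */
	bool			dead;		/* finally stopped, never fires again */
	enum tmo_affinity	affinity;
	void			(*func)(void *);
	void			*arg;
};

void
tmo_init(struct tmo *t, enum tmo_affinity aff, void (*func)(void *),
    void *arg)
{
	assert(t != NULL && func != NULL);
	t->deadline = 0;
	t->armed = false;
	t->dead = false;
	t->affinity = aff;
	t->func = func;
	t->arg = arg;
}

/* Common case: move the deadline of an existing timer further out. */
bool
tmo_reschedule(struct tmo *t, uint64_t new_deadline)
{
	assert(t != NULL);
	if (t->dead)
		return (false);		/* refused after the final stop */
	t->deadline = new_deadline;
	t->armed = true;
	return (true);
}

/*
 * Final stop before deallocation.  Assumption: the caller holds the
 * lock the callback would take, so no callback runs after we return.
 */
void
tmo_stop_final(struct tmo *t)
{
	assert(t != NULL);
	t->armed = false;
	t->dead = true;
}

/* Stand-in for the timer wheel; returns true if the callback ran. */
bool
tmo_fire(struct tmo *t, uint64_t now)
{
	assert(t != NULL);
	if (t->dead || !t->armed || now < t->deadline)
		return (false);
	t->armed = false;
	t->func(t->arg);
	return (true);
}

/* Example callback: counts how often it was invoked. */
void
count_cb(void *arg)
{
	(*(int *)arg)++;
}
```

In this model a delack-style timer would be set up once with
TMO_CPU_STRONG and then only ever see tmo_reschedule() calls, while
session teardown calls tmo_stop_final() once and can free the structure
right afterwards. A real implementation would of course have to make
the final stop safe against a callout already running on another CPU,
which this sketch does not attempt.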