New "timeout" api, to replace callout

Andre Oppermann andre at freebsd.org
Sun Dec 2 05:55:40 PST 2007


Poul-Henning Kamp wrote:
> In message <4752AABE.6090006 at freebsd.org>, Andre Oppermann writes:
> 
>>> It is my intent, that the implementation behind the new API will
>>> only ever grab the specified lock when it calls the timeout function.
>> This is the same for the current one and pretty much a given.
>>
>>> When you do a timeout_disable() or timeout_cleanup() you will be
>>> sleeping on a mutex internal to the implementation, if the timeout
>>> is currently executing.
>> This is the problematic part.  We can't sleep in TCP when cleaning up
>> the timer.
> 
> The trouble arises because the current callout implementation will
> try to sleep on the timeout's lock, and once it does that, you cannot
> cancel it any more.

It hurts us big time in the TCP code.

> I'm going to exchange that problem for one that is less severe.
> 
> My plan is to use non-blocking grabs of the timeout's lock to get
> around that race.
> 
> When a timeout's timer expires, the thread that services the timeouts
> will try to get the lock in a non-blocking fashion, and if it fails,
> be put on a queue, to be retried after any other expired timeouts
> have had their chance.
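
A rough userland sketch of the try-then-requeue scheme described above,
using pthreads in place of kernel mutexes.  The struct layout and the
names timeout_try_run(), service_expired(), retry_deferred() and
count_cb() are made up for illustration; they are not the proposed API
or its implementation.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timeout record; not the proposed FreeBSD structure. */
struct timeout {
    pthread_mutex_t *lock;      /* lock the callback must run under */
    void           (*func)(void *);
    void            *arg;
    struct timeout  *next;      /* link on the retry queue */
};

/* Example callback: counts how often it has been invoked. */
static void
count_cb(void *arg)
{
    (*(int *)arg)++;
}

/* Try the lock once without blocking; true if the callback ran. */
static bool
timeout_try_run(struct timeout *t)
{
    if (pthread_mutex_trylock(t->lock) != 0)
        return false;           /* lock busy: do not sleep on it */
    t->func(t->arg);
    pthread_mutex_unlock(t->lock);
    return true;
}

/*
 * Run every expired timeout whose lock is free.  Those whose lock is
 * busy are appended, in order, to a retry queue that is returned.
 */
static struct timeout *
service_expired(struct timeout **expired, size_t n)
{
    struct timeout *head = NULL, **tail = &head;

    for (size_t i = 0; i < n; i++) {
        if (!timeout_try_run(expired[i])) {
            expired[i]->next = NULL;
            *tail = expired[i];
            tail = &expired[i]->next;
        }
    }
    return head;
}

/* One retry pass over the queue; returns the entries still busy. */
static struct timeout *
retry_deferred(struct timeout *head)
{
    struct timeout *still = NULL, **tail = &still;

    while (head != NULL) {
        struct timeout *t = head;

        head = t->next;
        t->next = NULL;
        if (!timeout_try_run(t)) {
            *tail = t;
            tail = &t->next;
        }
    }
    return still;
}
```

How many retry passes to make before giving up or falling back to a
blocking method is left open here, as it is in the discussion.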

In TCP we've got two types of races:

  o Timer expires on an active session, but the source of the timer
    was just handled (because a segment just arrived).  To simplify
    detection of timer races, a generation count passed together with
    the timer may be of value.  That way I (or the timer code) can
    easily detect if this invocation of the callback has become obsolete.

  o On shutdown we have to get rid of all timers for sure, because once
    we release the lock it is immediately destroyed and the memory is
    freed and cleared.  The timer must never even try to look at the
    lock again.  This is our major problem child in the TCP and
    socket lifecycle code.
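
To make the generation count idea from the first race concrete, here is
a minimal sketch.  All names (struct session, struct timer_arg,
timer_arm(), segment_arrived(), timer_fired()) are hypothetical and not
taken from the TCP code, and every function assumes the caller already
holds the session lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical session state; names are illustrative only. */
struct session {
    uint32_t timer_gen;     /* bumped whenever the armed timer goes stale */
    int      retransmits;   /* work done by valid timer callbacks */
};

/* Hypothetical argument handed to the callback when the timer is armed. */
struct timer_arg {
    struct session *sess;
    uint32_t        gen;    /* generation at the time of arming */
};

/* Arm the timer: remember the generation that is current right now. */
static struct timer_arg
timer_arm(struct session *s)
{
    struct timer_arg a = { .sess = s, .gen = s->timer_gen };

    return a;
}

/* A segment arrived and dealt with whatever the timer was waiting for. */
static void
segment_arrived(struct session *s)
{
    s->timer_gen++;         /* any callback armed earlier is now obsolete */
}

/* Timer callback: returns true if it acted, false if it was obsolete. */
static bool
timer_fired(const struct timer_arg *a)
{
    if (a->gen != a->sess->timer_gen)
        return false;       /* raced with segment_arrived(); do nothing */
    a->sess->retransmits++;
    return true;
}
```

This only covers the first race.  It does nothing for the shutdown case,
since the callback still has to look at the session to compare counts.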

There is another fine line.  When doing a timer cleanup, do I get to
know if there is a timeout pending and waiting in the CPU queue?  In
other words, can timeout_cleanup() tell us with certainty that a timeout
is no longer active and/or pending?  This would help us halfway.
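
What I have in mind is roughly the following.  This is purely a sketch
of the question, not a description of the planned timeout_cleanup(): the
state enum, the result codes and timeout_cleanup_sketch() are all
invented here, and the implementation's internal lock is assumed to be
held around the call.

```c
/* Hypothetical lifecycle states of a timeout. */
enum timeout_state { TO_IDLE, TO_PENDING, TO_RUNNING };

/* Hypothetical timeout record holding only what the sketch needs. */
struct timeout_sketch {
    enum timeout_state state;
};

/* Hypothetical result: may the caller free the timeout and its lock? */
enum cleanup_result { CLEANUP_DONE, CLEANUP_BUSY };

static enum cleanup_result
timeout_cleanup_sketch(struct timeout_sketch *t)
{
    switch (t->state) {
    case TO_RUNNING:
        /* Callback is in flight; the lock may still be touched. */
        return CLEANUP_BUSY;
    case TO_PENDING:
    case TO_IDLE:
        /* Dequeue if needed; the callback will never run now. */
        t->state = TO_IDLE;
        return CLEANUP_DONE;
    }
    return CLEANUP_BUSY;    /* unknown state: be conservative */
}
```

With an answer like CLEANUP_DONE the caller knows it is safe to destroy
the lock and free the memory; with CLEANUP_BUSY it knows it is not.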

Other than that, is a flag planned saying "try only once" to obtain
the lock?  This may help the first race.  Though the current TCP
code is not structured to work that way, it could move in that direction.

> That leaves only the question of "how hard do we try to get the lock
> with non-blocking means".
> 
> The answer to that will depend on how big a problem it is in practice.
> 
> Adding timeout_cleanup() as an explicit end-of-life indicator for
> the timeout structure and its lock makes it possible to use blocking
> methods, at high expense, in those rare cases where non-blocking
> means keep failing.
> 
> But let's hope we will not need that.

-- 
Andre


More information about the freebsd-arch mailing list