incorrect usleep/select delays with HZ > 2500
Matthew Dillon
dillon at apollo.backplane.com
Mon Sep 21 19:12:14 UTC 2009
What we wound up doing was splitting tvtohz() into two functions.
tvtohz_high(tv)
Returned value meets or exceeds requested time. A minimum value
of 1 is returned (really only for {0,0}; otherwise the minimum value is 2).
tvtohz_low(tv)
Returned value might be shorter than requested time, and 0 can
be returned.
Most kernel functions use the tvtohz_high() function. Only a few
use tvtohz_low().
I have not found any 'good' solution to the problem. For example,
average-up errors can mount up when using the results to control a
callout timer, resulting in much longer delays than originally intended,
and similarly same-tick interrupts (e.g. a value of 1) can create
much shorter delays than expected. Sometimes one cares more about
the average interval being correct, other times the time must not
be allowed to be too short. You lose no matter what you choose.
http://fxr.watson.org/fxr/source/kern/kern_clock.c?v=DFBSD
If you look at tvtohz_high() you will note that the minimum value
of 1 is only returned if the passed tv is essentially {0,0}, i.e. 0uS.
A request of 1uS already yields 2 ticks, because the result is
((us + (tick - 1)) / tick) + 1. The 'tick' global
here is the number of uS per tick (not to be confused with 'ticks').
Because of all of that I decided to split the function to make the
requirements more apparent.
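
As a rough illustration of the split described above, here is a minimal
user-space sketch in C. It is not the kernel code: the function names
tvtohz_high_sketch() and tvtohz_low_sketch() are hypothetical, the request
is passed as plain microseconds instead of a struct timeval, overflow
handling is omitted, and the round-down behavior of the "low" variant is an
assumption based only on the description that it may return a shorter time
or 0. The "high" variant uses the formula quoted above.

```c
/*
 * Sketch only. 'us' is the requested time in microseconds and 'tick' is
 * the number of microseconds per tick (e.g. 1000 when hz = 1000).
 * Both names and the simplified argument types are assumptions.
 */

/* Round up, then add one tick: the result never undershoots the request. */
static long
tvtohz_high_sketch(long us, long tick)
{
	return ((us + (tick - 1)) / tick) + 1;
}

/* Assumed round-down variant: may undershoot, and may return 0. */
static long
tvtohz_low_sketch(long us, long tick)
{
	return us / tick;
}
```

With tick = 1000, a request of 0uS gives 1 from the high variant and 1uS
already gives 2. A request of 1500uS gives 3 from the high variant and 1
from the low variant, so a callout that re-arms itself with the high value
on every expiry would run on a 3000uS period, while one using the low value
would run on a 1000uS period. Neither matches the 1500uS that was asked for.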
--
The nanosleep() work is a different issue... that's for userland calls
(primarily the libc usleep() function). We found that some Linux
programs assumed that nanosleep() was far more fine-grained than (hz)
and, anyway, the calls are named 'nanosleep' and 'usleep', which
kind of implies a fine-grained sleep, so we turned it into one when
small time intervals were being requested.
http://fxr.watson.org/fxr/source/kern/kern_time.c?v=DFBSD
The way I figure it if a userland program wants to make system calls
with fine-grained sleeps that are too small, it's really no different
from treating that program as being cpu-bound anyway so why not try to
accommodate it?
--
The 8254 issue is more one of a lack of interest in fixing it.
Basically using the 8254 as a measure of realtime when the reload
value is set too small (i.e. high hz) will always lead to serious
timing problems. The reason there is such a lack of interest
in fixing it is that most machines have other timers available
(lapic, acpi, hpet, tsc, etc). A secondary issue might be tying
real-time functions to 'ticks', which could still be driven by the
8254 interrupt.... those have to be divorced from ticks. I'm not
sure if FreeBSD has any of those left (does date still skip quickly if
hz is set ultra-high? Even when other timers are available?).
I will note that tying real-time functions to the hz-based tick
function (which is also the 8254-driven problem when other timers
are not available) leads to serious problems, particularly with ntpd,
even if you only lose track of the full cycle of the timer
occasionally.
However, neither do you want to 'skip' the ticks value to catch up
to a lost interrupt. That will mess up tsleep() and other hz-based
timeouts that assume that values of '2' will not time out
instantly.
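
To make the '2' assumption concrete, here is a small toy model in C. It is
only a sketch under stated assumptions: expiry_delay_us() is a hypothetical
helper, not a kernel function, and it assumes a timeout of N ticks expires
once the ticks counter has advanced by N since it was armed. The 'skip'
parameter models extra ticks added to the counter at the next interrupt to
catch up for lost interrupts.

```c
/*
 * Toy model (hypothetical, not kernel code).
 *
 * nticks   - timeout length in ticks
 * phase_us - how long ago the last tick interrupt happened, 0 <= phase_us < tick
 * tick     - microseconds per tick
 * skip     - extra ticks added to the counter at the next interrupt
 *
 * Returns the real time in microseconds until the timeout expires.
 */
static long
expiry_delay_us(long nticks, long phase_us, long tick, long skip)
{
	long remaining = nticks;
	long elapsed = tick - phase_us;	/* time until the next tick interrupt */

	remaining -= 1 + skip;		/* that interrupt advances the counter */
	while (remaining > 0) {		/* each further interrupt costs one full tick */
		elapsed += tick;
		remaining--;
	}
	return elapsed;
}
```

In this model, with tick = 1000 and the timeout armed 999uS after the last
interrupt, a value of 1 expires after 1uS, while a value of 2 expires after
1001uS, i.e. at least one full tick. If the counter skips one tick to catch
up (skip = 1), the value of 2 also expires after only 1uS.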
So actual realtime operations really do have to be completely divorced
from the hz-based ticks counter and it must only be used for looser
timing needs such as protocol timeouts and sleeps.
-Matt