select/poll/usleep precision on FreeBSD vs Linux vs OSX
Luigi Rizzo
rizzo at iet.unipi.it
Wed Feb 29 19:41:20 UTC 2012
I have always been annoyed by the fact that FreeBSD rounds timeouts
in select/usleep/poll in very conservative ways, so I decided to
check how other systems behave in this respect. Attached is a simple
program that you should be able to compile and run on various OSes
to see what happens.
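The attachment itself is not preserved in this archive. What follows is a minimal sketch of a comparable measurement, not the original program: it assumes a POSIX system with CLOCK_MONOTONIC, and measure_select_usec() is a hypothetical helper name. It times a select() call that has no descriptors and only a timeout.

```c
/* Hypothetical measurement sketch (not the original attachment):
 * time a select(2) call that has no descriptors, only a timeout. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Return the time spent inside select(), in microseconds, measured
 * with CLOCK_MONOTONIC; -1 if select() fails (e.g. EINTR). */
long measure_select_usec(long timeout_usec)
{
    struct timeval tv;
    struct timespec t0, t1;
    long long ns;

    assert(timeout_usec >= 0);
    tv.tv_sec = timeout_usec / 1000000L;
    tv.tv_usec = timeout_usec % 1000000L;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (select(0, NULL, NULL, NULL, &tv) < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
       + (t1.tv_nsec - t0.tv_nsec);
    return (long)(ns / 1000);
}
```

Calling it for each requested timeout (1, 10, 50, ... 3001 usec) and printing the results gives one select column; similar wrappers around poll() and usleep() give the others.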
Here are the results (HZ=1000 on the system under test, and FreeBSD
has the same behaviour since at least 4.11):
                        Actual timeout (usec)
          |          select          |  poll  | usleep |
 timeout  |  FBSD  | Linux  |  OSX   |  FBSD  |  FBSD  |
  (usec)  |  9.0   |  Vbox  |  10.6  |  9.0   |  9.0   |
----------+--------+--------+--------+--------+--------+
       1  |  2000  |    99  |     6  |     0  |  2000  |
      10  |  2000  |   109  |    15  |     0  |  2000  |
      50  |  2000  |   149  |    66  |     0  |  2000  |
     100  |  2000  |   196  |   133  |     0  |  2000  |
     500  |  2000  |   597  |   617  |     0  |  2000  |
    1000  |  2000  |  1103  |  1136  |  2000  |  2000  |
    1001  |  3000  |  1103  |  1136  |  2000  |  3000  |  <---
    1500  |  3000  |  1608  |  1631  |  2000  |  3000  |  <---
    2000  |  3000  |  2096  |  2127  |  3000  |  3000  |
    2001  |  4000  |    -   |    -   |  3000  |  4000  |  <---
    3001  |  5000  |    -   |    -   |  4000  |  5000  |  <---
Note how the rounding affects the actual timeouts once the requested
value goes past a multiple of 1/HZ (poll takes its timeout in
milliseconds, so sub-millisecond requests become 0).
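The FreeBSD columns can be reproduced by a small model. This is a sketch under stated assumptions, not kernel code: HZ=1000, tvtohz() returning ceil(usec / tick) + 1 ticks, select() raising sub-tick timeouts to one tick (as itimerfix() does), poll() callers truncating usec to whole milliseconds, and every sleep of n ticks being reported as exactly n ticks. The model_* names are hypothetical.

```c
/* Hypothetical model of the FreeBSD columns, assuming HZ=1000 and that
 * tvtohz() returns ceil(usec / tick) + 1 ticks. Positive timeouts only. */
#include <assert.h>

#define HZ      1000L
#define TICK_US (1000000L / HZ)   /* one tick, in microseconds */

/* Ticks requested for a sleep of us microseconds: round up, add one. */
static long model_tvtohz(long us)
{
    assert(us > 0);
    return (us + TICK_US - 1) / TICK_US + 1;
}

/* select(): itimerfix() raises timeouts shorter than a tick to one tick. */
long model_select_us(long timeout_us)
{
    assert(timeout_us > 0);
    if (timeout_us < TICK_US)
        timeout_us = TICK_US;
    return model_tvtohz(timeout_us) * TICK_US;
}

/* usleep(): modeled with tvtohz() alone, which already gives >= 2 ticks. */
long model_usleep_us(long timeout_us)
{
    assert(timeout_us > 0);
    return model_tvtohz(timeout_us) * TICK_US;
}

/* poll(): the caller passes whole milliseconds (assumed truncated from
 * usec), and a zero timeout returns immediately without sleeping. */
long model_poll_us(long timeout_us)
{
    long ms = timeout_us / 1000;

    assert(timeout_us > 0);
    if (ms == 0)
        return 0;
    return model_tvtohz(ms * 1000) * TICK_US;
}
```

The difference model_select_us(t) - t is the overshoot: for a request of 1001 usec the model gives 3000, i.e. 1999 usec, almost two full ticks more than asked.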
I know that until we have a high-resolution interrupt source there is
no hope of getting better than 1/HZ granularity. However, we are doing
much worse than that by adding up to 2 extra ticks. This makes
applications less responsive than they could be, and gives us no way
to "yield until the next tick".
So what I would like to do is add a sysctl (disabled by default)
that enables a closer approximation of the requested delay.
Looking at the kernel, all three syscalls loop around a blocking
function (tsleep() or seltdwait()) and check the actual elapsed time
by calling getmicrouptime() or getnanouptime() around the sleep. So
the exact timeout passed to tsleep() does not really matter, as long
as it is greater than 0: waking up early just costs another pass
through the loop.
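The same deadline-and-recheck pattern can be illustrated in user space. A minimal sketch, assuming a POSIX system with CLOCK_MONOTONIC; sleep_at_least_us() and mono_ns() are hypothetical helpers, and nanosleep() merely stands in for tsleep(). Because the loop re-reads the clock and only returns once the deadline has passed, a sleep that ends early (signal, coarse timer) just costs another iteration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical helper: sleep at least timeout_us microseconds and
 * return the elapsed time in usec (-1 if nanosleep() fails). */
long sleep_at_least_us(long timeout_us)
{
    assert(timeout_us >= 0);
    long long start = mono_ns();
    long long end = start + (long long)timeout_us * 1000LL;

    for (;;) {
        long long now = mono_ns();
        long long delta = end - now;          /* time still left */
        if (delta <= 0)
            return (long)((now - start) / 1000);
        struct timespec req = {
            .tv_sec = (time_t)(delta / 1000000000LL),
            .tv_nsec = (long)(delta % 1000000000LL),
        };
        /* Waking early (EINTR, coarse timer) only means another lap. */
        if (nanosleep(&req, NULL) != 0 && errno != EINTR)
            return -1;
    }
}
```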
The only concern is that getmicrouptime()/getnanouptime() are documented
as "less precise, but faster to obtain". The question is how precise
"less precise" actually is: is there some way to get an upper bound on
the error of the timers used by get*time(), so we can use that value
in the computation instead of the extra 1/HZ that tvtohz() adds after
rounding timeout*HZ up?
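A toy simulation shows why that bound matters. This is a hypothetical sketch, not kernel code: it assumes HZ=1000, a coarse clock that is updated exactly once per tick (the best case for get*uptime()), and a sleep of n ticks that wakes at the n-th tick boundary. extra_ticks = 1 mimics the tvtohz() conversion; extra_ticks = 0 mimics a tighter one. The helper names are made up for illustration.

```c
/* Hypothetical simulation of the usleep loop; times are in usec. */
#include <assert.h>

#define TICK_US 1000L   /* assumes HZ=1000 */

/* Coarse clock, assumed updated once per tick. */
static long coarse_now(long t)
{
    return (t / TICK_US) * TICK_US;
}

/* Sleeping n >= 1 ticks from time t wakes at the n-th tick boundary. */
static long sleep_ticks(long t, long n)
{
    assert(n >= 1);
    return (t / TICK_US + n) * TICK_US;
}

/* Run the loop starting at real time t0; return the real elapsed time. */
long simulate_sleep(long t0, long timeout_us, long extra_ticks)
{
    long t = t0;
    long end = coarse_now(t0) + timeout_us;   /* deadline from coarse time */

    for (;;) {
        long delta = end - coarse_now(t);
        if (delta <= 0)
            break;
        long n = (delta + TICK_US - 1) / TICK_US + extra_ticks;
        t = sleep_ticks(t, n < 1 ? 1 : n);
    }
    return t - t0;
}
```

With extra_ticks = 0, a 1000 usec request that starts at t0 = 0 takes exactly 1000 usec instead of 2000, but the same request starting at t0 = 999 returns after only 1 usec: the deadline was computed from a timestamp almost a full tick in the past. Under the once-per-tick assumption the extra tick covers exactly that error; a tighter conversion is only safe if the real error of get*uptime() is known.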
For reference, below is the core of usleep and select/poll, simplified
(from kern_time.c and sys_generic.c):
usleep:
    getnanouptime(&now);
    end = now + timeout;
    for (;;) {
        getnanouptime(&now);
        delta = end - now;
        if (delta <= 0)
            break;
        tsleep(..., tvtohz(&delta));
    }
select/poll:
    itimerfix(&timeout);   // raise a nonzero timeout below 1/HZ to 1/HZ
    getmicrouptime(&now);
    end = now + timeout;
    for (;;) {
        delta = end - now;
        seltdwait(..., tvtohz(&delta));
        getmicrouptime(&now);
        if (some_fd_is_ready() || now >= end)
            break;
    }
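On the application side, the same loop is a common way to wait on a descriptor with poll() when the timeout is known in microseconds. A minimal sketch, assuming a POSIX system; wait_readable_us() is a hypothetical helper, not a kernel or libc API. It rounds the remaining time up to whole milliseconds, because truncating would turn any sub-millisecond remainder into a non-blocking poll() and a busy loop until the deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>   /* pipe(), write() for the example below */

/* Current monotonic time in microseconds. */
static long long mono_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

/* Wait until fd is readable or timeout_us has elapsed (assumed small
 * enough to fit in an int of milliseconds). Returns 1 if readable,
 * 0 on timeout, -1 on error. */
int wait_readable_us(int fd, long timeout_us)
{
    assert(timeout_us >= 0);
    long long end = mono_us() + timeout_us;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        long long delta = end - mono_us();
        /* Round up: a sub-ms remainder must not become a 0 timeout. */
        int ms = delta > 0 ? (int)((delta + 999) / 1000) : 0;
        int r = poll(&pfd, 1, ms);
        if (r > 0)
            return 1;
        if (r < 0 && errno != EINTR)
            return -1;
        if (mono_us() >= end)
            return 0;
    }
}
```

For example, on an empty pipe(2) the call returns 0 after at least the requested time; after writing a byte into the pipe it returns 1 right away.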
---
cheers
luigi
More information about the freebsd-arch mailing list