cvs commit: src/sys/sys time.h src/sys/kern kern_time.c

Sun Nov 27 13:17:48 GMT 2005

On Sun, 27 Nov 2005, Bruce Evans wrote:

>>  Add experimental low-precision clockid_t names corresponding to these
>>  clocks, but implemented using cached timestamps in kernel rather than
>>  a full time counter query.
>
> The existence of these interfaces is a mistake even in the kernel. On 
> all machines that I've looked at, the calls to the high-precision 
> binuptime() outnumber calls to all the other high-level timecounter 
> routines combined by a large factor.  E.g., on pluto1.freebsd.org (which 
> seems typical) now, after an uptime of ~8 days, there have been ~1200 
> million calls to binuptime(), ~124 million calls to getmicrouptime(), 
> ~72 million calls to getmicrotime(), and relatively few other calls.
>
> Thus we get a small speedup at a cost of some complexity and large 
> interface bloat.
>
> This is partly because there are too many context switches and context 
> switches necessarily use a precise timestamp, and file timestamps are 
> under-represented since they normally use a direct access to 
> time_second.

Interestingly, I've now observed several application workloads where the 
rate of user space high-precision time queries far exceeds the kernel 
rate of timestamp queries, specifically event-driven applications that 
need to generate timeouts to pass to poll() and select().  Applications 
like BIND9 make two gettimeofday() system calls for every select() call 
in order to manage their own internal event engine.  As select() itself 
has a precision keyed to 1/HZ, using timestamps at a similarly low 
precision to drive an internal scheduler based on select() or poll() 
makes some amount of sense.  Using the 
libwrapper.so I attached to my previous e-mail and setting 'FAST' mode, I 
see a 4% performance improvement in throughput for BIND9.  David Xu has 
reported a similar improvement in MySQL performance using libwrapper.so. 
For BIND9 under high load, the rate of context switches is much lower than 
the rate of select() calls, as multiple queries are delivered to the UDP 
socket per interrupt due to interrupt coalescing (etc).

Given the way applications are being written to manage their own event 
loops using select() or similar interfaces, the ability to quickly request 
low precision timestamps for use with those interfaces makes a fairly 
significant difference in macro-level performance.  How we expose 
"cheaper, suckier time" is something I'm quite willing to discuss, but the 
evidence seems to suggest that if we want to improve the performance of 
this class of applications, we need to provide time keeping services that 
match their requirements (run frequently with fairly weak requirements on 
precision).  I'm entirely open to exposing this service in different ways, 
or offering a different notion of "cheaper, suckier".  For example, I 
could imagine exposing an interface intended to return timing information 
specifically for HZ-driven sleep mechanisms, such as poll() and select(). 
The advantage, for experimental purposes, of the approach I committed is 
that it allows us to easily test the impact of such changes on 
applications without modifying the application.  The disadvantage is that 
we'll want to change it, but given that I'm not yet sure we fully 
understand the requirements, that is probably inevitable.

FWIW, once we have an interface that says "here's how you get bad time", 
we can implement it in other ways than I've done -- for example, exporting 
a kernel memory page with the necessary information to somewhat reliably 
convert an rdtsc() reading into an estimated timestamp without ever doing 
a system call (this is what Darwin does, btw).
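
To make the shared-page idea concrete, here is a minimal sketch of the 
arithmetic involved.  The struct layout, field names and the 
estimate_ns() helper are all hypothetical; this is not Darwin's or any 
other kernel's actual format.  It assumes the kernel periodically 
records a cycle counter value together with the matching time, plus a 
fixed-point cycles-to-nanoseconds factor.

```c
#include <stdint.h>

/* Hypothetical contents of a page the kernel would export read-only. */
struct timepage {
    uint64_t base_tsc;  /* cycle counter value at the last kernel update */
    uint64_t base_ns;   /* time in nanoseconds at that same instant */
    uint32_t mult;      /* nanoseconds per cycle, scaled by 2^shift */
    uint32_t shift;     /* fixed-point shift applied after multiplying */
};

/*
 * Estimate the current time from a cycle counter reading without a
 * system call.  The caller supplies tsc_now (e.g. from rdtsc on x86).
 * The multiplication can overflow if the page goes stale for too long,
 * so this assumes the kernel refreshes it frequently.
 */
static uint64_t estimate_ns(const struct timepage *tp, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - tp->base_tsc;

    return tp->base_ns + ((delta * tp->mult) >> tp->shift);
}
```

A real implementation would also need some way for the reader to detect 
that the kernel updated the page mid-read (a generation counter, say), 
and a counter that is consistent across CPUs, which is where the 
"somewhat reliably" comes in.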

Your proposals on how this should be done are most welcome, but the trick 
will be balancing the needs of several parties -- people interested in 
highly precise time measurement due to a preoccupation with NTP and atomic 
clocks, people who just want their applications to run faster, and people 
who want the system to be clean.  I think we can meet most of the needs of 
most of these people if we do it right, but I'm not sure what right is 
since (to be honest) I don't have a detailed understanding of what each of 
these communities really needs (let alone wants).

Robert N M Watson