cvs commit: src/sys/sys time.h src/sys/kern kern_time.c

Bruce Evans bde at
Sun Nov 27 13:51:03 GMT 2005

On Sun, 27 Nov 2005, Robert Watson wrote:

> On Sun, 27 Nov 2005, Bruce Evans wrote:
>>>  Add experimental low-precision clockid_t names corresponding to these
>>>  clocks, but implemented using cached timestamps in kernel rather than
>>>  a full time counter query.
>> The existence of these interfaces is a mistake even in the kernel. On all 
>> machines that I've looked at, the calls to the high-precision binuptime() 
>> outnumber calls to all the other high-level timecounter routines combined 
>> by a large factor.  E.g., on (which seems typical) now, 
>> ...
>> Thus we get a small speedup at a cost of some complexity and large interface 
>> bloat.

> Interestingly, I've now observed several application workloads where the rate 
> of user space high precision time queries far outnumbers the kernel rate of 
> time stamp queries.  Specifically, for applications that are event-driven and 
> need to generate time outs to pass to poll() and select().  Applications like

Apparently pluto1 doesn't run many of these :-).

> BIND9 generate two gettimeofday() system calls for every select() call, in 
> order to manage their own internal event engine.  As select() itself has a 
> precision keyed to 1/HZ, using time stamps at a similarly low precision for 
> driving an internal scheduler based on select() or poll() makes some amount 
> of sense.  Using the I attached to my previous e-mail and 
> setting 'FAST' mode, I see a 4% performance improvement in throughput for 
> BIND9.  David Xu has reported a similar improvement in MySQL performance 
> using For BIND9 under high load, the rate of context switches 
> is much lower than the rate of select() calls, as multiple queries are 
> delivered to the UDP socket per interrupt due to interrupt coalescing (etc).
> Given the way applications are being written to manage their own event loops 
> using select() or similar interfaces, the ability to quickly request low 
> precision timestamps for use with those interfaces makes a fairly significant 
> difference in macro-level performance.  How we expose "cheaper, suckier time"

I can see a use for making a timestamp after select() returns, not for
timeout purposes since the timeout should normally be for emergencies and
it's relative so it doesn't need the current time, but just to record when
things happen.  Then heavy load will cause a lot of returns and there was
no faster way than calling gettimeofday() to see how long the select() took.
Too bad select()'s return timeout is unusable for historical reasons.

Some other classes of applications that make lots of timestamp calls are
ones doing polling (another mistake for a primary interface IMO) and ones
that use a too-short timeout because they want a short timeout and don't
know that the select() granularity is now very short.

> is something I'm quite willing to discuss, but the evidence seems to suggest 
> that if we want to improve the performance of this class of applications, we 
> need to provide time keeping services that match their requirements (run 
> frequently with fairly weak requirements on precision).  I'm entirely open to 
> exposing this service in different ways, or offering a different notion of 
> "cheaper, suckier".  For example, I could imagine exposing an interface 
> intended to return timing information specifically for HZ-driven sleep 
> mechanisms, such as poll() and select(). The advantage, for experimental 
> purposes, in the approach I committed is that it allows us to easily test the 
> impact of such changes on applications without modifying the application.  The 
> disadvantage is that we'll want to change it, but given that I am not yet 
> clear we fully understand the requirements, that is probably inevitable.

The environment variable (or a sysctl/sysconf variable like
vfs.timestamp_precision but per-process or per-user) is probably needed,
since you don't want to teach all applications about unportable CLOCK_*.

