Timers and timing, was: MySQL Performance 6.0rc1

Poul-Henning Kamp phk at phk.freebsd.dk
Thu Oct 27 22:51:49 PDT 2005


In message <436167D5.2060104 at mac.com>, Chuck Swiger writes:

>> No, this is just the wrong way to attack the problem.
>
>I believe Darwin keeps the timecounters of the system exposed on a common page 
>mapped via the System framework (their libc+libm), which gets mapped in once by 
>init, and then shared with all of its children copy-on-write.  They are using 
>the PowerPC timebase registers according to a thread on the darwin-kernel list.

Right,

we unfortunately do not have the benefit of being able to beat up the
hardware designers until they do something non-lame in timekeeping.

Unless the hardware timer registers are readable from userland, that
trick does not work.

We have five (and counting) hardware registers to deal with on the
i386 platform:  i8254, tsc, acpi-{fast|slow}, elan, geode.

Of these only elan and tsc are guaranteed to work in userland.

TSC has other issues (well hashed out by now).

So the scope for doing the userland trick is very limited.

The HPET timer, which I'm investigating right now, can be used from
userland; unfortunately, it seems to take 1.5 usec to read on
some platforms.
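
A rough way to see what a clock read costs from userland is to time a
large number of gettimeofday() calls and divide.  The sketch below is
only an illustration: the function name is made up for this example, it
measures the whole call path rather than a raw register read, and which
timecounter sits behind the call depends on the system.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Return the average cost of one gettimeofday() call in nanoseconds,
 * measured over "iterations" back-to-back calls.  Returns a negative
 * value on bad input or if the clock cannot be read.
 */
static double
avg_gettimeofday_cost_ns(long iterations)
{
    struct timeval start, end, scratch;
    double elapsed_us;
    long i;

    if (iterations <= 0)
        return (-1.0);
    if (gettimeofday(&start, NULL) != 0)
        return (-1.0);
    for (i = 0; i < iterations; i++)
        (void)gettimeofday(&scratch, NULL);
    if (gettimeofday(&end, NULL) != 0)
        return (-1.0);

    /* Elapsed wall time in microseconds, then scaled to ns per call. */
    elapsed_us = (double)(end.tv_sec - start.tv_sec) * 1e6 +
        (double)(end.tv_usec - start.tv_usec);
    return (elapsed_us * 1000.0 / (double)iterations);
}
```

A result near 1500 would correspond to the 1.5 usec figure above; the
number is only meaningful on an otherwise idle machine.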

Linux does the userland trick with the TSC, but that is playing both
fast and loose with timekeeping, so unless we add headroom to our
precision requirement, that's out of the question.

>> What is needed here is for somebody to define how non-perfect we
>> are willing to allow our timekeeping to be, and _THEN_ we can start
>> to look at how fast we can make it work.
>
>OK.  How about this for one "test of timer quality":
>
>If you call gettimeofday() in a tight loop and count how many times it sees 
>tv_usec incremented in a second on an idle machine, how well does the system do?

That's not a very good test of quality, as it flatlines at 1 million
increments per second, the resolution of tv_usec.
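
For concreteness, the proposed test might look like the sketch below.
The function name is invented for this example.  Since struct timeval
only resolves microseconds, the count cannot go meaningfully above one
million no matter how good the underlying timecounter is.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Spin on gettimeofday() for about one second and count how many times
 * the returned timestamp differs from the previous reading.
 * Returns -1 if the clock cannot be read.
 */
static long
count_usec_changes(void)
{
    struct timeval start, prev, now;
    long changes = 0;

    if (gettimeofday(&start, NULL) != 0)
        return (-1);
    prev = start;
    for (;;) {
        if (gettimeofday(&now, NULL) != 0)
            return (-1);
        /* Stop once a full second has elapsed since the start. */
        if (now.tv_sec > start.tv_sec + 1 ||
            (now.tv_sec == start.tv_sec + 1 &&
            now.tv_usec >= start.tv_usec))
            break;
        if (now.tv_sec != prev.tv_sec || now.tv_usec != prev.tv_usec) {
            changes++;
            prev = now;
        }
    }
    return (changes);
}
```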

>> Here are some questions to start out:
>> 
>> For reference the current code's behaviour is noted in [...]
>> 
>>     *	Does time have to be monotonic between CPUs ?
>> 
>> 		Consider:
>> 
>> 		gettimeofday(&t1)	// on CPU1
>> 		work(x)			// a couple context switches
>> 		gettimeofday(&t2)	// on CPU2
>> 
>> 		Should it be guaranteed that t2 >= t1 ?
>> 
>> 		[Yes]
>
>Yes.

This rules out using the TSC on SMP platforms unless extensive testing
has shown it to work reliably and predictably.
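
A crude userland check of the t2 >= t1 property could look like the
sketch below.  The helper names are made up for this example, and it
makes no attempt to pin the thread to particular CPUs, so whether
consecutive readings come from different CPUs is up to the scheduler.
A clean run proves nothing; a nonzero count shows a violation, unless
the clock was stepped while the loop ran.

```c
#include <stddef.h>
#include <sys/time.h>

/* Return nonzero if timestamp a is strictly earlier than timestamp b. */
static int
tv_before(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return (a->tv_sec < b->tv_sec);
    return (a->tv_usec < b->tv_usec);
}

/*
 * Read the clock "samples" times and count how often a reading is
 * earlier than the one before it.  Returns -1 if gettimeofday() fails.
 */
static long
count_backward_steps(long samples)
{
    struct timeval t1, t2;
    long i, backward = 0;

    if (gettimeofday(&t1, NULL) != 0)
        return (-1);
    for (i = 0; i < samples; i++) {
        if (gettimeofday(&t2, NULL) != 0)
            return (-1);
        if (tv_before(&t2, &t1))
            backward++;
        t1 = t2;
    }
    return (backward);
}
```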


>For one case, I have some code which needs to update statistics like "packets 
>sent per second" (or "per minute" or "per hour") on a periodic basis.  I use a 
>reasonable timeout-- ~50ms-- for a call to select() (or pcap_dispatch(), etc) 
>so I check time() perhaps 20 times a second, and then update my per-second 
>stats when I notice that time(&now) returns a different value.
>
>Is there a better way of running code once a second, as close to the time the 
>clock ticks?

You could compute your select timeout to aim right at the next second
boundary instead of polling:

		gettimeofday (&t1, NULL);

		/* Time remaining until the next whole second. */
		t1.tv_sec = 0;
		t1.tv_usec = 1000000 - t1.tv_usec;

		select (..... &t1);
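
Spelled out as something compilable, the same idea might look like the
sketch below.  The function names are invented for this example, and
select() is called with no descriptors so that it acts purely as a
sleep; a real program would pass its own fd sets.  The one addition is
the case where the clock reads exactly on a second boundary, where the
subtraction would yield tv_usec = 1000000, which select() may reject.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Compute how long to wait from "now" until the next whole second.
 * If "now" is exactly on a boundary, wait one full second instead.
 */
static void
timeout_to_next_second(const struct timeval *now, struct timeval *tmo)
{
    assert(now->tv_usec >= 0 && now->tv_usec < 1000000);
    if (now->tv_usec == 0) {
        tmo->tv_sec = 1;
        tmo->tv_usec = 0;
    } else {
        tmo->tv_sec = 0;
        tmo->tv_usec = 1000000 - now->tv_usec;
    }
}

/* Sleep until roughly the next second boundary; returns select()'s result. */
static int
sleep_until_next_second(void)
{
    struct timeval now, tmo;

    if (gettimeofday(&now, NULL) != 0)
        return (-1);
    timeout_to_next_second(&now, &tmo);
    /* No descriptors: select() acts purely as a sub-second sleep. */
    return (select(0, NULL, NULL, NULL, &tmo));
}
```

How close to the boundary the wakeup actually lands still depends on
the kernel's timeout granularity.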

>> And when you have answered this, remember that your solution needs
>> to be SMP friendly and work on all architectures.
>
>I've at least got a few patches for sys/kern/kern_clock.c mentioned above which 
>help the accuracy of usleep/nanosleep, does that count for something?  :-)

usleep/nanosleep are an entirely different kettle of fish: they are in the
"ticks" domain, which is more of a heartbeat than timekeeping.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

