DragonflyBSD kernel clock improvements

Sat Jan 24 10:01:15 PST 2004

In message <4012806B.7090102 at potentialtech.com>, Bill Moran writes:

>I saw this recently:
>http://www.dragonflybsd.org/Docs/nanosleep/
>
>I was wondering if anyone on the FreeBSD team has looked at this.  It doesn't
>appear as if any recent changes have been made in the FreeBSD tree regarding
>this.

I just looked at it.

>It's probably not a big deal, but I just thought I'd point it out.

Well, it is as big a deal as you make it :-)

We have never aspired to be an RTOS so far, so nothing has really
required us to support precise sleeps.  If we want to do that, it's
certainly possible to do it, and even possible to do it right as
opposed to the mistakes presented on the URL above, but as I'll
explain in a moment, probably not worth touching in the first place.

The measurements he presents on that page are irrelevant:  His
sleep periods obviously must start at the beginning of a tick,
otherwise his data would show +/- 1 tick jitter [1].

In other words, the performance he shows only applies to a process
which just got scheduled because of a timer and then promptly goes
to sleep again.  Not a big market IMO.

If he did some actual I/O or CPU-churn, his process would start the
sleep at a random point between two Hz interrupts and his sleep
duration would consequently suffer from half of 1/Hz phase noise
(= +/- 1 tick jitter).
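
To make the point concrete, here is a minimal sketch of a tick-driven
sleep.  The model is an assumption for illustration only (requests are
rounded up to whole ticks and wakeups happen only on tick boundaries);
it is not the exact rounding of FreeBSD or DragonFly, and the names
tick_sleep_ns and spread_ns are made up for this example.

```python
TICK_NS = 1_000_000  # one tick at HZ=1000, in nanoseconds (illustrative value)


def tick_sleep_ns(start_ns, request_ns, tick_ns=TICK_NS):
    """Actual sleep length when wakeups only happen on tick interrupts.

    Simplified model (an assumption, not any kernel's exact rounding):
    the request is rounded up to whole ticks, and the sleeper wakes that
    many tick boundaries after the boundary at or before start_ns.
    """
    ticks = -(-request_ns // tick_ns)  # ceiling division
    wake_ns = (start_ns // tick_ns + ticks) * tick_ns
    return wake_ns - start_ns


def spread_ns(durations):
    """Difference between the longest and the shortest observed sleep."""
    return max(durations) - min(durations)


request = 5 * TICK_NS

# Case 1: every sleep starts exactly on a tick boundary, as happens when
# the process was just woken by the timer and goes straight back to sleep.
aligned = [tick_sleep_ns(k * TICK_NS, request) for k in range(100)]

# Case 2: sleeps start at 100 evenly spaced phases inside a tick, standing
# in for a process that did some I/O or CPU work before sleeping.
step = TICK_NS // 100
unaligned = [tick_sleep_ns(7 * TICK_NS + p, request)
             for p in range(0, TICK_NS, step)]

print("aligned spread  :", spread_ns(aligned), "ns")
print("unaligned spread:", spread_ns(unaligned), "ns")
```

In this model the aligned case shows no spread at all, while the
unaligned case spreads over almost a full tick, which is the jitter
that is missing from the data on that page.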

This is also why PLL'ing Hz misleads him to think he gets better
results:  he removes the jittering factor (the frequency offset
between his Hz and his timecounter) which otherwise would have
averaged out and given him a tell-tale standard deviation alerting
him to his mistake, provided he had done the "Measurement 101"
calculation of statistics on a repeated measurement in the first
place.

In real life PLL'ing Hz will make no measurable difference because
the 1/Hz jitter from the start time of the sleep period will be a
much larger fuzz factor.  The PLL will however make a difference
in some extreme round-off cases which do not happen until 1/Hz <
frequency error (i.e. typically not until HZ >> 10000).

The fact that he notices neither the absence of a +/- 1 tick jitter
in his data nor the source of the sawtooth does not indicate a
deep insight into the mechanisms he is trying to measure.

If you want to do it right, you need a programmable interrupting
timer so that you can program it to interrupt (a calibrated amount
of time before) the next time you need to schedule something.  This
is called "deadline interrupting" or "deadline timers" and as a
method it becomes more efficient than a regular heartbeat (like Hz)
when you want to get very high resolution/precision on your timeouts.
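
A rough sketch of the difference, in user-space Python rather than
kernel code.  Everything here is an assumption for illustration: the
function names, the example deadlines, and the 2 microsecond lead time
are invented, and no real timer hardware is programmed.

```python
import heapq


def heartbeat(deadlines_ns, tick_ns, horizon_ns):
    """Periodic tick: an interrupt every tick_ns whether needed or not.

    Each timeout is serviced at the first tick at or after its deadline.
    Returns (number_of_interrupts, worst_lateness_ns).
    """
    interrupts = horizon_ns // tick_ns
    lateness = [-(-d // tick_ns) * tick_ns - d for d in deadlines_ns]
    return interrupts, max(lateness)


def deadline_timer(deadlines_ns, lead_ns):
    """One-shot timer, always programmed for the earliest pending deadline.

    The timer is set to fire lead_ns early, standing in for the
    "calibrated amount of time before" the deadline.  Returns the list
    of programmed fire times, one interrupt per timeout.
    """
    pending = list(deadlines_ns)
    heapq.heapify(pending)  # min-heap: earliest deadline on top
    programmed = []
    while pending:
        earliest = heapq.heappop(pending)
        programmed.append(earliest - lead_ns)
    return programmed


deadlines = [7_904_000, 253_000, 1_307_000]  # hypothetical timeouts, in ns
horizon = 10_000_000                         # look at a 10 ms window

print("HZ=1000  :", heartbeat(deadlines, 1_000_000, horizon))
print("HZ=100000:", heartbeat(deadlines, 10_000, horizon))
print("one-shot :", deadline_timer(deadlines, lead_ns=2_000))
```

With the heartbeat, tightening the worst-case lateness means taking
proportionally more interrupts across the whole window; the one-shot
timer takes one interrupt per timeout regardless of the resolution
asked for.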

Now, before you jump in and start coding, a lot of other factors
need to be looked at as well.  You will never get the precision/resolution
of sleeps better than the sum of:

	worst case interrupt latency +
	lock resolution +  (= "preemption delay")
	context switch +
	VM activation (if the target is paged out or on
	    pages which must be activated by page-faults first) +
	cache flushes +
	fudgefactor for busmaster hogs slowing the CPU down.

If your hardware suffers from "misfeatures" like CPU-throttling
you need to figure that in too.

The sum above is a pretty high number of microseconds for a normal
unix kernel and makes the deadline interrupting rather pointless
considering that our heartbeat method does 1 millisecond (HZ=2000,
because of Nyquist) without any real problems and 100 microsecond
(HZ=20000) with only a minor tradeoff in overall performance.
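
The arithmetic behind those two figures, as a small sketch.  The
function name is made up for this example; the factor of two is the
"because of Nyquist" reading above, applied to a tick period of 1/HZ.

```python
def sleep_resolution_us(hz):
    """Usable timeout resolution of a heartbeat kernel, in microseconds.

    The tick period is 1/HZ; a factor of two is applied on top of it,
    following the "because of Nyquist" remark in the text.
    """
    tick_us = 1_000_000 / hz
    return 2 * tick_us


for hz in (2000, 20000):
    print(f"HZ={hz:>5}: tick {1_000_000 / hz:6.1f} us, "
          f"resolution {sleep_resolution_us(hz):6.1f} us")
```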

The fact that PC hardware in general lacks usable hardware for the
purpose is of course also a strong factor against.

Compare the above sum for UNIX with that of a dedicated $5
microcontroller and you will understand why UNIX as an RTOS is not a
viable concept:  The microcontroller can guarantee that the first
instruction of your code executes within N +/- M clock cycles where
M is typically less than four and a clock cycle can be as little as
25nsec yielding 100nsec precision.  If you spend more than $5
you get even better numbers.

Far more productive for FreeBSD would be the implementation of a
calibrated "nanodelay()" for device drivers to use.  DELAY() has
more problems than anyone cares to list anymore.

Poul-Henning

[1] This is a basic fact of nature.

When you look at your bedside alarm clock and it shows 05:59AM, you
have no way of knowing how many seconds remain before the beep starts at
06:00AM.  It can be any number from 1 to 60.  When you time your
daily jog by subtracting a "before" reading from an "after" reading,
your uncertainty is twice that amount = +/- one minute.
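
The same thing as a small sketch, with a clock that only shows whole
minutes.  The helper names and the jog lengths are invented for the
example; times are in whole seconds to keep the arithmetic exact.

```python
RES = 60  # the alarm clock only shows whole minutes (60 second resolution)


def reading(t, resolution=RES):
    """What a clock with the given resolution displays at true time t."""
    return (t // resolution) * resolution


def measured_interval(start, duration, resolution=RES):
    """Interval computed by subtracting a 'before' from an 'after' reading."""
    return reading(start + duration, resolution) - reading(start, resolution)


# Try every start phase within a minute and every jog length from
# 30:00 to 30:59, and record how far the measured interval is from
# the true one.
errors = [
    measured_interval(start, 30 * RES + frac) - (30 * RES + frac)
    for start in range(RES)
    for frac in range(RES)
]

print("worst early:", min(errors), "s; worst late:", max(errors), "s")
```

At one second granularity the error runs from just under a minute short
to just under a minute long, i.e. +/- one unit of the clock's
resolution.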

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.