clock problem

Fri May 11 14:54:07 UTC 2007

In message: <200705111011.l4BABTfh061274 at lurza.secnetix.de>
            Oliver Fromme <olli at lurza.secnetix.de> writes:
: M. Warner Losh wrote:
:  > Peter Jeremy wrote:
:  > : There seems to be a bug in ntpd where the PLL can saturate at
:  > : +/-500ppm and will not recover.  This problem seems to occur mostly
:  > : where the reference servers have lots of jitter (i.e. a fairly congested
:  > : link to them).
:  > 
:  > Yes.  This is a rather interesting misfeature of ntpd.  Its rails are
:  > at +/- 500ppm, and when it hits the rail it assumes that things are
:  > too bad to continue and it stops.
: 
: I think it is related to the maximum slew rate of 1/2000,
: which is equivalent to 500 ppm.  The ntpd(8) manpage says:
: 
: "Since the slew rate of typical Unix kernels is limited to
: 0.5 ms/s, each second of adjustment requires an amortization
: interval of 2000 s."
: 
: And a bit further down:
: 
: "The maximum slew rate possible is limited to 500 parts-per-
: million (PPM) as a consequence of the correctness principles
: on which the NTP protocol and algorithm design are based.
: As a result, the local clock can take a long time to converge
: to an acceptable offset, about 2,000 s for each second the
: clock is outside the acceptable range."

I think you are confusing two things here.  One is the maximum
frequency error of the system clock that ntpd can tolerate.  The other
is the maximum slew rate of the system clock.

The actual error in nominal frequency of the system clock is what is
recorded.  When ntpd slams the system clock to do its 0.5 ms/s
adjustment, it still records the actual error.  Since the original
drift file was 500.000, this indicates a very bad clock.
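
To make the distinction concrete, here is a rough sketch of the
arithmetic in Python.  The function and constant names are made up for
illustration (they are not anything from ntpd itself); the 500 ppm
figures are the ones quoted above, and treating a drift value of
500.000 as "at the rail" is my reading of this thread, not a statement
about ntpd internals.

```python
# Illustrative arithmetic only; names here are hypothetical, not ntpd's.
MAX_SLEW_PPM = 500.0        # slew limit quoted from the ntpd(8) manpage
MAX_FREQ_ERROR_PPM = 500.0  # frequency-error "rail" discussed in this thread


def amortization_seconds(offset_s, slew_ppm=MAX_SLEW_PPM):
    """Time needed to slew away an offset of offset_s seconds."""
    return abs(offset_s) * 1e6 / slew_ppm


def drift_per_day_seconds(freq_error_ppm):
    """Offset a free-running clock accumulates in one day."""
    return freq_error_ppm * 1e-6 * 86400


def at_rail(drift_ppm, limit_ppm=MAX_FREQ_ERROR_PPM):
    """True when a drift-file value sits at or beyond the limit."""
    return abs(drift_ppm) >= limit_ppm


if __name__ == "__main__":
    # Slew rate: how fast an existing offset can be worked off.
    print(amortization_seconds(1.0))
    # Frequency error: how fast a new offset builds up on its own.
    print(drift_per_day_seconds(150.0))
    # A drift file reading 500.000 is pegged at the limit.
    print(at_rail(500.000), at_rail(150.0))
```

The first function is about the slew rate (one second of offset takes
about 2000 s to amortize at 500 ppm); the other two are about the
frequency error, which is what the drift file records.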

:  > Most PC clocks have a frequency error on the order of 10-150ppm, so it
:  > doesn't take a whole lot of jitter from a congested remote network
:  > to exceed the limits...
: 
: I think the "burst" and "iburst" options for the server lines
: in ntp.conf might help in such cases.

It might.
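
For reference, a minimal ntp.conf fragment using those options might
look like the following.  The host names are placeholders, not real
servers, and whether this actually keeps the PLL off the rail on a
congested link is untested conjecture here.

```
# Hypothetical servers; substitute your own.
# iburst: send a burst of packets while the server is unreachable,
#         which mainly speeds up initial synchronization.
server ntp1.example.org iburst

# burst:  send a burst of packets at each poll while the server is
#         reachable.  Use only with servers you are permitted to
#         load this way.
server ntp2.example.org iburst burst
```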

: Of course, the best solution is to buy a GPS or DCF radio
: receiver and set up a stratum-1 yourself.  But last time
: I tried to do that with a cheap DCF plug, it wasn't very
: well supported on FreeBSD.  Even an expensive Meinberg
: receiver ( http://www.meinberg.de/english/ ) with an RS232
: output worked much more accurately with a Solaris machine
: than with FreeBSD.  (Unfortunately, the Meinberg model
: available to us did not have NTP support via ethernet
: itself, only serial output.)  I have to admit that that
: was in FreeBSD 4.x days.  The situation might have
: improved in the meantime (I don't know).

My company has used FreeBSD's ntpd since 3.x with a small, custom
driver that I wrote.  It turns out to work very well in practice.  I'd
suggest that it is well supported, even in FreeBSD 4.x.  It isn't well
documented.

As for working better on Solaris, I've not done measurements there.  I
do know that our custom clock drivers typically stay within a
microsecond of the reference clock when the unit has good temperature
stability, and within three to five microseconds over a diurnal swing
when the units aren't well thermally regulated.

Of course, we are using what is effectively a GPS disciplined Rubidium
(Rb) oscillator's PPS as our time base.  We haven't gone the extra
step of using the Rb's 10MHz to synthesize frequencies for the
motherboard, since we don't need system time to be that stable (it
gives two or three more orders of magnitude of stability).  We do use
the kernel FLL, and our custom driver provides the absolute phase.

We get slightly better results in 6.x than we did in 4.x because of a
subtle bug in the kernel that phk fixed in the calculation of error
where two terms that were almost the same had the wrong signs and
almost cancelled out...

Warner