Precision Hardware Clocks

From: Josef 'Jeff' Sipek <jeffpc_at_josefsipek.net>
Date: Thu, 09 May 2024 23:31:34 UTC
Hello all,

I've been playing with the idea of extending the kernel to expose various
clock sources to userspace via character devices.  (Yesterday's thread
about the OCP TAP Time Card nudged me to send this out sooner than I
planned. :) )


The code is *very* hacky and full of TODOs & FIXMEs, but I thought I'd share
it now.

What I'm calling a 'precision hardware clock' (PHC for short) is
conceptually some piece of hardware which can provide the consumer a sense
of time passing.  Roughly speaking, there are two types of precision
hardware clocks: those that return the current time using some defined
timescale (e.g., kvmclock) and those that are simple oscillators with
counters (e.g., many e1000e devices).  My aim is to support both.

My initial goal is to provide *read-only* access to PHCs as this is
sufficient to make use of them for stabilizing the system clock.  That is,
an application can only query them for the current time.  Eventually, I
think it'd make sense to allow *setting* PHCs as well.

The devices that return the current time are fairly straightforward to work
with.  The ioctl simply calls a device-specific method and forwards the
result to the caller.
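A minimal userspace sketch of that dispatch, assuming a per-device method table.  Every name here (struct phc_ops, phc_gettime, the mock driver) is hypothetical and not taken from fbsd-phc.patch; a real ioctl handler would also copy the result out to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-device method table (illustrative names only). */
struct phc_ops {
	/* Absolute-time PHCs (e.g., kvmclock) report a timestamp directly. */
	int (*gettime)(void *softc, struct timespec *ts);
	/* Counter-type PHCs (e.g., 82574) expose only a raw counter. */
	uint64_t (*read_counter)(void *softc);
};

struct phc_dev {
	const struct phc_ops *ops;
	void *softc;
};

/* For an absolute-time PHC, the "ioctl" just forwards to the driver. */
static int
phc_gettime(const struct phc_dev *dev, struct timespec *ts)
{
	assert(dev != NULL && ts != NULL);
	if (dev->ops->gettime == NULL)
		return (EOPNOTSUPP); /* counter-type: needs software extension */
	return (dev->ops->gettime(dev->softc, ts));
}

/* Mock driver standing in for real hardware; reports a fixed time. */
static int
mock_gettime(void *softc, struct timespec *ts)
{
	(void)softc;
	ts->tv_sec = 1000;
	ts->tv_nsec = 500;
	return (0);
}

static const struct phc_ops mock_abs_ops = { .gettime = mock_gettime };
static const struct phc_ops mock_ctr_ops = { .gettime = NULL };
```

Keeping both methods in one table lets the generic code decide per device whether it can forward directly or has to go through the counter-extension path.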

The counter-type devices are more complicated to support.  In my code I took
an approach very similar to the timecounter code in the kernel.  My
first attempt actually tried to extend timecounters but that resulted in a
lot of additional computation being done in hardclock regardless of whether
or not the additional clocks were in use.  That didn't feel right. [1]

My current code borrows the timecounter idea (and some code) of extending
the hardware counter in software.  The overflow check is done via a
per-device callout that's scheduled at an interval based on the
oscillator's frequency and the counter's width.  (For debugging, I cap it at
10s max interval.)
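A minimal sketch of extending a narrow hardware counter in software and picking the callout interval.  The struct and function names are hypothetical, and polling at half the wrap period is my assumption for a safety margin (the text only says the interval depends on frequency and width); the actual patch borrows from kern_tc.c and may differ.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
/* 10 s cap on the callout interval, mirroring the debugging cap above. */
#define PHC_MAX_POLL_NS	(10 * NSEC_PER_SEC)

/* Hypothetical state for extending a narrow hardware counter. */
struct phc_counter {
	uint64_t mask;		/* (1 << width) - 1 */
	uint64_t freq_hz;	/* oscillator frequency in Hz */
	uint64_t last_raw;	/* raw counter value at the previous update */
	uint64_t ticks;		/* extended 64-bit tick count since init */
};

static void
phc_counter_init(struct phc_counter *pc, unsigned int width, uint64_t freq_hz,
    uint64_t raw)
{
	assert(width > 0 && width <= 64 && freq_hz > 0);
	pc->mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	pc->freq_hz = freq_hz;
	pc->last_raw = raw & pc->mask;
	pc->ticks = 0;
}

/*
 * Fold a new raw reading into the extended count.  The masked subtraction
 * is correct across at most one wrap, so this must run at least once per
 * wrap period; the periodic callout is what guarantees that.
 */
static uint64_t
phc_counter_update(struct phc_counter *pc, uint64_t raw)
{
	raw &= pc->mask;
	pc->ticks += (raw - pc->last_raw) & pc->mask;
	pc->last_raw = raw;
	return (pc->ticks);
}

/*
 * Callout interval in ns: half the wrap period, capped at PHC_MAX_POLL_NS.
 * Assumes freq_hz is below ~18 GHz so rem * NSEC_PER_SEC cannot overflow.
 */
static uint64_t
phc_poll_interval_ns(const struct phc_counter *pc)
{
	uint64_t whole_s = pc->mask / pc->freq_hz;
	uint64_t rem = pc->mask % pc->freq_hz;
	uint64_t wrap_ns;

	/* Wide counters wrap far less often than the cap allows for. */
	if (whole_s >= 2 * (PHC_MAX_POLL_NS / NSEC_PER_SEC))
		return (PHC_MAX_POLL_NS);
	/* Wrap period, rounded down by at most one tick (conservative). */
	wrap_ns = whole_s * NSEC_PER_SEC + rem * NSEC_PER_SEC / pc->freq_hz;
	return (wrap_ns / 2);
}
```

For example, a 24-bit counter at 25 MHz wraps roughly every 671 ms, so it would be polled about every 336 ms; a 32-bit counter at the same frequency wraps after about 171 s and simply gets the 10 s cap.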

Regardless of which type of PHC it is, the ioctl caller gets what amounts to
a <system clock, PHC clock> reading.  Ideally, the two correspond to the
same instant, but there may be some error due to hardware limitations. [2]
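To illustrate why such pairs are useful for stabilizing the system clock: two pairs taken some time apart expose the PHC's frequency error relative to the system clock.  This is only the basic arithmetic, assuming a hypothetical struct phc_pair holding nanosecond values; it is not chrony's actual algorithm, which works over many samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one <system clock, PHC clock> reading, in ns. */
struct phc_pair {
	int64_t sys_ns;
	int64_t phc_ns;
};

/*
 * Relative frequency error of the PHC against the system clock, in parts
 * per million: (dphc - dsys) / dsys * 1e6.  Positive means the PHC runs fast.
 */
static double
phc_freq_error_ppm(const struct phc_pair *a, const struct phc_pair *b)
{
	int64_t dsys = b->sys_ns - a->sys_ns;
	int64_t dphc = b->phc_ns - a->phc_ns;

	assert(dsys != 0);
	return ((double)(dphc - dsys) * 1e6 / (double)dsys);
}
```

A PHC that advances 1.00005 s while the system clock advances 1 s is running 50 ppm fast.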

Because there is a lot of hardware that doesn't provide a way to capture
these correlated timestamps, a "capture many readings" ioctl is a useful
addition.  This ioctl returns a set of interleaved PHC and system clock
readings, which lets the application (e.g., chrony) do the appropriate
filtering to remove noise.
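A sketch of one simple way an application might reduce such a batch, assuming 2n+1 timestamps where system clock readings bracket each PHC reading (Linux's PTP_SYS_OFFSET ioctl uses a similar interleaving; the layout in fbsd-phc.patch may differ).  Keeping the reading with the narrowest bracket is just one filter, not necessarily what chrony does, and phc_best_offset is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout: ts[0], ts[2], ..., ts[2n] are system clock readings and
 * ts[1], ts[3], ..., ts[2n-1] are PHC readings, all in ns, so every PHC
 * reading is bracketed by two system clock readings.  Returns PHC minus
 * system time for the tightest bracket and stores its width in *delay_out.
 */
static int64_t
phc_best_offset(const int64_t *ts, size_t n, int64_t *delay_out)
{
	int64_t best_off = 0, best_delay = INT64_MAX;

	assert(ts != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		int64_t sys_before = ts[2 * i];
		int64_t phc = ts[2 * i + 1];
		int64_t sys_after = ts[2 * i + 2];
		int64_t delay = sys_after - sys_before;

		/* A narrower window bounds the PHC read instant more tightly. */
		if (delay < best_delay) {
			best_delay = delay;
			/* Assume the PHC was read midway between the system reads. */
			best_off = phc - (sys_before + delay / 2);
		}
	}
	if (delay_out != NULL)
		*delay_out = best_delay;
	return (best_off);
}
```

The width of the winning bracket doubles as an error bound: the true offset lies within half of it of the returned value, assuming the PHC read happened somewhere inside the window.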


In addition to adding the PHC code to the core kernel, I hacked up the if_em
driver to start the 25 MHz timekeeping counters on 82574 devices and register
with the PHC code.  Finally, I hacked up chrony's PHC refclock driver to
make use of the "get timestamp pair" ioctl.

I ran this code on my test box with two 82574 NICs with both registered as
chrony refclocks [3] for a while.  Unsurprisingly, the 82574 oscillators are
not that accurate, but they are reasonably stable.  (I posted histograms and
Allan deviation plots on Mastodon [4].  Since the system's oscillator is in
no way special, it is a bit silly to read too much into the graphs.
However, I'd argue that it still shows that the 82574 refclocks were
reasonably good and would likely help in real world scenarios [5].)
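For readers unfamiliar with the Allan deviation plots mentioned above, here is the textbook overlapping estimator computed from phase (time-error) samples.  I don't know which tool produced the posted plots; this returns the Allan variance, and the plotted deviation is its square root.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Overlapping Allan variance from phase samples x[0..n-1] (seconds), taken
 * every tau0 seconds, at averaging time tau = m * tau0:
 *
 *   sigma^2(tau) = sum_{i=0}^{n-2m-1} (x[i+2m] - 2 x[i+m] + x[i])^2
 *                  / (2 tau^2 (n - 2m))
 *
 * The Allan deviation is the square root of this value.
 */
static double
allan_variance(const double *x, size_t n, size_t m, double tau0)
{
	double sum = 0.0, tau = (double)m * tau0;

	assert(x != NULL && m > 0 && n > 2 * m && tau0 > 0.0);
	for (size_t i = 0; i + 2 * m < n; i++) {
		/* Second difference removes constant phase and frequency offsets. */
		double d = x[i + 2 * m] - 2.0 * x[i + m] + x[i];
		sum += d * d;
	}
	return (sum / (2.0 * tau * tau * (double)(n - 2 * m)));
}
```

Because of the second difference, a clock with a constant frequency error (linear phase) has zero Allan variance; this is why a stable but inaccurate oscillator such as the 82574's can still look good on these plots.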

You can find my patches at:

	 https://www.josefsipek.net/freebsd/phc-v1/

There are 3 patches:

 1. chrony.patch modifies chronyd to use the PHC ioctls
 2. fbsd-phc.patch adds the generic PHC code
 3. fbsd-em.patch modifies if_em to register the 82574 timekeeping counter with the PHC code

In addition to cleaning up and generally improving the existing patches, I
hope to implement the bit of code that wires up KVM's KVM_HC_CLOCK_PAIRING
hypercall as a PHC.  While the 82574 provides a counter-type PHC, this KVM PHC
would be an absolute-time PHC.  Support for the KVM PHC would allow
FreeBSD guests to sync *very* accurately to the host's system clock.

I also have an incomplete patch that adds support for clock_gettime(3) using
PHC fds as clockid_t values, but since it isn't complete I'll keep it to
myself for now :)
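For comparison only: Linux's dynamic POSIX clocks already let a PHC fd be passed as a clockid_t by encoding it as a negative id whose low three bits are 3 (its CLOCKFD tag), which keeps it distinct from the static clock ids.  The sketch below reproduces that encoding with hypothetical helper names, using unsigned arithmetic to avoid shifting a negative value; whether such a scheme fits FreeBSD's existing clockid_t space is an open question, and the unposted patch may do something else entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Low three bits tagging an fd-backed clock id (mirrors Linux's CLOCKFD). */
#define CLOCKFD_TAG 3u

/* Encode fd as (~fd << 3) | 3, yielding a negative clockid_t. */
static clockid_t
fd_to_clockid(int fd)
{
	assert(fd >= 0);
	return ((clockid_t)((~(unsigned int)fd << 3) | CLOCKFD_TAG));
}

/* True if clk looks like an fd-backed clock id. */
static int
clockid_is_fd(clockid_t clk)
{
	return (clk < 0 && ((unsigned int)clk & 7u) == CLOCKFD_TAG);
}

/* Recover the fd from an encoded clock id. */
static int
clockid_to_fd(clockid_t clk)
{
	assert(clockid_is_fd(clk));
	return ((int)(~(unsigned int)clk >> 3));
}
```

With this encoding, fd 3 becomes clock id -29, which a clock_gettime() implementation could recognize and route to the PHC behind that fd.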


So, that's what I've been up to.  As I said in the beginning, I wanted to
get more of this done, but I think it makes sense for me to let others know
about my code now.

I plan to continue hacking away on this, but if people have opinions about
any of this, I'd love to hear them.  It really pains me that there is so
much duplication between the PHC and timecounter code, but the current
tc_windup code runs in a rather special context (hardclock) and having it
process *all* devices regardless of use would increase its runtime quite a
bit.  I've been thinking about trying to move some of the timecounter and
PHC code into a generic set of helpers or try to reorganize kern_tc.c to
fold the PHC logic into it sanely, but that's currently very far down the
todo list.


To summarize, the goals/non-goals for this work are:

  Goals:
   * read-only interface to various precision hardware clocks (PHCs)
   * support for both absolute time and counter-only PHCs
   * ability to use software like chrony to stabilize system clocks

  Non-goals/future work:
   * adjusting PHCs
   * support for cross-timestamping techniques (like Intel's ART)
   * support for if_em PTP packet timestamping
   * external pin timestamping support

Thanks for reading this far.  Let me know if you have any questions,
suggestions, etc.

Jeff.

[1] I actually ran for about a week with an e1000e card in my box providing
    timekeeping by selecting it via the kern.timecounter sysctls.  It worked
    and was quite amusing to see, but the additional complexity in tc_windup
    made it unworkable.
[2] At some point, Intel added the Always Running Timer (ART) which can be
    used by devices to get timestamps that are easily convertible to TSC
    readings.  Support for this is part of future work.
[3] The chrony config was the following.  I ran chronyd with the -x flag to
    prevent it from trying to set the clock.  The system clock was
    disciplined with ptp2d, which was syncing to ptp2d running on the same
    server that chrony used for NTP.  Note that the refclocks are marked as
    'pps local', meaning that they are to be used only as a frequency
    source.  ('pps' means that the refclock isn't reporting UTC, and 'local'
    means that the clock isn't aligned to UTC seconds.)

	server <server> iburst minpoll 0 maxpoll 4 xleave

	refclock PHC /dev/phc-em0 refid EM0 pps local
	refclock PHC /dev/phc-em1 refid EM1 pps local

	logdir /tmp log measurements statistics tracking refclocks selection rtc
	logbanner 0 
[4] https://mastodon.radio/@jeffpc/112230743393202103
[5] A huge problem with NTP is that it suffers greatly from any network
    latency jitter and asymmetrical routing.  Having a stable reference
    clock (even if the stability is short-term only) helps NTP software
    quite a bit.
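To make footnote [5] concrete, here are the standard NTP on-wire equations (t1 client transmit, t2 server receive, t3 server transmit, t4 client receive).  If the outbound and return paths take d_out and d_back, the computed offset is biased by (d_out - d_back) / 2, and nothing in the four timestamps reveals that bias.  The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset and round-trip delay from one NTP exchange, in ns. */
struct ntp_sample {
	int64_t offset_ns;	/* estimated server minus client time */
	int64_t delay_ns;	/* round-trip network delay */
};

static struct ntp_sample
ntp_compute(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
{
	struct ntp_sample s;

	assert(t4 >= t1 && t3 >= t2);
	/* Assumes symmetric paths; any asymmetry shows up as offset error. */
	s.offset_ns = ((t2 - t1) + (t3 - t4)) / 2;
	s.delay_ns = (t4 - t1) - (t3 - t2);
	return (s);
}
```

With perfectly synchronized clocks, a 20/20 ns split yields offset 0, while a 30/10 ns split with the same 40 ns round trip yields a bogus 10 ns offset.  Jitter in those path delays moves the estimate from poll to poll, which is where a stable local refclock helps smooth things out.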