Using the monotonic clock in time(1)?

Thu Feb 22 14:31:20 UTC 2018

On Wed, 21 Feb 2018, Alan Somers wrote:

> time(1) currently uses the realtime clock, which is undesirable for timing
> short-lived commands while ntpd is active.  I opened a review to add an
> option to use the monotonic clock instead, but jilles suggested that time
> should use the monotonic clock unconditionally, since that's almost always
> better for measuring short durations.  However, the Open Group's
> specification seems to require the real time clock.  What do the standards
> folks think?  Is the Open Group spec sufficiently ambiguous and/or wrong
> that we should switch to the monotonic clock instead?
>
> https://reviews.freebsd.org/D14032
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/time.html

I think it really does mean CLOCK_REALTIME for the real time.  time(1)
and syscalls involving times had no alternative to using the real time
before CLOCK_MONOTONIC existed, and changing them to use another clock id
would be incompatible.  POSIX is more careful about this for sleep() and
nanosleep().  It specifies using CLOCK_REALTIME for the old sleep functions
and adds clock_nanosleep() to allow sleeping on CLOCK_MONOTONIC and possibly
other clock ids.  This was broken in FreeBSD until recently.  FreeBSD didn't
have clock_nanosleep(), and its nanosleep() slept on CLOCK_MONOTONIC without
documenting this, but claimed conformance with POSIX.1-1993 which specifies
CLOCK_REALTIME.

I think itimers are still broken.  POSIX is not so careful for them.  In the
2001 version, it specifies that ITIMER_REAL decrements in real time, whatever
that is.  realitimexpire() does sleep on real time in 4.4BSD-Lite, but
FreeBSD changed it to sleep on a monotonic time (the fuzzy one returned by
getmicrouptime()) in FreeBSD-3 or earlier.  realitimexpire() now sleeps on
the non-fuzzy monotonic time returned by microuptime().  POSIX.1-1993 has
better (though more complicated) APIs than the itimer ones.  These move the
bugs to applications by making the clock id a parameter.

The 2007 version of POSIX still says "real time" for ITIMER_REAL.
However, both versions have a whole section 3.307 to give a not so
good specification of "Real Time".  This doesn't mention CLOCK_REALTIME,
but says "Time measured as total units elapsed by the system clock...".
It is clear that there is only 1 system clock, and it would be strange
if the one used to measure real times is not CLOCK_REALTIME.

Kernel timeouts are still broken.  Always using the monotonic clock for
them would be correct if it were documented and the monotonic clock worked.
However, the monotonic clock doesn't advance while the system is suspended,
and timeouts based on it are extended by the length of the suspension.
POSIX doesn't allow this.  It requires the monotonic clock to measure
time (whatever that is) since some fixed time in the past, but FreeBSD acts
as if this point varies, and the implementation is essentially to vary it.
Steps of CLOCK_REALTIME also vary it.  "sysctl kern.boottime" shows this
point and its changes.  Utilities like w show the monotonic time but claim
to show the uptime.  It really is the uptime not counting suspended time.

After nanosleep() was fixed, "time sleep 10" has a chance of working across
leap second adjustments, clock steps, and short suspensions.  It should
sleep for 10+ seconds according to time(), even when the physical time
advanced by a different amount (least illegitimately, by 9 or 11 seconds
for a leap second adjustment; clock steps shouldn't be allowed; and for
suspensions the physical time should be 10 seconds).  If you change time(1)
to show the monotonic time, this would break again, especially across
suspensions since the monotonic time is broken across them.

The breakage would be small after fixing the monotonic time and not allowing
steps.  The only difference is for leap seconds.

It is unclear what time is or what different clocks measure.  Clocks can't
measure physical time (whatever that is) exactly and the implementation has
to slew them so that they don't drift too far from the absolute physical
time.  FreeBSD slews the real and monotonic times at the same rate and does
illegal steps to make some things work and break others.  A better
implementation would slew these clocks at different rates, as needed for
them to be as equal as possible after adjusting for their starting times.
The only long-term difference should be for leap seconds.  This is a POSIX
bug.  POSIX requires misrepresentation of leap seconds by its
over-specification of the representation of time_t.  For example, suppose
both times are the same after the adjustment, but are physically wrong,
and someone fixes this by stepping CLOCK_REALTIME.  CLOCK_MONOTONIC should
not be stepped, but should be slewed at a different rate.  OTOH, if the
clocks are the same and a leap second adjustment is applied, then
CLOCK_REALTIME must be stepped, but this adjustment is not physical so it
should never be applied to CLOCK_MONOTONIC.  Another way of looking at this
is that CLOCK_MONOTONIC should be as close as possible to the physical
time since the fixed time in the past.  It may drift away, but then it
should be slowly slewed back.

BTW, I recently found that time(1) [-l] is very deficient for measuring
makeworld benchmarks because it doesn't understand system time for other
threads or interrupt time.  I already knew that, but didn't notice how
large the unaccounted-for time has become due to having more CPUs and
things like task queues to do the work.  The usr/sys split for a single
thread is also very inaccurate for short-lived threads.  stathz is still
128 and many cc threads and subthreads can be run in 1/128 seconds.
time(1) [-l] on a large build adds up tiny values in struct rusage for
many children and the errors don't compensate each other very well.
struct rusage is also missing important information like the number of
syscalls.  init(8) inherits all the child rusages but there is no API
for seeing init's rusage.  For a large benchmark on an otherwise-idle
system, it works well to use global counters read using sysctl.
kern.cp_time is just as accurate as time(1) for the real time given by
the sum of the counters (divided by stathz) and better for everything
else.  It already gives monotonic time.