[Bug 236702] KERN_UPTIME not updated after resume

Thu Mar 21 13:53:40 UTC 2019

On Thu, 21 Mar 2019 bugzilla-noreply at freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236702
>
>            Bug ID: 236702
>           Summary: KERN_UPTIME not updated after resume
> ...
>          Reporter: trasz at FreeBSD.org
>
> It seems the time returned by clock_gettime(CLOCK_UPTIME) doesn't get updated
> on resume - in other words, when you reboot, then suspend to ram, then wait an
> hour and resume, the uptime, as shown by uptime(1), won't account for that
> hour.  This is different from eg OSX.

This is a longstanding bug.  CLOCK_UPTIME shouldn't exist (it is one of 3
unportable aliases for the POSIX CLOCK_MONOTONIC clock id).  It is actually
documented to be broken in clock_gettime(2) ("it ... increments monotonically
in SI seconds __while the machine is running__") and is only an alias since
CLOCK_MONOTONIC is broken enough to give the same misbehaviour, so each
reduces to an alias of the other.

CLOCK_MONOTONIC is not documented to be broken in clock_gettime(2),
except the documentation is fuzzy enough to allow anything ("it
Increments in SI seconds").  Of course, that is impossible; it can
only increment as if in some approximation to SI seconds, and in
practice the approximation is very variable (it has a large error while
the machine is not running, and large steps to catch up, and normally
tiny frequency adjustments while the machine is running, and sometimes
not so tiny frequency adjustments, and tiny and not so tiny steps...).

POSIX specifies a little more, but not enough about accuracy ("...
represents __the__ amount of time (in seconds and nanoseconds) since
an unspecified point in the past...  This point does not change after
system start-up time").  Of course __the__ amount is too precise to
specify (it depends on the frame of reference, and it is unclear what
this is for a timer a significant fraction of 1 light nanosecond away
from memory used to store the result).  In practice, errors more
like 1 millisecond are common due to long-term drift at variable rates
of 1-10 ppm.  Anyway, it is clear that CLOCK_MONOTONIC is not allowed
to stop while the machine is not running.  It must act as if it was
never stopped, by being stepped forwards on resume before it is read.
Otherwise, it acts as if the reference point moved.

Stepping of CLOCK_REALTIME has a related bug.  This is implemented by
moving the reference point.  The reference point is kern.boottime.
Moving it maintains the bogus invariant that realtime == uptime +
boottime.  Since uptime is not updated on resume but realtime usually
is updated on resume, boottime must be updated bogusly on resume to
maintain this invariant.

Moving boottime is correct in one case: when the initial realtime is
wrong, which it often is because the hardware clock used to set the
initial time is inaccurate.  Then when ntpd or the user steps the real
time to fix this, the boot time should be stepped too.  After that,
the boot time is as correct as possible, so it should not be changed.
Micro-adjustments by ntpd don't move it, but steps by ntpd or the user
do move it.  The steps may be of a few seconds to fix up for drift, or
many hours to fix up on resume.  Errors in the hardware clock include:
- it being only accurate to the nearest second.  At best, this gives
   errors of at most 0.5 seconds.   ntpd should consider 0.5 seconds to be
   a large error and step to catch up.
- drift of the clock while the machine was powered off.  1 second/day
   is typical.
- the hardware clock being on local time gives an error of the timezone
   difference.  This is normally fixed up by adjkerntz stepping the real
   time.

Timeouts are also slightly broken across suspend-resume.  They are
relative, so they don't count time spent suspended.  This is one way
to prevent a thundering herd of timeouts occurring on resume.  But I
think the thundering herd should occur.  Just rate limit it a bit.

FreeBSD still has the bogus unsupported option APM_FIXUP_CALLTODO
related to this.  This stopped compiling slightly before it was committed.
It has rotted for 21 years since then.  It is supposed to give the
thundering herd.  It doesn't attempt to give rate limiting.

Fixing up the real time on resume doesn't work very well either.  On x86,
the best that can happen is approximately:
- use the hardware clock as at boot time but without the timezone difference
   error, and usually with a smaller error from drift while not running, to
   step the real time.  The error is typically 1 second.
- ntpd should be restarted in /etc/rc.resume to finish the stepping, but I've
   never seen this done.  It should consider the 1 second large and do a step
   and not slew.
- if ntpd is not used, then at least use ntpdate (or timed, or a wristwatch
   :-)) to check and fix up the clock.
- if ntpd is running, then it will not be as aggressive as when it started
   up.  If you are lucky, then the error will be large enough for ntpd to
   step.
- while the real time is being fixed up, it is more unreliable than usual.
   Resume usually takes several seconds and almost everything should wait
   for it to complete, but the system doesn't take much care with this
   AFAIK.

Bruce