svn commit: r265472 - head/bin/dd

Thu May 8 03:39:22 UTC 2014

On Wed, 7 May 2014, Jilles Tjoelker wrote:

> On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:
>> On Tue, May 6, 2014 at 9:47 PM, Bruce Evans <brde at optusnet.com.au> wrote:
>>> On Tue, 6 May 2014, Alan Somers wrote:
>>>> ...
>>>>  The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE as
>>>> the
>>>>  clock_id.  That clock advances steadily, regardless of changes to the
>>>> system
>>>>  clock.
>>>> ...
>>>> +#include <sysexits.h>
>
>>> Use of <sysexits.h> is a style bug.  It is not used in BSD or KNF code
>>> like dd used to be.
>
>> sysexits.h is recommended by the err(3) man page.  Is that
>> recommendation meant to apply selectively, or is it obsolete, or is
>> some sort of edit war being waged by man page authors?

Bug in the err(3) man page.  Sort of an edit war.  Just 2 FreeBSD
committers liked sysexits and used it in their code and added a
recommendation to use it in some man pages.  But it has negative
advantages, and normal BSD programs don't use it.  It has been
edited in and out of style(9).

> The recommendation for <sysexits.h> was incompletely removed, yes.

It is still in err(3), and sysexits(3) still justifies itself by
pointing to partly-removed words in style(9).

err(3) is the last place that should recommend using sysexits.  err()
gives a nice way of encouraging text descriptions for all exits.
With text descriptions, there is almost no need for cryptic numeric
exit codes.  Only sets of programs that communicate a little status
in the exit code should use sysexits (or perhaps their own exit
codes, or certain standard exit codes like 126 or 127 for xargs and
some other utilities).  Some of the uses of the standard exit codes
are even.  I don't know of any utility except possibly sendmail that
documents that it uses sysexits enough for its exit codes to be
useful for more than a binary success/fail decision.  Certainly not
dd after these changes.  If its use of sysexits were documented,
then the documentation would say "dd uses sysexits to report 3 errors
that can't happen; otherwise, it uses the normal 2-state exit codes
(there is a macro for them.  It expands to the concise but
grammatically challenged "exits 0 on success, and >0 if an error
occurs".  Here ">0" standardises the usual sloppiness of not
distinguishing codes between 1 and 127).

sysexits(3) now says:

% DESCRIPTION
%      According to style(9), it is not a good practice to call exit(3) with
%      arbitrary values to indicate a failure condition when ending a program.
%      Instead, the pre-defined exit codes from sysexits should be used, so the
%      caller of the process can get a rough estimation about the failure class
%      without looking up the source code.

but style(9) now says:

%      Exits should be 0 on success, or 1 on failure.
% 
%              exit(0);        /*
%                               * Avoid obvious comments such as
%                               * "Exit 0 on success."
%                               */
%      }

The latter is not what I asked for either.  In previous discussion
of this, I think we agreed to at least mention EXIT_SUCCESS and
EXIT_FAILURE, and possibly deprecate sysexits.

This is a weakened version of the 4.4BSD style rules, which say:

% 	/*
% 	 * Exits should be 0 on success, and 1 on failure.  Don't denote
% 	 * all the possible exit points, using the integers 1 through 300.
% 	 */
% 	exit(0);    /* Avoid obvious comments such as "Exit 0 on success." */

The main point of this is to disallow cryptic undocumented exit statuses.
Recommending sysexits almost reverses this.  It gives cryptic undocumented
error statuses that are not even easy to decrypt for programs.  Programs
can look up sysexits, but without documentation there is no guarantee that
the encoding is according to sysexits.  Actually documenting use of
sysexits would make it even more painful to use.

>> [snip]
>>>> -       st.start = tv.tv_sec + tv.tv_usec * 1e-6;
>>>> +       if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
>>>> +               err(EX_OSERR, "clock_gettime");
>> [snip]
>>>> +       st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
>>>> }
>
> The floating point addition starts losing precision after 8388608
> seconds (slightly more than 97 days, a plausible uptime for a server).
> It is better to subtract the timespecs to avoid this issue.

No, it is better to use floating point for results that only need to
be approximate.  Especially when the inputs are approximate and the
final approximation doesn't need to be very accurate.

Floating point is good for all timespec and timeval calculations,
except in the kernel where it is unavailable.  timespecs and timevals
are mostly used for timeouts, and the kernel isn't very careful about
exact timeouts.  Short timeouts have inherent large inaccuracy due
to interrupt granularity and latency.  Long timeouts can be relatively
more accurate, but only if the kernel is careful about them.  It is
only careful in some places.

> With microseconds, the precision of a double is sufficient for 272
> years, so that calculation is probably acceptable.

dd actually uses double, but float would be plenty.  systat uses a
mixture of float and double.  double througout is better because
using the smaller type float tends to give negative optimizations.
devstat uses long double.  That's really silly for statistics.
On some arches, it is no different from double (so nothing can
depend on extra precision from it).  On sparc64, it is a negative
optimization by a factor of hundreds.

>> [snip]
>> Even if nanosecond resolution isn't useful, monotonicity is.  Nobody
>> should be using a nonmonotonic clock just to measure durations.  I
>> started an audit of all of FreeBSD to look for other programs that use
>> gettimeofday to measure durations.  I haven't finished, but I've
>> already found a lot, including xz, ping, hastd, fetch, systat, powerd,
>> and others.  I don't have time to fix them, though.  Would you be
>> interested, or do you know anyone else who would?
>
> I have a local patch for time(1).
>
> Whether the monotonic clock is right also depends on how long the
> durations typically are. For very long durations, users might refer to
> wall clocks and CLOCK_REALTIME may be more appropriate.

Yes, monotonic clocks are often best, but there are many bugs in this
area.  The most relevant one is perhaps that CLOCK_MONOTONIC is only
monotonic.  It is unclear if standards require it to have any relation
to actual time.  In practice in FreeBSD, it gives the actual time that
the system is up and is not suspended.  It is thus especially unusable
for setting alarm clocks in the morning since suspension overnight is
more likely than at other times.  Alarm clocks need to use real time
anyway.  nanosleep() is almost unusable for setting alarm clocks due
to this problem, its bugs, and other reasons:
- nanosleep() is specified to sleep on real time, but in FreeBSD it sleeps
   on monotonic time.  clock_nanosleep() is specified to sleep on a
   specified clock id, but is not implemented in FreeBSD.
- I don't see any way to use the broken nanosleep() for setting realtime
   alarms except to take short sleeps and check the real time on waking
   up.  Kernel timer code does things like this internally, but not
   very accurately, and for nanosleep() its sleeps are not short enough
   to work and it checks the wrong clock id on waking up.
- nanosleep() takes a relative time, so even a nanosleep() that sleeps
   on the correct clock id would be hard to use with an overnight timeout.
   You would have to know about daylight savings adjustments and either
   compensate for them up front or wake up an hour or 2 early to check
   for a switch.
- there are some POSIX realtime functions that support sleeping on an
   arbitrary clock id, and also support sleeping until an absolute
   time.  These are supported FreeBSD.  I haven't actually used them.
   They are sloppy in different ways than older FreeBSD timer code (and
   not as up to date with the change to sbintime_t).  They seem to be
   unaware of daylight savings and not use short enough sleeps to work
   across switches.
- nanosleep() is specified to sleep in realtime.  Actually more
   specifically, to use CLOCK_REALTIME for its clock id.  But its interval
   is relative, so it is unclear even what this means.

Taking averages over days has similar problems.  They should probably
use the monotonic system up time, not the system up time less the
system suspension time.  Due to the bug of not counting suspension
time, using the real time clock is probably better.  It may jump by
up to about 1 hour across daylight savings switches, but that won't
take it backwards, but the monotonic clock may fail to advance by
much more than 1 hour.

POSIX doesn't actually teh monotonic clock to fail to advance across
suspsensions or for other reasons.  From an old draft:

% 6679 MON          If the Monotonic Clock option is supported, all implementations shall support a clock_id of
% 6680              CLOCK_MONOTONIC defined in <time.h>. This clock represents the monotonic clock for the
% 6681              system. For this clock, the value returned by clock_gettime( ) represents the amount of time (in
% 6682              seconds and nanoseconds) since an unspecified point in the past (for example, system start-up
% 6683              time, or the Epoch). This point does not change after system start-up time. The value of the

Here "amount of time" is fuzzy, but clearly it should be in physical time
and as accurate as possible.

FreeBSD's implementation also breaks the "unspecified point in the past"
by frobbing it to implement the real time.  It is only unspecified in
POSIX.  In FreeBSD, you can see it using sysctl kern.boottime and
indirectly using uptime(1).  uptime (that is, w), has been changed to
use CLOCK_UPTIME, and that gives some of the long-term timing bugs
mentioned above.  Suppose for example that the system booted at 1:00 am
on a certain day.  The boot time is whatever it is, and shouldn't
change.  It serves as the "unspecified point in the past".  It is not
affected by DST switches or by micro-adjustments using adjtime() or ntpd.
However, suppose the clock drifts by 1 second and the real time is fixed
up by stepping the clock.  The real time becomes correct, but the monotonic
time remains off by 1 second.  This is implemented by stepping the boot
time to 1:01 am or 0:59 am.  The boot time becomes wrong too.  CLOCK_UPTIME
is the same as CLOCK_MONOTONIC, so it is also off by 1 second.  This can
be seen in uptime(1) output.  The errors may accumulate.

Of course, the monotonic clock cannot be stepped backwards.  Stepping
it foward wouldn't break it much more than leaving it off by 1 second
forever.  However, the only reasonably correct implementation is to
micro-adjust it until it catches up with any steps in the realtime
clock.  Only do this for small adjustments.  After suspension, it
should be stepped forwards by a large amount.

I think bad things happen to the boot time after suspension too.  The
real time must be stepped forward by a large amount, and doing that
steps the boot time by a large amount.

Similarly for booting if the realtime is initially local.  It is
stepped to make it UTC.  This is confusing.  It happens on my
system, and sysctl kern.boottime shows the boot time
apparently-correctly.  But it is correct as a local time.  The boot
time is in UTC.  sysctl doesn't translate to local time, so the
apparently-correct time is actually off by the step (10 hours).

Bugs in the boot time can be fixed more easily than by micro-adjusting
the monotonic clock.  Just keep the initial boot time (except adjust it
when it was initially local instead of UTC) and frob the real time
using a different variable.  Export both variables so that applications
can compensate for the frobbing at the cost of some complexity.  E.g.,
in uptime(1):

 	clock_gettime(CLOCK_UPTIME, &ts);
 	/*
 	 * Actually, do the compensations in the kernel for CLOCK_UPTIME.
 	 * It doesn't need to be monotonic.  But suppose it is the same
 	 * as the unfixed CLOCK_MONOTONIC and compensate here.
 	 *
 	 * Also fix the bogus variable name 'tp'.
 	 */
 	sysctl_mumble(&boottime);
 	sysctl_mumble(&frobbed_boottime);
 	uptime = ts.tv_sec +- (boottime.tv_sec - frobbed_boottime.tv_sec);

Note that the compensation may go backwards, so this method doesn't work
in general for monotonic times.  However, it can be used if the compensation
is non-negative or relatively small negative.  dd could use this method.
It already has to fix up for zero times and still has parts of the old
method that fixes up for negative times.  Note that the compensation may
be very large across a suspension.  You might start dd, SIGSTOP it, suspend
the system and restart everything a day later.  The compensation would be
about 1 day.  The average from this wouldn't be very useful, but it would
be the same as if dd was stopped for a day but the system was not suspended.

Bruce