svn commit: r265472 - head/bin/dd

Alan Somers asomers at
Thu May 8 15:37:40 UTC 2014

On Wed, May 7, 2014 at 9:39 PM, Bruce Evans <brde at> wrote:
> On Wed, 7 May 2014, Jilles Tjoelker wrote:
>> On Wed, May 07, 2014 at 12:10:31PM -0600, Alan Somers wrote:
>>> On Tue, May 6, 2014 at 9:47 PM, Bruce Evans <brde at> wrote:
>>>> On Tue, 6 May 2014, Alan Somers wrote:
>>>>> ...
>>>>>  The solution is to use clock_gettime(2) with CLOCK_MONOTONIC_PRECISE
>>>>> as
>>>>> the
>>>>>  clock_id.  That clock advances steadily, regardless of changes to the
>>>>> system
>>>>>  clock.
>>>>> ...
>>>>> +#include <sysexits.h>
>>>> Use of <sysexits.h> is a style bug.  It is not used in BSD or KNF code
>>>> like dd used to be.
>>> sysexits.h is recommended by the err(3) man page.  Is that
>>> recommendation meant to apply selectively, or is it obsolete, or is
>>> some sort of edit war being waged by man page authors?
> Bug in the err(3) man page.  Sort of an edit war.  Just 2 FreeBSD
> committers liked sysexits and used it in their code and added a
> recommendation to use it in some man pages.  But it has negative
> advantages, and normal BSD programs don't use it.  It has been
> edited in and out of style(9).
>> The recommendation for <sysexits.h> was incompletely removed, yes.
> It is still in err(3), and sysexits(3) still justifies itself by
> pointing to partly-removed words in style(9).
> err(3) is the last place that should recommend using sysexits.  err()
> gives a nice way of encouraging text descriptions for all exits.
> With text descriptions, there is almost no need for cryptic numeric
> exit codes.  Only sets of programs that communicate a little status
> in the exit code should use sysexits (or perhaps their own exit
> codes, or certain standard exit codes like 126 or 127 for xargs and
> some other utilities).  Some of the uses of the standard exit codes
> are even specified by POSIX.  I don't know of any utility except
> possibly sendmail that
> documents that it uses sysexits enough for its exit codes to be
> useful for more than a binary success/fail decision.  Certainly not
> dd after these changes.  If its use of sysexits were documented,
> then the documentation would say "dd uses sysexits to report 3 errors
> that can't happen; otherwise, it uses the normal 2-state exit codes"
> (there is a macro for this: it expands to the concise but
> grammatically challenged "exits 0 on success, and >0 if an error
> occurs"; here ">0" standardises the usual sloppiness of not
> distinguishing codes between 1 and 127).
> sysexits(3) now says:
> %      According to style(9), it is not a good practice to call exit(3) with
> %      arbitrary values to indicate a failure condition when ending a program.
> %      Instead, the pre-defined exit codes from sysexits should be used, so the
> %      caller of the process can get a rough estimation about the failure class
> %      without looking up the source code.
> but style(9) now says:
> %      Exits should be 0 on success, or 1 on failure.
> %
> %              exit(0);        /*
> %                               * Avoid obvious comments such as
> %                               * "Exit 0 on success."
> %                               */
> %      }
> The latter is not what I asked for either.  In previous discussion
> of this, I think we agreed to at least mention EXIT_SUCCESS and
> EXIT_FAILURE, and possibly deprecate sysexits.
> This is a weakened version of the 4.4BSD style rules, which say:
> %       /*
> %        * Exits should be 0 on success, and 1 on failure.  Don't denote
> %        * all the possible exit points, using the integers 1 through 300.
> %        */
> %       exit(0);    /* Avoid obvious comments such as "Exit 0 on success." */
> The main point of this is to disallow cryptic undocumented exit statuses.
> Recommending sysexits almost reverses this.  It gives cryptic undocumented
> error statuses that are not even easy to decrypt for programs.  Programs
> can look up sysexits, but without documentation there is no guarantee that
> the encoding is according to sysexits.  Actually documenting use of
> sysexits would make it even more painful to use.
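
For concreteness, here is a minimal sketch (not the committed dd change) of
the style Bruce describes: err(3) already prints a text description of what
failed, so the status itself only needs to distinguish success from failure.

	#include <err.h>
	#include <stdlib.h>
	#include <time.h>

	int
	main(void)
	{
		struct timespec ts;

		/*
		 * err(3) prints "prog: clock_gettime: <strerror(errno)>";
		 * the status only needs to say "failure", so EXIT_FAILURE
		 * (or plain 1) does the job without reaching for EX_OSERR.
		 */
		if (clock_gettime(CLOCK_MONOTONIC, &ts) == -1)
			err(EXIT_FAILURE, "clock_gettime");

		/* ... */
		return (EXIT_SUCCESS);	/* "exits 0 on success, and >0 if an error occurs" */
	}
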
>>> [snip]
>>>>> -       st.start = tv.tv_sec + tv.tv_usec * 1e-6;
>>>>> +       if (clock_gettime(CLOCK_MONOTONIC_PRECISE, &tv))
>>>>> +               err(EX_OSERR, "clock_gettime");
>>> [snip]
>>>>> +       st.start = tv.tv_sec + tv.tv_nsec * 1.0e-9;
>>>>> }
>> The floating point addition starts losing precision after 8388608
>> seconds (slightly more than 97 days, a plausible uptime for a server).
>> It is better to subtract the timespecs to avoid this issue.
> No, it is better to use floating point for results that only need to
> be approximate.  Especially when the inputs are approximate and the
> final approximation doesn't need to be very accurate.
> Floating point is good for all timespec and timeval calculations,
> except in the kernel where it is unavailable.  timespecs and timevals
> are mostly used for timeouts, and the kernel isn't very careful about
> exact timeouts.  Short timeouts have inherent large inaccuracy due
> to interrupt granularity and latency.  Long timeouts can be relatively
> more accurate, but only if the kernel is careful about them.  It is
> only careful in some places.

No, Jilles is right.  The problem isn't that dd uses doubles; it's
that dd converts longs to doubles _before_ subtracting the values.
That causes rounding if the tv_sec values are large.  If the
implementation of CLOCK_MONOTONIC ever changed to measure time since
the Epoch, or something similar, the rounding error would become
significant (at around 2^31 seconds, a double's resolution is already
only about half a microsecond).  Better to subtract the timespecs
first, then convert the difference to double.
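
A minimal sketch of that subtract-first approach (not the committed dd
code):

	#include <err.h>
	#include <stdio.h>
	#include <time.h>

	/*
	 * Converting each timespec to double before subtracting throws away
	 * nanoseconds once tv_sec is large: at 2^23 seconds (about 97 days)
	 * adjacent doubles are already ~1.9 ns apart, so
	 * (8388608.0 + 1e-9) == (8388608.0 + 2e-9) compares equal.
	 * Subtracting the raw integer fields first keeps the difference
	 * small, so the final conversion to double loses nothing that
	 * matters.
	 */
	static double
	elapsed_seconds(const struct timespec *start, const struct timespec *end)
	{
		struct timespec d;

		d.tv_sec = end->tv_sec - start->tv_sec;
		d.tv_nsec = end->tv_nsec - start->tv_nsec;
		if (d.tv_nsec < 0) {
			d.tv_sec--;
			d.tv_nsec += 1000000000L;
		}
		return (d.tv_sec + d.tv_nsec * 1.0e-9);
	}

	int
	main(void)
	{
		struct timespec start, end;

		if (clock_gettime(CLOCK_MONOTONIC, &start) == -1)
			err(1, "clock_gettime");
		/* ... the work being timed ... */
		if (clock_gettime(CLOCK_MONOTONIC, &end) == -1)
			err(1, "clock_gettime");
		printf("%.9f seconds elapsed\n", elapsed_seconds(&start, &end));
		return (0);
	}
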

>> With microseconds, the precision of a double is sufficient for 272
>> years, so that calculation is probably acceptable.
> dd actually uses double, but float would be plenty.  systat uses a
> mixture of float and double.  double throughout is better because
> using the smaller type float tends to give negative optimizations.
> devstat uses long double.  That's really silly for statistics.
> On some arches, it is no different from double (so nothing can
> depend on extra precision from it).  On sparc64, it is a negative
> optimization by a factor of hundreds.
>>> [snip]
>>> Even if nanosecond resolution isn't useful, monotonicity is.  Nobody
>>> should be using a nonmonotonic clock just to measure durations.  I
>>> started an audit of all of FreeBSD to look for other programs that use
>>> gettimeofday to measure durations.  I haven't finished, but I've
>>> already found a lot, including xz, ping, hastd, fetch, systat, powerd,
>>> and others.  I don't have time to fix them, though.  Would you be
>>> interested, or do you know anyone else who would?
>> I have a local patch for time(1).
>> Whether the monotonic clock is right also depends on how long the
>> durations typically are. For very long durations, users might refer to
>> wall clocks and CLOCK_REALTIME may be more appropriate.
> Yes, monotonic clocks are often best, but there are many bugs in this
> area.  The most relevant one is perhaps that CLOCK_MONOTONIC is only
> monotonic.  It is unclear if standards require it to have any relation
> to actual time.  In practice in FreeBSD, it gives the actual time that
> the system is up and is not suspended.  It is thus especially unusable
> for setting alarm clocks in the morning since suspension overnight is
> more likely than at other times.  Alarm clocks need to use real time
> anyway.  nanosleep() is almost unusable for setting alarm clocks due
> to this problem, its bugs, and other reasons:
> - nanosleep() is specified to sleep on real time, but in FreeBSD it sleeps
>   on monotonic time.  clock_nanosleep() is specified to sleep on a
>   specified clock id, but is not implemented in FreeBSD.
> - I don't see any way to use the broken nanosleep() for setting realtime
>   alarms except to take short sleeps and check the real time on waking
>   up.  Kernel timer code does things like this internally, but not
>   very accurately, and for nanosleep() its sleeps are not short enough
>   to work and it checks the wrong clock id on waking up.
> - nanosleep() takes a relative time, so even a nanosleep() that sleeps
>   on the correct clock id would be hard to use with an overnight timeout.
>   You would have to know about daylight savings adjustments and either
>   compensate for them up front or wake up an hour or 2 early to check
>   for a switch.
> - there are some POSIX realtime functions that support sleeping on an
>   arbitrary clock id, and also support sleeping until an absolute
>   time.  These are supported in FreeBSD (a sketch of this interface
>   follows this list).  I haven't actually used them.
>   They are sloppy in different ways than older FreeBSD timer code (and
>   not as up to date with the change to sbintime_t).  They seem to be
>   unaware of daylight savings and do not use short enough sleeps to work
>   across switches.
> - nanosleep() is specified to sleep in realtime.  Actually more
>   specifically, to use CLOCK_REALTIME for its clock id.  But its interval
>   is relative, so it is unclear even what this means.
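
For reference, a sketch of the absolute-time POSIX timer interface
mentioned in the fourth item above (timer_create(2)/timer_settime(2) with
TIMER_ABSTIME on CLOCK_REALTIME).  It only shows the mechanics; it does
nothing about the DST problems Bruce raises, and the "8 hours from now"
target is just a placeholder.

	#include <err.h>
	#include <signal.h>
	#include <stdio.h>
	#include <string.h>
	#include <time.h>

	/*
	 * Sketch only: arm an alarm at an absolute CLOCK_REALTIME time
	 * instead of computing a relative nanosleep() interval.  A real
	 * alarm clock would derive it_value from mktime(3) of the desired
	 * local wake-up time.
	 */
	int
	main(void)
	{
		struct sigevent sev;
		struct itimerspec its;
		struct timespec now;
		sigset_t set;
		timer_t tid;
		int sig;

		if (clock_gettime(CLOCK_REALTIME, &now) == -1)
			err(1, "clock_gettime");

		/* Block SIGALRM so it can be collected with sigwait(). */
		sigemptyset(&set);
		sigaddset(&set, SIGALRM);
		if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
			err(1, "sigprocmask");

		memset(&sev, 0, sizeof(sev));
		sev.sigev_notify = SIGEV_SIGNAL;
		sev.sigev_signo = SIGALRM;
		if (timer_create(CLOCK_REALTIME, &sev, &tid) == -1)
			err(1, "timer_create");

		its.it_value.tv_sec = now.tv_sec + 8 * 60 * 60;	/* placeholder target */
		its.it_value.tv_nsec = 0;
		its.it_interval.tv_sec = 0;	/* one-shot, no reload */
		its.it_interval.tv_nsec = 0;
		if (timer_settime(tid, TIMER_ABSTIME, &its, NULL) == -1)
			err(1, "timer_settime");

		if (sigwait(&set, &sig) != 0)
			errx(1, "sigwait failed");
		printf("alarm fired\n");
		return (0);
	}
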
> Taking averages over days has similar problems.  They should probably
> use the monotonic system up time, not the system up time less the
> system suspension time.  Due to the bug of not counting suspension
> time, using the real time clock is probably better.  It may jump by
> up to about 1 hour across daylight savings switches, but that won't
> take it backwards, whereas the monotonic clock may fail to advance by
> much more than 1 hour.
> POSIX doesn't actually allow the monotonic clock to fail to advance
> across suspensions or for other reasons.  From an old draft:
> % 6679 MON          If the Monotonic Clock option is supported, all implementations shall support a clock_id of
> % 6680              CLOCK_MONOTONIC defined in <time.h>. This clock represents the monotonic clock for the
> % 6681              system. For this clock, the value returned by clock_gettime( ) represents the amount of time (in
> % 6682              seconds and nanoseconds) since an unspecified point in the past (for example, system start-up
> % 6683              time, or the Epoch). This point does not change after system start-up time. The value of the
> Here "amount of time" is fuzzy, but clearly it should be in physical time
> and as accurate as possible.
> FreeBSD's implementation also breaks the "unspecified point in the past"
> by frobbing it to implement the real time.  It is only unspecified in
> POSIX.  In FreeBSD, you can see it using sysctl kern.boottime and
> indirectly using uptime(1).  uptime (that is, w) has been changed to
> use CLOCK_UPTIME, and that gives some of the long-term timing bugs
> mentioned above.  Suppose for example that the system booted at 1:00 am
> on a certain day.  The boot time is whatever it is, and shouldn't
> change.  It serves as the "unspecified point in the past".  It is not
> affected by DST switches or by micro-adjustments using adjtime() or ntpd.
> However, suppose the clock drifts by 1 second and the real time is fixed
> up by stepping the clock.  The real time becomes correct, but the monotonic
> time remains off by 1 second.  This is implemented by stepping the boot
> time to 1:01 am or 0:59 am.  The boot time becomes wrong too.  CLOCK_UPTIME
> is the same as CLOCK_MONOTONIC, so it is also off by 1 second.  This can
> be seen in uptime(1) output.  The errors may accumulate.
> Of course, the monotonic clock cannot be stepped backwards.  Stepping
> it foward wouldn't break it much more than leaving it off by 1 second
> forever.  However, the only reasonably correct implementation is to
> micro-adjust it until it catches up with any steps in the realtime
> clock.  Only do this for small adjustments.  After suspension, it
> should be stepped forwards by a large amount.
> I think bad things happen to the boot time after suspension too.  The
> real time must be stepped forward by a large amount, and doing that
> steps the boot time by a large amount.
> Similarly for booting if the realtime is initially local.  It is
> stepped to make it UTC.  This is confusing.  It happens on my
> system, and sysctl kern.boottime shows the boot time
> apparently-correctly.  But it is correct as a local time.  The boot
> time is in UTC.  sysctl doesn't translate to local time, so the
> apparently-correct time is actually off by the step (10 hours).
> Bugs in the boot time can be fixed more easily than by micro-adjusting
> the monotonic clock.  Just keep the initial boot time (except adjust it
> when it was initially local instead of UTC) and frob the real time
> using a different variable.  Export both variables so that applications
> can compensate for the frobbing at the cost of some complexity.  E.g.,
> in uptime(1):
>         clock_gettime(CLOCK_UPTIME, &ts);
>         /*
>          * Actually, do the compensations in the kernel for CLOCK_UPTIME.
>          * It doesn't need to be monotonic.  But suppose it is the same
>          * as the unfixed CLOCK_MONOTONIC and compensate here.
>          *
>          * Also fix the bogus variable name 'tp'.
>          */
>         sysctl_mumble(&boottime);
>         sysctl_mumble(&frobbed_boottime);
>         uptime = ts.tv_sec +- (boottime.tv_sec - frobbed_boottime.tv_sec);
> Note that the compensation may go backwards, so this method doesn't work
> in general for monotonic times.  However, it can be used if the compensation
> is non-negative or relatively small negative.  dd could use this method.
> It already has to fix up for zero times and still has parts of the old
> method that fixes up for negative times.  Note that the compensation may
> be very large across a suspension.  You might start dd, SIGSTOP it, suspend
> the system and restart everything a day later.  The compensation would be
> about 1 day.  The average from this wouldn't be very useful, but it would
> be the same as if dd was stopped for a day but the system was not suspended.
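
Roughly what that compensation could look like in code, as a hypothetical
sketch: CLOCK_UPTIME and the kern.boottime sysctl are real, but the second,
unstepped boot-time variable Bruce proposes is not exported by FreeBSD, so
the name kern.boottime_initial below is made up.

	#include <sys/types.h>
	#include <sys/sysctl.h>
	#include <sys/time.h>
	#include <err.h>
	#include <stdio.h>
	#include <time.h>

	/*
	 * Hypothetical sketch of the compensation described above.
	 * "kern.boottime" exists today; "kern.boottime_initial" is an
	 * invented name for the proposed unstepped boot time.
	 */
	int
	main(void)
	{
		struct timespec ts;
		struct timeval boottime, frobbed_boottime;
		size_t len;
		double uptime;

		if (clock_gettime(CLOCK_UPTIME, &ts) == -1)
			err(1, "clock_gettime");

		len = sizeof(frobbed_boottime);
		if (sysctlbyname("kern.boottime", &frobbed_boottime, &len,
		    NULL, 0) == -1)
			err(1, "kern.boottime");

		/* Hypothetical sysctl: the boot time before any steps. */
		len = sizeof(boottime);
		if (sysctlbyname("kern.boottime_initial", &boottime, &len,
		    NULL, 0) == -1)
			err(1, "kern.boottime_initial");

		/*
		 * Steps applied to the realtime clock were mirrored into the
		 * boot time; add the accumulated difference back so the
		 * result tracks physical time since the original boot.  The
		 * difference may be negative, so this is not usable for a
		 * clock that must never go backwards.
		 */
		uptime = ts.tv_sec + ts.tv_nsec * 1.0e-9 +
		    (frobbed_boottime.tv_sec - boottime.tv_sec) +
		    (frobbed_boottime.tv_usec - boottime.tv_usec) * 1.0e-6;
		printf("compensated uptime: %.3f s\n", uptime);
		return (0);
	}
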

Wouldn't it be simpler just for the kernel to adjust CLOCK_MONOTONIC
to add suspension time?


> Bruce
