Is it time to expose timespecsub and friends to userland?

Fri Mar 16 19:41:49 UTC 2018

On Fri, 16 Mar 2018, Alan Somers wrote:

> There are at least 19 system calls that have timeout arguments specified as
> struct timespec [1].  "struct stat" describes file timestamps as timespecs
> as well [2].  setsockopt(2),  sched_rr_get_interval(2),
> geom_stats_snapshot_timestamp(3) and pmclog_read(3) use timespecs for other
> purposes.  The sysctl node kern.crypto_stats exposes timespecs.  sudo
> records timespecs in its timestamp files.  Who know how many other ports do
> something similar; I'm not going to grep them all.
>
> And yet, FreeBSD provides no way to manipulate this structure.  Every
> program that uses them has to roll its own version of pretty much the same
> functions.  NetBSD does [3], and has for a long time.  In fact, NetBSD's
> timespecsub is old enough to drink [4].  But in FreeBSD, those functions
> languish inside of an "#ifdef KERNEL".  phk added them in r35029 with the
> comment "XXX: These may change!".  But it's been nearly 20 years, so I
> don't think they're going to change any more.
>
> Shall we finally expose them to userland and add a man page?  Let the
> bikeshed begin!  If the flame isn't too bad, I'll do it.

These functions fill a much needed gap.  POSIX provides the correct number
of APIs for manipulating the simple timeval and timespec structs (none).
glibc also doesn't support these functions or any others for manipulating
these structs, at least in old versions.

The easiest way to manipulate timevals in userland was to convert them
microseconds in floating point.  53-bit doubles work up to 4.5G seconds
= 142 years without losing precision.  Unfortunately, this doesn't
work for timespecs, since 0.142 years is not enough.  However integer
types with at least 64 bits have been standard since C99.  uint64_t
or uintmax_t holds at least 184G seconds = 584 years.

sys/time.h is now polluted with other conversion functions:
- bintime*() (under __BSD_VISIBLE)
- field names like sec and frac in the application namespace.  The older
   timeval and timespec structs are missing the bug of not having prefixes
   for field names.
- sbintime*() and SBT*.  Also under __BSD_VISIBLE, but no user APIs use
   these AFAIK, so they should be under _KERNEL.
- sb* and bt* for sbintimes.  bt is a better prefix than bintime, but is
   more likely to conflict with an application name, especially since it
   is missing an underscore.
- us*, ns*, ms*
- timespec2bintime() and similar bad names with timespec and bintime spelled
   out but "to" punned to "2".  timespec* might actually the in the namespace
   reserved for this file, but it is too verbose.
- OTOH, there is tstosbt().  I talked someone into not punning "2" in this.
   But it is hard to read without some underscores, and has larger namespace
   problems from being shorter.  The latter is not a problem if it is kernel-
   only.

Older timespec*() macros are still under _KERNEL, as you noticed.  Inlines
are correctly not used for the older APIs.  This reduces their pollution.

Even older timer*() macros are also under _KERNEL.  These are marked
NetBSD/OpenBSD compatible.  These have even worse names -- they are for
timevals, not for generic or specific 'timer' structs.

There are also problems with arg order and the number of args.
timeradd() takes 3 args (tvp, uvp, vvp).  timespecadd() takes 2 args in
the opposite order (vvp, uvp).  It is easier for me to just to the
addition than remember the order.

Namespace pollution continues after the timespec and timeval macros: mainly:
- struct clockinfo; all of its fields are missing prefixes.  This is not
   even under __BSD_VISIBLE.
Now looking at only !__BSD_VISIBLE && !_KERNEL code:
- earlier there is some nested pollution, struct timezone and its fields
   which at least have tz_ prefixes, and DST*.
- the NetBSD/OpenBSD mistakes are actually under #ifndef _KERNEL, so they
   pollute userland but not the kernel.  The kernel mostly uses non-macro
   non-inline functions for these.
- lots of pollution from the nested include of <time.h>.  The contents of
   sys/time.h and time.h are still soft of backwards.
- explicity pollution from the nested include of <select.h>.  IIRC, POSIX
   was broken to allow this.  The earlier include of sys/types.h already
   included this nested in the __BSD_VISIBLE case.

Example of converting unportable timer*() to floating point:

kdump/kdump.c:

XX 		if (timestamp & TIMESTAMP_ELAPSED) {
XX 			if (prevtime_e.tv_sec == 0)
XX 				prevtime_e = kth->ktr_time;
XX 			timersub(&kth->ktr_time, &prevtime_e, &temp);
XX 			printf("%jd.%06ld ", (intmax_t)temp.tv_sec,
XX 			    temp.tv_usec);

Replace the last 3 lines by:

 			printf("%.6f ",
 			    kth->ktr_time.tv_sec - prevtime_e.tv_sec + 1e-6 *
 			    (kth->ktr_time.tv_usec - prevtime_e.tv_usec);

If this isn't accurate enough, then the calculation in intmax_t arithmetic
is needed and messier:

 			intmax_t nsec;

 			nsec = (kth->ktr_time.tv_sec - prevtime_e.tv_sec) *
 			    1000000 +
 			    (kth->ktr_time.tv_usec - prevtime_e.tv_usec);
 			printf("%jd.%06jd ", nsec / 1000000, usec % 1000000);

(Actually even messier if nsec can be < 0 -- see below).

XX 		}
XX 		if (timestamp & TIMESTAMP_RELATIVE) {
XX 			if (prevtime.tv_sec == 0)
XX 				prevtime = kth->ktr_time;
XX 			if (timercmp(&kth->ktr_time, &prevtime, <)) {
XX 				timersub(&prevtime, &kth->ktr_time, &temp);
XX 				sign = "-";
XX 			} else {
XX 				timersub(&kth->ktr_time, &prevtime, &temp);
XX 				sign = "";
XX 			}
XX 			prevtime = kth->ktr_time;
XX 			printf("%s%jd.%06ld ", sign, (intmax_t)temp.tv_sec,
XX 			    temp.tv_usec);

These complications are to avoid misprinting for example a timeval of
{ tv_sec = -2; tv_nsec = 999999; } as -2.999999.  It is actually
-(1.000001).  timersub() is foot-shooting to give an unwanted
representation.  Simply calculating the difference in floating point
works the same as before.  Even C's broken division works right for
this case (division should round the integer part down, giving the same
problem as for timevals, but its brokenness is that it rounds towards 0).

XX 		}

The cases where using timer*() is not just worst are when you have to
convert a value in usec back to a timeval.  Then an extra conversion
step is necessary.  Something like:

 	struct timeval finish, prev, resid;

 	...
 	timersub(&finish, &prev, &resid);
 	/* Should have error handling for resid here. */
 	select(..., &resid);

which would be converted to:

 	usec = (finish.tv_sec - prev.tv_sec) * (intmax_t)1000000 +
 	    (finish.tv_usec - prev.tv_usec);
 	/* Error checks like usec >= 0 are easier with a single number. */
 	/* Extra conversion: */
 	resid.tv_sec = usec / 1000000;
 	resid.tv_usec = usec % 1000000;
 	select(..., &resid);

Note that if usec < 0, C's broken division by negative numbers usually
gives an invalid timeval with tv_usec < 0, so the error checking is more
needed.  This the reverse of the problem for printing negative timevals.
Negative timeouts should be treated as 0, but invalid negative timeouts
should be errors.

sbintimes are essentially optimized versions of representing everything
in nsec, with minimal conversions to struct types.  Unfortunately all of
the standard time APIs except difftime() use struct types.

Bruce