svn commit: r310171 - head/sys/sys

Bruce Evans brde at optusnet.com.au
Tue Dec 20 02:44:27 UTC 2016


On Mon, 19 Dec 2016, Ravi Pokala wrote:

> -----Original Message-----
>> From: <owner-src-committers at freebsd.org> on behalf of Ian Lepore <ian at freebsd.org>
>> Date: 2016-12-19, Monday at 11:20
>> To: Warner Losh <imp at bsdimp.com>, Ravi Pokala <rpokala at mac.com>
>> Cc: Sepherosa Ziehau <sepherosa at gmail.com>, Dimitry Andric <dim at freebsd.org>, src-committers <src-committers at freebsd.org>, "svn-src-all at freebsd.org" <svn-src-all at freebsd.org>, "svn-src-head at freebsd.org" <svn-src-head at freebsd.org>
>> Subject: Re: svn commit: r310171 - head/sys/sys
>>
>> On Mon, 2016-12-19 at 11:58 -0700, Warner Losh wrote:
>>>
>>> ...
>>>
>>> Are there other precedence for avoiding the SCN macros in the tree as
>>> well, or is this new art?
>>
>> There was another commit recently the fixed the same kind of scanf
>> error by making the variable fit the scanf type (changing uint64_t to
>> an explicit long long unsigned, iirc).  I don't know if that alone
>> counts as a precedent, but IMO it's a more palatible fix than the
>> SCN/PRI ugliness.

That was apparently a bug, not a fix.  Someone said that the variable
needs to have type precisely uint64_t.
   (Extensive use of fixed-width types in FreeBSD is another bug.  It
   asks for fixed width at all cost.  Using the "fast" or "least" types
   would as for time or space efficiency instead.  Or just use basic
   types whenever possible for simplicity.  Fixed-width types should
   only be used on ABI boundaries.)
I think there is an ABI boundary near the other commit.  Some hardware
wants uint64_t, and we should use uint_fast64_t or uint_least64_t, but
we used uint64_t.  These uses require translation at the ABI boundary,
since the fast and least types may be larger than the fixed-width types.
   (In more complicated cases, the relevant fixed-width type does't exist.
   IIRC, POSIX requires only fixed width types of width 8, 16 and 32 to
   exist, since old badly designed ABIs like inet_addr() require a fixed
   width, but newer ABIs don't repeat the mistake for width 64.  The C
   standard requires the "fast" and "least" types for widths 8, 16, 32 and
   64 to exist.  This is possible since they can all be [unsigned] long
   long or [u]intmax_t although these might be very far from being either
   fast or least.  All fixed-width types are optional in C.  So to support
   bad software ABIs and some hardware ABIs using portable code, you have
   to read write the data using some basic type (perhaps unsigned char)
   and convert it to a "fast" or "least" type (or maybe [u]intmax_t).
We changed to using the unsigned long long abomination.  This is basically
a bad spelling of uintmax_t.  All uint_fast64_t, uint_least64_t, unsigned
long long and uintmax_t are at least as large asthe ABI type uint64_t, so
they can be used to represent the ABI values, but all need translation at
ABI booundaries since they may be strictly larger.  The translation may
be as simple as assignment, depending on how the code is written.  Since
the code wasn't written with this in mind, it might have type puns instead.
Or the compiler might just warn about all implicit conversions that might
lose bits.

> With all apologies to Churchill, SCN/PRI are the worst way to address this in a machine-independent way, except for all the other ways that have been tried from time to time. :-P

No, there are much better ways.  For PRI*, just cast to [u]intmax_t.  The
only thing wrong with this is that it lots wastes space and time when
[u]intmax_t is very wide.  We don't have this problem yet since [u]intmax_t
is only 64 bits.  That would be very wide on 8-bit systems but is merely
wide on 32-bit systems and not wide on 63-bit systems.  PRI* is not even
available for an average typedef like uid_t.  It is available for all
"fast" and "least" typedefs.

For SCN*, first don't use *scanf().  Use APIs that can actually handle errors,
like strtol().  If you have to maintain bad code that uses *scanf(), then
convert to uintmax_t.  The conversions are not as short as casting for
*printf().  They are much like what is needed at ABI boundaries:

 	uintmax_t tmp;
 	uint64_t val;

 	/*
 	 * Good code does tmp = strtoumax() here.  Then handle errors
 	 * reported by strtoumax().  Then check that the result is
 	 * representable in uint64_t.  Then handle errors from that.
 	 *
 	 * Bad code did *sscanf(str, "%llx", &val) with no error checking
 	 * or handling of course.  The type didn't't match the format in
 	 * general, and the behaviour was undefined on overflow.
 	 */
 	sscanf(str, "%jx", &tmp);	/* keep null error handling */
 	val = tmp;			/* preserve undefined on overflow */
 	/*
 	 * Overflow in sscanf() is less likely than before (uintmax_t
 	 * might be larger).  Then overflow occurs much like before on
 	 * assignment (if the result is too large for the final type).
 	 */

Since the conversion is more elaborate for *scanf() than for *printf(),
SCN* is slightly less bad than PRI*.  SCN* is also not available for an
average type like uid_t.  It is available for the "fast" and "least"
typedefs.  But with error handling, it is even easier to always read
into a temporary variable of type [u]intmax_t so as to do range checking:

 	uintmax_t tmp;
 	uint_fast64_t fast_val;
 	uint_least64_t least_val;
 	unsigned long long abom_val;
 	uint64_t val;

 	errno = 0;			/* chummy with implementation */
 	r = sscanf(str, "%jx", &tmp);
 	if (r < 0 || errno != 0)	/* detect undef behaviour sometimes */
 		some_error();
 	if (tmp > UINT64_MAX)
 		range_error();
 	fast_val = tmp;			/* for local use */
 	least_val = tmp;		/* for local use */
 	abom_val = tmp;			/* for unspeakable use */
 	val = tmp;			/* for ABI use */

The value could have been read directly into any of the other variables
except uint64_t directly, with only portable SCN* ugliness, but it is
easier to use uintmax_t.

It is a small step from this to using correct code using strtoumax().
The SCN* mistake is not made for the strtol() family (by having a function
for every possible type).  But if you have to maintain bad code using
*scanf(), it might have 20 args instead of 1, or complications like %n
or string input (there is nothing like the strtol() family for convenient
parsing of strings).  Then converting to 20 copies of the above would be
painful (or course you would use a function).  The quick fix is to omit
the error handling and add 20 temorary variables and 20 assignments.

Grepping for examples in src/bin shows only 3 utilities using *scanf() and
1 serious type error:
- ls uses 1 sscanf() with bad error handling (no range checking, and a
   broken fall-through for < 0 (it is treated as all args present), and
   no errno hack.
- sleep uses 1 sscanf() rewritten be me to do all possible error handling
   short of using the errno hack.
- stty uses 4 sscanf()s with assorted type errors.  It always reads 1
   variable at a time into a single temporary variable of type long.
   This (or rather using strtol()) was almost correct before C99 broke
   long being longest.  It should have read into a u_long for the
   unsigned cases.  Instead, it bogusly casts &tmp from long * to
   u_long * and then depends on benign conversions to use tmp as a
   u_long.  Many of the uncast longs have sign errrors in a different
   way -- they are for speed_t, which happens to be u_int.

   stty's use shows most of the ABI complications mentioned above.
   The sscanf()s are 1 for variables of type tcflags_t, 2 for variables
   of type speed_t and 1 for variables of type cc_t.  All of these types
   are supposed to be opaque, except they are unsigned and integer.  So
   using long to read them was wrong on all cases.  The correct type was
   u_long in C90 and uintmax_t after C99 broke long being longest.  u_long
   still works in FreeBSD because ABIs don't allow easy expansion and there
   is no need for expanding these types.  These types are actually u_char
   and u_int, so they fit easily in u_long and more care should be taken
   to disallow reading values that don't fit in the final type.  All other
   errors are ignored anyway.

Bruce


More information about the svn-src-all mailing list