svn commit: r254600 - head/lib/libutil

Thu Aug 22 10:51:31 UTC 2013

On Wed, 21 Aug 2013, Jilles Tjoelker wrote:

> On Thu, Aug 22, 2013 at 01:24:13AM +0400, Sergey Kandaurov wrote:
>> On Wed, Aug 21, 2013 at 10:27:25PM +0200, Jilles Tjoelker wrote:
>>> On Wed, Aug 21, 2013 at 11:03:10PM +0400, Sergey Kandaurov wrote:
>>>> On Wed, Aug 21, 2013 at 09:21:47PM +0400, Andrey Chernov wrote:
>>>>> On 21.08.2013 20:46, Sergey Kandaurov wrote:
>>>>>>  	number = strtoumax(buf, &endptr, 0);
>>>>>>
>>>>>> +	if (number == UINTMAX_MAX && errno == ERANGE) {
>>>>>> +		return (-1);
>>>>>> +	}
>
>>>>> You need to reset errno before strtoumax() call (errno = 0), because any
>>>>> of previous functions may left it as ERANGE.

This also has a bogus test for UINTMAX_MAX wheich prevents input of
the number UINTMAX_MAX on systems where the limit for the function
(UINT64_T_MAX) is the same as the limit for strtoumax().  This test is
a wrong way of doing the errno thing.

This also has a high density of style bugs:
- excessive braces
- extra blank line

expand_number() remains a very badly designed and implemented function.
Its design errors start with its name.  It doesn't expand numbers.  It
converts strings to numbers.  The strtoumax() family is missing this
bug and others.  Its design errors continue with it taking a uint64_t
arg and not a uint64_t arg.

Its implementation errors begin with broken range checking.  The above
is an incomplete fix for this.  Unless uintmax_t is the same as uint64_t,
there are still the following bugs:
- the return value of strotumax() is assigned to a uint64_t.  This may
   overflow.  After overflow, the bogus (number == UINTMAX_MAX) check
   has no effect, since when overflow is possible 'number' cannot equal
   UINTMAX_MAX.
- values that exceed the maximum supported (UINT64_MAX) are not detected.
   They are silently truncated to uint64_t.

>>>> Thanks for pointing out.
>>>> Does the patch look good?

No.  It adds many more style bugs...

>>>> Index: expand_number.c
>>>> ===================================================================
>>>> --- expand_number.c     (revision 254600)
>>>> +++ expand_number.c     (working copy)
>>>> @@ -53,6 +53,8 @@
>>>>         unsigned shift;
>>>>         char *endptr;
>>>>
>>>> +       errno = 0;
>>>> +

Another extra blank line.  It is worse than most, since it separates the
initialization for calling stroumax() from the call.

>>>>         number = strtoumax(buf, &endptr, 0);
>>>>

This extra blank line separates the call from the error checking for the
function.  It was added in the commit that started this thread.  The
previous version had a bogus extra blank line to separate the result of
the error checking from the call.

>>>>         if (number == UINTMAX_MAX && errno == ERANGE) {
>
>>> This may cause the function to set errno=0 if it is successful, which is
>>> not allowed for standard library functions from C and POSIX. There may
>>> be a problem not only if expand_number() is standardized but also if it
>>> is used in the implementation of a standard library function. The best
>>> solution is to save and restore errno around this (if [ERANGE] is
>>> detected, that is a valid errno value to keep).

This is painful, but seems to be necessary.

>>> In an application it is acceptable to set errno=0 without further ado.
>
>> What about this change?
>> It changes errno only if it was modified from zero in strtoumax().
>> Unconditionally restoring errno after strtoumax() unless ERANGE is
>> probably not good as strtoumax() might set different errno (e.g. EINVAL)
>> and we might want to keep it as well.  Please correct me, if I'm wrong.
>
>> Index: lib/libutil/expand_number.c
>> ===================================================================
>> --- lib/libutil/expand_number.c (revision 254600)
>> +++ lib/libutil/expand_number.c (working copy)
>> @@ -50,15 +50,22 @@
>>  expand_number(const char *buf, uint64_t *num)
>>  {
>>         uint64_t number;
>> +       int saved_errno;

Style bugs:
- declarations more unsorted than before.  Previously, only endptr was
   unsorted.
- the variable for this is normally names serrno.  libutil mostly follows
   this convention -- it has several serrno's and no saved_errno's, but it
   does have some error's (error = errno; ...), mainly in pidfile.c.  Only
   login_cap.c is notkkk
   careful about clobbering errno to 0.

>>         unsigned shift;
>>         char *endptr;
>>
>> +       saved_errno = errno;
>> +       errno = 0;
>> +
>>         number = strtoumax(buf, &endptr, 0);
>>
>>         if (number == UINTMAX_MAX && errno == ERANGE) {
>>                 return (-1);
>>         }

Style bugs, as above.

>>
>> +       if (errno == 0)
>> +               errno = saved_errno;
>> +

Yet another style bug.

>>         if (endptr == buf) {
>>                 /* No valid digits. */
>>                 errno = EINVAL;
>>

Another style bug is the placement of the error checking.  Normally the
endptr check is done before the range check.

>
> This looks good to me.

This looks not so good to me.

Some of the other bugs in the old version:

% /*
%  * Convert an expression of the following forms to a uint64_t.
%  *	1) A positive decimal number.
%  *	2) A positive decimal number followed by a 'b' or 'B' (mult by 1).
%  *	3) A positive decimal number followed by a 'k' or 'K' (mult by 1 << 10).
%  *	4) A positive decimal number followed by a 'm' or 'M' (mult by 1 << 20).
%  *	5) A positive decimal number followed by a 'g' or 'G' (mult by 1 << 30).
%  *	6) A positive decimal number followed by a 't' or 'T' (mult by 1 << 40).
%  *	7) A positive decimal number followed by a 'p' or 'P' (mult by 1 << 50).
%  *	8) A positive decimal number followed by a 'e' or 'E' (mult by 1 << 60).

This garbage was copied from a similar comment in dd/args.c.  The number is
actually neither positive nor decimal, especially in dd where there is a
variant of this function that returns negative numbers.  Another design error
in this function is that it can't return negative numbers.

%  */
%

This bogus blank line detaches the comment from the code that it describes,
resulting in the comment describing null code.

% int
% expand_number(const char *buf, uint64_t *num)
% {
% 	uint64_t number;
% 	unsigned shift;
% 	char *endptr;
% 
% 	number = strtoumax(buf, &endptr, 0);

This supports non-positive non-decimal numbers.  The support for non-decimal
numbers is more intentional (it is given by the base 0 in the call).  Negative
numbers are only supported by strtoumax() allowing a minus sign and
returning a nonnegative number equal to 1 larger than UINTMAX_MAX less the
number without the minus sign.

% 
% 	if (endptr == buf) {
% 		/* No valid digits. */
% 		errno = EINVAL;
% 		return (-1);
% 	}

Most of the bugs except ones in the comment are missing in dd/args.c:

% static uintmax_t
% get_num(const char *val)
% {
% 	uintmax_t num, mult, prevnum;

By using variables of the same type that strtoumax() returns, it avoids
overflow bugs in the overflow checking...

% 	char *expr;
% 
% 	errno = 0;

This was fixed 14 years ago in dd.  Perhaps earlier in my version.  For
variations on the bugs, see the 4.4Lite version.  It was missing this
errno setting.  Since this is an application, it can clobber errno like
this.

% 	num = strtouq(val, &expr, 0);

This is still broken.  It unnecessarily assumes that uquad_t = uintmax_t.
4.4BSD-Lite uses strtoul() here, and using strtouq() is a wrong fix.

% 	if (errno != 0)				/* Overflow or underflow. */
% 		err(1, "%s", oper);

The error handling is simpler since this is not a library function.
The comment about underflow is wrong.  Underflow can't happen.  What
the comment means by underflow is "overflow, with a negative sign".
4.4BSD-Lite was missing this bug.  But 4.4BSD-Lite had a wrong range
check (num == ULONG_MAX, essentially the same as expand_number() now).

%

4.4BSD-Lite was missing this bug (extra blank line).

% 	if (expr == val)			/* No valid digits. */
% 		errx(1, "%s: illegal numeric value", oper);

- this code, with its more specific error message, was broken by a POSIX
   bug.  It is now unreachable (EINVAL is returned if no number is
   found, and for some other cases like an invalid base which can't
   happen for the above call).  It is not bad to break it becauses its
   message is of low quality in a different way than the strerror(errno)
   message from err().  There are no lawyers here.  strerror(EINVAL) is
   just too generic.

% [...]

dd's function is very careful to avoid overflow in shifts and multiplications.

expand_number() just allows things to overflow.  Since the variables are
unsigned, overflow just causes garbage results.  Not good enough for a
library function.

dd's function also supports multipliers.  The overflow avoidance is more
complication for multiplication than for shifting.  Multipliers would be
useful in a library function too.  expand_number() doesn't support them.
Another of its design errors is that it does less than dd's functions, so
it cannot be used to replace them.

% }
% 
% /*
%  * Convert an expression of the following forms to an off_t.  This is the
%  * same as get_num(), but it uses signed numbers.
%  *
%  * The major problem here is that an off_t may not necessarily be a intmax_t.
%  */
% static off_t
% get_off_t(const char *val)
% {
% 	intmax_t num, mult, prevnum;
% 	char *expr;
% 
% 	errno = 0;
% 	num = strtoq(val, &expr, 0);
% 	if (errno != 0)				/* Overflow or underflow. */
% 		err(1, "%s", oper);
% 
% 	if (expr == val)			/* No valid digits. */
% 		errx(1, "%s: illegal numeric value", oper);
% ...
% }

dd also has this misimplemented function.  It is supposed to fix the
problem that get_num() only returns unsigned values.  But it is
misimplemented by making a copy of get_num() and changing some unsigned
types to signed types.  The result is mostly worse than casting the
result of get_num().  The careful overflow checking no longer works,
since it is actually not so careful and does things like checking for
overflow after overflow has already occurred.  Overflow already occurring
gives underfined behaviour with signed types.  So the duplicated function
differs from just casting the result of get_num() as follows:
- it gives undefined behaviour on overflow
- strtoq() clamps to QUAD_MAX_/MIN where strtouq() clamps to UQUAD_MAX/MIN().
   This mainly amplifies the bugs seeking to large offsets.  The kernel
   supports 64-bit unsigned offsets for some defice files and these are needed
   for /dev/kmem.  But QUAD_MAX is smaller than the necessary offsets, so
   oseek and iseek in bytes are broken.  You have to trick dd into working
   by using non-byte offsets.  Some tricks (bogus casts/conversions in various
   places) are needed anyway since signed 64-bit off_t can't hold the necessary
   offsets.
The function is still missing checks that quad_t is not too large for offsets.

My fix for get_off_t() is to just make it (off_t)get_num() with a check that
the result is actually representable as an off_t.  This only depends on
2's complement magic.

To start fixing expand_number():

/* Fix comment here. */
int
expand_number(const char *buf, uint64_t *num)
{
 	char *endptr;
 	uintmax_t num;
 	uint64_t number;
 	unsigned shift;
 	int serrno;

 	serrno = errno;
 	errno = 0;
 	num = strtoumax(buf, &endptr, 0);
 	if (num > UINT64_MAX)
 		errno = ERANGE;
 	if (errno != 0)
 		return (-1);
 	errno = serrno;
 	number = num;
 	...
}

This depends on the POSIX bug for detecting the no-digits case.
Fortunately, the early error handling of this function is simple and
compatible with that of strtoumax(), so we don't need to translate
strtoumax()'s errors except for adjusting its range error.

Similarly in dd:

/* Fix comment here. */
static uintmax_t
get_num(const char *val)
{
 	char *expr;
 	uintmax_t num, mult, prevnum;

 	errno = 0;
 	num = strtoumax(val, &expr, 0);
 	if (errno != 0)
 		err(1, "%s", oper);
 	...
}

/*
  * Convert an expression of the above form to an off_t.  This version does
  * not handle negative numbers perfectly.  It assumes 2's complement and
  * maybe nonnegative multipliers.  I hope perfect handling is not necessary
  * for dd.
  */
static off_t
get_off_t(const char *val)
{
 	uintmax_t num;

 	num = get_num(val);
 	if (num != (u_quad_t)(off_t)num)
 		errx(1, "%s: offset too large", oper);
 	return ((off_t)num);
}

Bruce