svn commit: r254600 - head/lib/libutil

Thu Aug 29 13:53:41 UTC 2013

On Thu, 29 Aug 2013, Sergey Kandaurov wrote:

> On 22 August 2013 14:51, Bruce Evans <brde at optusnet.com.au> wrote:
>> expand_number() remains a very badly designed and implemented function.
>> Its design errors start with its name.  It doesn't expand numbers.  It
>> ...
> [...]
>>
>> Some of the other bugs in the old version:
>>
>> % /*
>> %  * Convert an expression of the following forms to a uint64_t.
>> %  *    1) A positive decimal number.
>> %  *    2) A positive decimal number followed by a 'b' or 'B' (mult by 1).
>> %  ...
>> This garbage was copied from a similar comment in dd/args.c.  The number is
>> actually neither positive nor decimal, especially in dd where there is a
>> variant of this function that returns negative numbers.  Another design
>> error
>> in this function is that it can't return negative numbers.
>
> Well, I see no useful contents in this comment. It is obvious and verbose.
> Probably it should be trimmed.

Stating the format in the man page is enough.

I don't like the man page much either, and have complained about it and
dehumanize_number(3) at length elsewhere.  But expand_number() is much
simpler, so most of the bugs mostly affect the other man page.  (It is
too simple to be able to translate numbers printed by dehumanize_number(),
since it doesn't support power of 10 "prefixes".)  Looking at
expand_number(3) now, I see only the following bugs:
- the format of the string isn't really documented
- poor grammar:
   - "the buf string"
   - "the address pointed out"
- confusing terminology "prefix".  This means an "SI power of 2" prefix,
   e.g., "kilo" or "k".  (I think SI only really has power if 10 prefixes.
   Anyway, "k" should always mean 1000 and "K" always 1024.  This is
   backwards in the man page.)  "kilo" really is a prefix when it modifies
   a unit, "e.g.", kilobytes.  But this function deals with pure numbers,
   so there are no units, and the "prefixes" are always uses as suffixes,
   e.g., "10K" (sic).
- the code supports both cases for all "prefixes", with literal lower
   case, but the man page only documents one case (always upper case,
   except for "k" where this is backwards).  Comments in the code
   describes the "prefixes" as "units".  That is better, but is too
   confusing since there may also be units like bytes added later.
- the man page describes the multipliers in non-dehumanized form, i.e.,
   as decimal numbers.  Even I prefer a dehumanized form here, since I
   can only recognize small decimal numbers as powers of 2.  The man page
   should give the numbers in hex or powers of 2 first and then in decimal
   if there is space.
- in the ERRORS section:
   - the description of EINVAL is incomplete.  See strtoumax() for more.
   - the description of ERANGE has poor grammar.  It should start with
     "The", as os normal and as is done for EINVAL, and perhaps give
     more details, as in strto*().

I think most utilities and all library functions that parse numbers should
say when they use strto*() to do this.  Utilities should mostly be
covered by intro(1).  The default should be to accept numbers in any
base.  POSIX unfortunately doesn't require very much, so bases other than
10 are unportable at best.  Grepping in utitilies man pages shows the 
following:
- I patched mknod(8) with the following when I fixed its strto*() parsing:

% Index: mknod.8
%  ===================================================================
%  RCS file: /home/ncvs/src/sbin/mknod/mknod.8,v
%  retrieving revision 1.3
% retrieving revision 1.4
% diff -r1.3 -r1.4
% 94,95c94,96
% < Major and minor device numbers can be given using the normal C
% < notation, so that a leading
% ---
% > Major and minor device numbers can be given in any format acceptable to
% > .Xr strtoul 3 ,
% > so that a leading

This is still fuzzy since it doesn't say that any base is allowed.

- sleep(1) refers to strtod(3).  sleep(1) actually uses sscanf() instead
   of strtod(3), but I think the reference is correct.
- ctladm(8) refers to strtoull(3) for lba's, with details about the base
   (but expressed too verbosely, and duplicated for read and write).  dd
   needs a similar amount of detail and more, but has less.  ctladm(3)
   does actually use strtoull(), but that is another bug since on 32-bit
   machines strtoull() only gives 32-bit numbers but lba's need
   more.
- etherswitchcfg(8) refers to strtol(4) for phy's and gives details about
   accepting any base.  strtol(4) of course doesn't exist.  It is a typo
   for strtol(3).

>
>>
>> %  */
>> %
>>
>> This bogus blank line detaches the comment from the code that it describes,
>> resulting in the comment describing null code.
>
> Sorry, I didn't see such blank line in head.

Oops, false alarm.

> ...
> Fixing the verbose comment above would result in adding more verboseness
> citing from man strtoumax(3) wrt. signedness and base details.
> Yet another point to trim it.

Yes.  We add the detail that the number is restricted to 64 bits (else
ERANGE), but already document that well enough (in the man page), and
after these fixes the man page should match the code.

> Thanks for your valuable input. What about this change (on top of head)?
>
> Index: expand_number.c
> ===================================================================
> --- expand_number.c    (revision 255017)
> +++ expand_number.c    (working copy)
> @@ -35,42 +35,29 @@
> #include <libutil.h>
> #include <stdint.h>
>
> -/*
> - * Convert an expression of the following forms to a uint64_t.
> - *    1) A positive decimal number.
> - *    2) A positive decimal number followed by a 'b' or 'B' (mult by 1).
> - *    3) A positive decimal number followed by a 'k' or 'K' (mult by 1 << 10).
> - *    4) A positive decimal number followed by a 'm' or 'M' (mult by 1 << 20).
> - *    5) A positive decimal number followed by a 'g' or 'G' (mult by 1 << 30).
> - *    6) A positive decimal number followed by a 't' or 'T' (mult by 1 << 40).
> - *    7) A positive decimal number followed by a 'p' or 'P' (mult by 1 << 50).
> - *    8) A positive decimal number followed by a 'e' or 'E' (mult by 1 << 60).
> - */
> int
> expand_number(const char *buf, uint64_t *num)
> {
> +    char *endptr;
> +    uintmax_t umaxval;
>     uint64_t number;
> -    int saved_errno;
>     unsigned shift;
> -    char *endptr;
> +    int serrno;
>
> -    saved_errno = errno;
> +    serrno = errno;
>     errno = 0;
> -
> -    number = strtoumax(buf, &endptr, 0);
> -
> -    if (number == UINTMAX_MAX && errno == ERANGE) {
> -        return (-1);
> -    }
> -
> -    if (errno == 0)
> -        errno = saved_errno;
> -
> +    umaxval = strtoumax(buf, &endptr, 0);
>     if (endptr == buf) {
>         /* No valid digits. */
>         errno = EINVAL;
>         return (-1);
>     }

I want to remove this if clause (depend on the POSIX extension).

> +    if (umaxval > UINT64_MAX)
> +        errno = ERANGE;
> +    if (errno != 0)
> +        return (-1);
> +    errno = serrno;
> +    number = umaxval;
>
>     switch (tolower((unsigned char)*endptr)) {
>     case 'e':

Seems OK.  Tabs got corrupted in the mail.

I complained about missing overflow checking for the shift, but there is
some.  It seems to be correct except for a style bug (a blank line that
separates the check from the final evaluation.  We avoid changing *num
when failing, and don't use a temporary variable for holding the result
of the shift, so the shift expression is written twice, but that is a
negative reason to separate its evaluations).

Bruce