[From nobody Tue Aug 24 09:06:44 2010
Date: Fri, 20 Aug 2010 06:51:47 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dag-Erling =?utf-8?Q?Sm=C3=B8rgrav?= <des@des.no>
Subject: Re: svn commit: r211304 - head/lib/libutil
Message-ID: <20100820043717.O18794@delplex.bde.org>
References: <201008141434.o7EEYaSA030301@svn.freebsd.org>
<20100815213757.M14856@delplex.bde.org>
<20100815235422.K14924@delplex.bde.org>
<20100819034102.F17926@delplex.bde.org> <8639uadf7s.fsf@ds4.des.no>
MIME-Version: 1.0
>> The following places were even more broken to begin with:
>> geom/core/geom.c: this calls expand_number() on an intmax_t *
>> ipfw/dummynet.c: this calls expand_number() on an int64_t *, after starting
>> with a uint64_t * and bogusly casting to int64_t *.
>
> I'll fix these two, do you have further suggestions or a list of other
> instances that need fixing?
>
> BTW, would you prefer fixing geom.c to use uint64_t, or fixing
> expand_number() to use uintmax_t?
I prefer fixing expand_number() to support negative numbers again. It
should support them since it is supposed to be the inverse of
scientificize^Whumanize_number(), which can and does print them in
a useful way for at least free space in df. I don't know of any use
for them in input, but it simplifies at least the documentation to
have an exact inverse.
However, it should probably still return negative numbers via a
uintmax_t, like strtoumax() does, so that yet another BAD API isn't
required. Externally, this requires documenting what expand_number()
actually does, since it uses strtoumax() internally but currently says
nothing except for the false statement that it is the inverse of
scientifize_number(), plus it requires something to determine whether
the number is signed.
Internally, this requires a bit more care with shifting of negative numbers.
A negative number is by definition one with a leading minus sign.
strtoumax() will return these converted to large positive numbers, and
currently any shift of them will give overflow. Instead, they should
be converted to a sign and a positive value before shifting. Pseudocode:
num = strtoumax();
Return if error.
shift = mumble(); /* mult = mumble() (recursive) for dd */
Return if error.
Return if shift == 1 (this gives strtou*()'s/our historical arcane
error handling for negative values, but only for non-shifted ones).
snum = strtoimax();
Return if error (must be range error with snum < 0).
signed = (snum < 0);
if (signed)
num = -(uintmax_t)snum; /* XXX assumes normal machine */
cutlim = (signed ? INTMAX_MAX / ((intmax_t)1 << shift) :
UINTMAX_MAX / ((uintmax_t)1 << shift);
if (num > cutlim)
overflow();
else
num <<= shift;
if (signed)
num = -(intmax_t)num;
My dd get_off_t() gets this wrong, oops. Just using the unsigned version
fails for things like "-1k". The -1 becomes UINTMAX_MAX and any shifing
of this overflows (or if overflow is ignored, gives back -1 on normal
machines). -1k isn't useful in dd -- all that is needed is for things
like 0xFFFFFFFFFFFFFFFF (2**64-1) to work, which they don't if strtoimax()
is used -- with 64-bit intmax_t, it clips this value to 0x7FFFFFFFFFFFFFFF/
ERANGE.
Here is part of the man page for strtoumax() that gives details relevant
to expand_number():
% STRTOUL(3) FreeBSD Library Functions Manual STRTOUL(3)
Bug: there should be a separate man page for each function.
%
% NAME
% strtoul, strtoull, strtoumax, strtouq -- convert a string to an unsigned
% long, unsigned long long, uintmax_t, or u_quad_t integer
%
% LIBRARY
% Standard C Library (libc, -lc)
%
% SYNOPSIS
% #include <stdlib.h>
% #include <limits.h>
Bug: limits.h is not needed for using strtoul() or strtoull().
%
% unsigned long
% strtoul(const char * restrict nptr, char ** restrict endptr, int base);
%
% unsigned long long
% strtoull(const char * restrict nptr, char ** restrict endptr, int base);
%
% #include <inttypes.h>
Bug: not consistently buggy with the above -- no limits.h here.
%
% uintmax_t
% strtoumax(const char * restrict nptr, char ** restrict endptr, int base);
%
% #include <sys/types.h>
% #include <stdlib.h>
% #include <limits.h>
%
% u_quad_t
% strtouq(const char *nptr, char **endptr, int base);
Bugs:
1. sys/types.h is needed in theory but not in practice, since someone
changed the actual declaration of this in stdlib.h to use uint64_t.
2. stdlib.h says that this and strtoq() are to be removed in FreeBSD-6.0,.
%
% DESCRIPTION
% The strtoul() function converts the string in nptr to an unsigned long
% value. The strtoull() function converts the string in nptr to an
% unsigned long long value. The strtoumax() function converts the string
% in nptr to an uintmax_t value. The strtouq() function converts the
% string in nptr to a u_quad_t value. The conversion is done according to
% the given base, which must be between 2 and 36 inclusive, or be the spe-
% cial value 0.
%
% The string may begin with an arbitrary amount of white space (as deter-
% mined by isspace(3)) followed by a single optional `+' or `-' sign. If
% base is zero or 16, the string may then include a ``0x'' prefix, and the
% number will be read in base 16; otherwise, a zero base is taken as 10
% (decimal) unless the next character is `0', in which case it is taken as
% 8 (octal).
The detail about the base (0) and its prefixes, and the sign, are missing
in expand_number.3.
%
% The remainder of the string is converted to an unsigned long value in the
% obvious manner, stopping at the end of the string or at the first charac-
% ter that does not produce a valid digit in the given base.
More details missing.
% (In bases
% above 10, the letter `A' in either upper or lower case represents 10, `B'
% represents 11, and so forth, with `Z' representing 35.)
%
% If endptr is not NULL, strtoul() stores the address of the first invalid
Bug: this applies to all the functions, not just strtoul(), so all of them
or better none of them should be explicitly mentioned here.
% character in *endptr. If there were no digits at all, however, strtoul()
% stores the original value of nptr in *endptr. (Thus, if *nptr is not
% `\0' but **endptr is `\0' on return, the entire string was valid.)
Other details not relevant, but expand_number is missing the detail on the
error handling for trailing garbage.
%
% RETURN VALUES
% The strtoul(), strtoull(), strtoumax() and strtouq() functions return
% either the result of the conversion or, if there was a leading minus
% sign, the negation of the result of the conversion, unless the original
% (non-negated) value would overflow; in the latter case, strtoul() returns
% ULONG_MAX, strtoull() returns ULLONG_MAX, strtoumax() returns
% UINTMAX_MAX, and strtouq() returns ULLONG_MAX. In all cases, errno is
Here are the details for negative values and how strtou*() handle them
Bugs in expand_number.3, ones not already mentioned and a couple
mentioned again.
% EXPAND_NUMBER(3) FreeBSD Library Functions Manual EXPAND_NUMBER(3)
%
% NAME
% expand_number -- format a number from human readable form
Its bogus name is less than reflected in its bogus descrption.
"format a number from human readable form" is exactly the same description
as is given to scientificize_number(), where its only incorrectness is
that it misspells "scientificize" as "humanize". But this is supposed to
be the inverse function.
% DESCRIPTION
% The expand_number() function unformats the buf string and stores a
% unsigned 64-bit quantity at address pointed out by the num argument.
%
% The expand_number() function follows the SI power of two convention.
%
% The prefixes are:
Neither SI nor this function take prefixes. They take suffixes.
%
% Prefix Description Multiplier
Another bogus "prefix".
The bogus [bB] suffix is not mentioned here. Hmm, B would make sense
as a suffix to these suffixes (thus KB could be an alias for K, but
AFAIR scientificize number never produces either plain B or KB, and
the parser only recognizes a single-letter suffix).
% k kilo 1024
scientificize_number() only produces the K suffix, at least in df output.
% ERRORS
% The expand_number() function will fail if:
%
% [EINVAL] The given string contains no digits.
Not quite right. "foo1" contains a digit, but should give EINVAL.
%
% [EINVAL] An unrecognized prefix was given.
Another bogus "prefix". Only the undocumented octal and hex prefixes
are supported, but this "prefix" means "suffix" as usual.
To give an exact inverse of scientificize_number(), the API also needs
to be bug for bug compatible (int64_t not uintmax_t), with no support
for octal or hex (not too bad).
Gak, prefixes (sic) are much more broken than I said above. They are
also named prefixes in scientificize_number() sources, and there are
several variations, most of which are not supported by the "inverse"
function:
% int
% humanize_number(char *buf, size_t len, int64_t bytes,
% const char *suffix, int scale, int flags)
% {
% const char *prefixes, *sep;
The usual confusion.
% int b, i, r, maxscale, s1, s2, sign;
% int64_t divisor, max;
% size_t baselen;
Various style bugs.
%
% assert(buf != NULL);
% assert(suffix != NULL);
% assert(scale >= 0);
%
% if (flags & HN_DIVISOR_1000) {
% /* SI for decimal multiplies */
mulipliers
% divisor = 1000;
% if (flags & HN_B)
% prefixes = "B\0k\0M\0G\0T\0P\0E";
% else
% prefixes = "\0\0k\0M\0G\0T\0P\0E";
Classic SI prefixes, in which k means 1000 (is B really an SI prefix?).
expand_number() has no support for these.
% } else {
% /*
% * binary multiplies
% * XXX IEC 60027-2 recommends Ki, Mi, Gi...
Blech.
% */
% divisor = 1024;
% if (flags & HN_B)
% prefixes = "B\0K\0M\0G\0T\0P\0E";
% else
% prefixes = "\0\0K\0M\0G\0T\0P\0E";
K seems to be the only letter that is non-ambiguous here.
% }
%
% #define SCALE2PREFIX(scale) (&prefixes[(scale) << 1])
scientificize_number.3 duplicates some of the confusion:
% DESCRIPTION
% The humanize_number() function formats the signed 64-bit quantity given
% in number into buffer. A space and then suffix is appended to the end.
% The buffer pointed to by buffer must be at least len bytes long.
%
% If the formatted number (including suffix) would be too long to fit into
% buffer, then divide number by 1024 until it will. In this case, prefix
% suffix with the appropriate SI designator. The humanize_number() func-
% tion follows the traditional computer science conventions rather than the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
% proposed SI power of two convention.
This paragaph hasn't caught up with the extension HN_DIVISOR_1000.
%
% The prefixes are:
The usual confusion.
%
% Prefix Description Multiplier Multiplier 1000x
% k kilo 1024 1000
% M mega 1048576 1000000
% G giga 1073741824 1000000000
% T tera 1099511627776 1000000000000
% P peta 1125899906842624 1000000000000000
% E exa 1152921504606846976 1000000000000000000
The "Multiplier 1000x" column covers the extension.
Even a scientist wouldn't write numbers as 1152921504606846976 or
1000000000000000000. They would use exponential notation.
%
% The len argument must be at least 4 plus the length of suffix, in order
% to ensure a useful result is generated into buffer. To use a specific
% prefix, specify this as scale (multiplier = 1024 ^ scale). This cannot
Not with HN_DIVISOR_1000.
% HN_B Use `B' (bytes) as prefix if the original result
% does not have a prefix.
Now I see some B's (for 0B) in df output.
Conclusions:
- although I want a general number parser, expand_number() cannot
be it since it needs to be the inverse of scientificize_number()
which has too many warts to be inverted by a general number parser
(short of one with billions of options).
- negative numbers are useful after all, and many things become simpler
by restricting to them at all levels. We lose the abilitity to
"expand" UINTMAX_MAX, but scientificize() number never had that anyway.
Bruce
]