bin/74567: [patch] [2TB] du doesn't handle sizes >1TB
Bruce Evans
bde at zeta.org.au
Fri Dec 3 03:10:29 PST 2004
The following reply was made to PR bin/74567; it has been noted by GNATS.
From: Bruce Evans <bde at zeta.org.au>
To: Sergey Salnikov <serg at www1.citforum.ru>
Cc: freebsd-gnats-submit at freebsd.org
Subject: Re: bin/74567: [patch] [2TB] du doesn't handle sizes >1TB
Date: Fri, 3 Dec 2004 22:04:25 +1100 (EST)
On Thu, 2 Dec 2004, Sergey Salnikov wrote:
> >It seems like off_t would be more appropriate.
> Really I simply looked at the "prthumanval(int64_t bytes)" line and
> thought that using the same typedef everywhere would be the right
> thing. But off_t is better for that, you're right.
No, off_t is worse for that. off_t is a signed integer integer type
suitable for representing file offsets. It must be signed so that it
can represent negative offsets and have a range twice as large as is
needed for file sizes. Using it to represent file sizes would be bogus
since file sizes aren't offsets but would always work since off_t must
be able to represent the offset from the beginning of a file to the
end of the same file. Using it to represent sums of file sizes sizs
would be more bogus since sums may be much larger than their terms;
in particular, it would break on systems where off_t is 32 bits but
file systems are larger than 2^31-1 bytes. Using it in du would be
even more bogus since du needs to represent sums of block counts, not
sums of file sizes.
POSIX.1-2001 requires the existence of a signed integer type blkcnt_t
which is "Used for file block counts". It must be signed for backwards
compatibility. This is still not implemented in FreeBSD. It is fuzzily
specified in POSIX. The spec doesn't require it to be usable for block
counts, but it requires st_blocks in struct stat to have type blkcnt_t
so blkcnt_t needs to be able to represent all file block counts for
stat() to actually work. It is fairly clear that it is not intended
to to work for more than single files.
POSIX.1-2001 requires the existence of an unsigned integer type fsblkcnt_t
which is "Used for file system block counts". This is implemented in
FreeBSD but not used in the main place where something like it is needed
(statfs()). It must be unsigned for backwards compatibility with SYSV,
but this makes it unsuitable for use in FreeBSD since file system block
counts can be negative (for f_bavail). statvfs() is unsuitable for the
same reason. Some BSDs have deprecated statfs() in favour of statvfs()
despite the latter's unsuitability, but statfs() is still the primary
interface that reports block counts in FreeBSD. statfs() avoids some
bugs by not using fsblkcnt_t, but has its own sign extension bugs from
excessive use of unsigned types.
Using fsblkcnt_t in du would be bogus since it is not intended to work
for more than single file systems.
Using a type related to block counts for fts_number would be bogus because
fts_number needs to represent generic values. fts_number needs to have a
signed arithmetic type since generic values may be negative. Using long
for it was almost correct until the C standard broke the promise that
long is the largest integer type. Using double would have been better,
at least on machines with IEEE floating point so that exact representability
of all integers between about -2^53 and 2^53 is guaranteed. Now that the C
standard promises that intmax_t is the largest signed integer type, intmax_t
is almost the correct type for fts_number. Using double would still be
safer since it has a much larger range (though less precision for large
values). intmax_t goes up to about 2^63 which should be enough for anyone,
but 2^53 should also be enough for anyone. The reason to use doubles is
that bugs may cause preposterously large values (> 2^63) and floating
point won't hide the bugs by truncating mod 2^63. In theory, overflow
can be detected for signed integer arithmetic, but no one ever turns on
the overflow detection and excessive use of unsigned types defeats it.
> A new patch follows.
> ...
>
> --- du-2TB-patch-v2 begins here ---
> --- du.c.orig Wed Jul 28 20:03:12 2004
> +++ du.c Thu Dec 2 00:54:34 2004
> @@ -72,7 +72,7 @@
>
> static int linkchk(FTSENT *);
> static void usage(void);
> -void prthumanval(int64_t);
> +void prthumanval(off_t);
Using int64_t in prthumanval() is as correct as possible, since
prthumanva() is just a wrapper for humanize_number() and humanize_number()
takes an int64_t due to previous poor choice of types.
> [... style bugs deleted ...]
> - (void) printf("%ld\t%s\n",
> - howmany(p->fts_number, blocksize),
> + (void) printf("%lld\t%s\n",
> + howmany(*(off_t *)p->fts_pointer,
> blocksize),
> p->fts_path);
%lld is for printing long long's. Using of for printing foo_t's is
logically wrong and potentially gives runtime errors. This bug is
detected at compile time on FreeBSD's 64-bit arches because both int64_t
and off_t are plain long (!= long long) on these arches.
Another reason to always replace long by intmax_t in changes of this type
is that by getting it right the first time you don't have to change lots
of printfs several times. FreeBSD already has too many %qd's and %lld's
from intermediate changes before intmax_t became standard. In many cases,
compile-time detection of the type being printed being different from
quad_t or long long, respectively, is defeated by casting the arg to
match the format.
Bruce
More information about the freebsd-bugs
mailing list