bin/74567: [patch] [2TB] du doesn't handle sizes >1TB

Bruce Evans bde at zeta.org.au
Fri Dec 3 03:10:29 PST 2004


The following reply was made to PR bin/74567; it has been noted by GNATS.

From: Bruce Evans <bde at zeta.org.au>
To: Sergey Salnikov <serg at www1.citforum.ru>
Cc: freebsd-gnats-submit at freebsd.org
Subject: Re: bin/74567: [patch] [2TB] du doesn't handle sizes >1TB
Date: Fri, 3 Dec 2004 22:04:25 +1100 (EST)

 On Thu, 2 Dec 2004, Sergey Salnikov wrote:
 
 >  >It seems like off_t would be more appropriate.
 >  Really I simply looked at the "prthumanval(int64_t bytes)" line and
 >  thought that using the same typedef everywhere would be the right
 >  thing. But off_t is better for that, you're right.
 
 No, off_t is worse for that.  off_t is a signed integer integer type
 suitable for representing file offsets.  It must be signed so that it
 can represent negative offsets and have a range twice as large as is
 needed for file sizes.  Using it to represent file sizes would be bogus
 since file sizes aren't offsets but would always work since off_t must
 be able to represent the offset from the beginning of a file to the
 end of the same file.  Using it to represent sums of file sizes sizs
 would be more bogus since sums may be much larger than their terms;
 in particular, it would break on systems where off_t is 32 bits but
 file systems are larger than 2^31-1 bytes.  Using it in du would be
 even more bogus since du needs to represent sums of block counts, not
 sums of file sizes.
 
 POSIX.1-2001 requires the existence of a signed integer type blkcnt_t
 which is "Used for file block counts".  It must be signed for backwards
 compatibility.  This is still not implemented in FreeBSD.  It is fuzzily
 specified in POSIX.  The spec doesn't require it to be usable for block
 counts, but it requires st_blocks in struct stat to have type blkcnt_t
 so blkcnt_t needs to be able to represent all file block counts for
 stat() to actually work.  It is fairly clear that it is not intended
 to to work for more than single files.
 
 POSIX.1-2001 requires the existence of an unsigned integer type fsblkcnt_t
 which is "Used for file system block counts".  This is implemented in
 FreeBSD but not used in the main place where something like it is needed
 (statfs()).  It must be unsigned for backwards compatibility with SYSV,
 but this makes it unsuitable for use in FreeBSD since file system block
 counts can be negative (for f_bavail).  statvfs() is unsuitable for the
 same reason.  Some BSDs have deprecated statfs() in favour of statvfs()
 despite the latter's unsuitability, but statfs() is still the primary
 interface that reports block counts in FreeBSD.  statfs() avoids some
 bugs by not using fsblkcnt_t, but has its own sign extension bugs from
 excessive use of unsigned types.
 
 Using fsblkcnt_t in du would be bogus since it is not intended to work
 for more than single file systems.
 
 Using a type related to block counts for fts_number would be bogus because
 fts_number needs to represent generic values.  fts_number needs to have a
 signed arithmetic type since generic values may be negative.  Using long
 for it was almost correct until the C standard broke the promise that
 long is the largest integer type.  Using double would have been better,
 at least on machines with IEEE floating point so that exact representability
 of all integers between about -2^53 and 2^53 is guaranteed.  Now that the C
 standard promises that intmax_t is the largest signed integer type, intmax_t
 is almost the correct type for fts_number.  Using double would still be
 safer since it has a much larger range (though less precision for large
 values).  intmax_t goes up to about 2^63 which should be enough for anyone,
 but 2^53 should also be enough for anyone.  The reason to use doubles is
 that bugs may cause preposterously large values (> 2^63) and floating
 point won't hide the bugs by truncating mod 2^63.  In theory, overflow
 can be detected for signed integer arithmetic, but no one ever turns on
 the overflow detection and excessive use of unsigned types defeats it.
 
 >  A new patch follows.
 >  ...
 >
 >  --- du-2TB-patch-v2 begins here ---
 >  --- du.c.orig   Wed Jul 28 20:03:12 2004
 >  +++ du.c        Thu Dec  2 00:54:34 2004
 >  @@ -72,7 +72,7 @@
 >
 >   static int     linkchk(FTSENT *);
 >   static void    usage(void);
 >  -void           prthumanval(int64_t);
 >  +void           prthumanval(off_t);
 
 Using int64_t in prthumanval() is as correct as possible, since
 prthumanva() is just a wrapper for humanize_number() and humanize_number()
 takes an int64_t due to previous poor choice of types.
 
 >  [... style bugs deleted ...]
 >  -                                       (void) printf("%ld\t%s\n",
 >  -                                           howmany(p->fts_number, blocksize),
 >  +                                       (void) printf("%lld\t%s\n",
 >  +                                           howmany(*(off_t *)p->fts_pointer,
 >  blocksize),
 >                                              p->fts_path);
 
 %lld is for printing long long's.  Using of for printing foo_t's is
 logically wrong and potentially gives runtime errors.  This bug is
 detected at compile time on FreeBSD's 64-bit arches because both int64_t
 and off_t are plain long (!= long long) on these arches.
 
 Another reason to always replace long by intmax_t in changes of this type
 is that by getting it right the first time you don't have to change lots
 of printfs several times.  FreeBSD already has too many %qd's and %lld's
 from intermediate changes before intmax_t became standard.  In many cases,
 compile-time detection of the type being printed being different from
 quad_t or long long, respectively, is defeated by casting the arg to
 match the format.
 
 Bruce


More information about the freebsd-bugs mailing list