getbsize(3): Convert blocksizep to be unsigned long?

Tue Jun 6 18:05:44 PDT 2006

On Tue, 6 Jun 2006, Xin LI wrote:

> When I was twiddling df(1)'s code I found that getbsize(3) accepts
> blocksizep as long *.  Because that the manual page says:
>
> "The memory referenced by blocksizep is filled in with block size, in
> bytes."
>
> I think it makes no sense for the number to be negative.  Is it
> reasonable to apply the attached patch?

No.  This mistake has been committed and backed out once too often
already (in 2002).  The previous interface chance was from "long *"
to "size_t *".  This was a larger change since it also increased the
size of the pointed-to object on 64-bit machines, thus changing the
ABI very unnecessarily.

Using unsigned values gives sign extension bugs and breaks overflow
checks that can be done in principle for sign values but not unsigned
ones.  df and statfs() are good examples of the sign extension bugs.
statfs() used to supply correctly signed types (i.e., only signed
ones), and naive users of statfs() like df still depend on this, but
statfs() now supplies a mixture of signed and unsigned types, giving
sign extension bugs or requiring lots of casts to avoid them when the
types are mixed or unsigned values are subtracted.  getbsize() still
supports a correctly signed type.

I use the following fix for the type error in df.c:

%%%
Index: df.c
===================================================================
RCS file: /home/ncvs/src/bin/df/df.c,v
retrieving revision 1.62
diff -u -2 -r1.62 df.c

--- df.c	4 Jun 2004 09:30:51 -0000	1.62
+++ df.c	4 Jun 2004 10:15:16 -0000
@@ -363,13 +364,14 @@
  prtstat(struct statfs *sfsp, struct maxwidths *mwp)
  {
-	static u_long blocksize;
-	static int headerlen, timesthrough = 0;
-	static const char *header;
-	int64_t used, availblks, inodes;
+	static long blocksize;
+	static int timesthrough;
+	const char *header;
+	int64_t availblks, inodes, used;
+	int headerlen;

  	if (++timesthrough == 1) {
  		mwp->mntfrom = imax(mwp->mntfrom, (int)strlen("Filesystem"));
  		if (hflag) {
-			header = "   Size";
+			header = "  Size";
  			mwp->total = mwp->used = mwp->avail =
  			    (int)strlen(header);
@@ -440,5 +442,5 @@
  update_maxwidths(struct maxwidths *mwp, const struct statfs *sfsp)
  {
-	static u_long blocksize = 0;
+	static long blocksize;
  	int dummy;

%%%

The patch also fixes some style bugs and has part of a fix for a bug
in the output formatting.

"long blocksize" is now passed to fsbtoblk() which expects "u_long
bs".  The interface to fsbtoblk() should be changed too, but there is
no actual problem due to the block size easily fitting in a signed
long and fsbtoblk() defending against sign botches internally:

% static intmax_t
% fsbtoblk(int64_t num, uint64_t fsbs, u_long bs)
% {
% 
% 	if (fsbs != 0 && fsbs < bs)
% 		return (num / (intmax_t)(bs / fsbs));
% 	else
% 		return (num * (intmax_t)(fsbs / bs));
% }

Note that `num' needs to be signed here since negative block counts can
happen for f_bavail, and it is actually signed.  It also needs to be
divided by a signed value (with a positive sign) so that it doesn't
get promoted to a huge unsigned value when numerator is a small signed
value and the type of the denominator is unsigned and longer than the
type of the numerator.  This was broken in rev.1.52 when the type of
fsbs (f_bsize in struct statfs) was broken from long to uint64_t.
Rev.1.52 also broken type type for bs (blocksize) from long to u_long,
but since u_long is smaller than uint64_t on all supported arches,
this made no additional difference to the type pf (bs / fsbs) or
(fsbsd / bss) -- that type is always uint64_t.  Rev.1.56 works around
these sign extension bugs by casting the denominator to int64_t.

Rev.1.62 cleans up or down the interface of fsbtoblk() a little.  Before
rev.1.52, was a macro that operated only on signed types and just worked.
Macros can work on "any" types provided that they don't mix unsigned
types with signed ones and/or don't overflow.  Overflow is avoided
by not doing the multiplication (num * fsbs) first.  Even this overflow
avoidance is now bogus, since off_t has the same size as the block counts
(all are 64 bits) so overflow can only occur if the args are garbage,
so fsbtoblk() could be the even simpler macro:

#define	fsbtoblk(num, fsbs, bs)	((num) * (fsbs) / (bs))

provided there are no sign extension botches (the block sizes must be
signed if num is signed).  Casting things to signed types to avoid the
sign extension bugs is worse than using signed types throughout, since it
is uglier and strictly needs overflow checks for every conversion (cases
where the extra bit provided by unsigned types is actually needed might
cause overflow), and would require adding casts to interfaces that could
be type-generic modulo signedness, and would generate larger code (see the
commit log to rev.1.62).  OTOH, casting everything to floating point would
work well -- everything would become signed, integral values up to about
2^53 would be represented exactly, and larger values would never cause
overflow and would be represented exactly enough.

Befor