newnfs client and statfs

Tue May 3 08:30:56 UTC 2011

On Mon, 2 May 2011, Rick Macklem wrote:

> I have attached a version of the patch that I intend to commit
> unless it doesn't work for Kostik's test case. Kostik, could
> you please test this one.
>
> Yes, Bruce, I realize you won't like it, but I
> have put some comments in it
> to try and clarify why it is coded the way it is.
> (The arithmetic seems to work the way I would expect it to for
> i386, which is the only arch I have for testing.)

Sigh.

% --- fs/nfsclient/nfs_clport.c.sav	2011-04-30 20:16:39.000000000 -0400
% +++ fs/nfsclient/nfs_clport.c	2011-05-02 19:32:31.000000000 -0400
% @@ -838,21 +838,33 @@ void
%  nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
%  {
%  	struct statfs *sbp = (struct statfs *)statfs;
% -	nfsquad_t tquad;
% 
%  	if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
%  		sbp->f_bsize = NFS_FABLKSIZE;
% -		tquad.qval = sfp->sf_tbytes;
% -		sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_fbytes;
% -		sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_abytes;
% -		sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_tfiles;
% -		sbp->f_files = (tquad.lval[0] & 0x7fffffff);
% -		tquad.qval = sfp->sf_ffiles;
% -		sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
% +		sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
% +		sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
% +		/*
% +		 * Although sf_abytes is uint64_t and f_bavail is int64_t,
% +		 * the value after dividing by NFS_FABLKSIZE is small
% +		 * enough that it will fit in 63bits, so it is ok to
% +		 * assign it to f_bavail without fear that it will become
% +		 * negative.
% +		 */
% +		sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
% +		sbp->f_files = sfp->sf_tfiles;
% +		/* Since f_ffree is int64_t, clip it to 63bits. */
% +		if (sfp->sf_ffiles > (uint64_t)INT64_MAX)

This cast has no effect.  INT64_MAX has type int64_t.  sf_ffiles has
uint64_t.  The default binary promotions cause both types to be promoted
to the minimally larger common type.  This type is uint64_t.  Thus
INT64_MAX is converted automatically to the correct type.

% +			sbp->f_ffree = INT64_MAX;
% +		else
% +			sbp->f_ffree = sfp->sf_ffiles;
%  	} else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
% +		/*
% +		 * The type casts to (int32_t) ensure that this code is
% +		 * compatible with the old NFS client, in that it will
% +		 * sign extend a value with bit31 set. This may or may
% +		 * not be correct for NFSv2, but since it is a legacy
% +		 * environment, I'd rather retain backwards compatibility.
% +		 */
%  		sbp->f_bsize = (int32_t)sfp->sf_bsize;
%  		sbp->f_blocks = (int32_t)sfp->sf_blocks;
%  		sbp->f_bfree = (int32_t)sfp->sf_bfree;

It won't sign extend, but will propagate bit31 as an unsigned bit.  For
example, sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks =
0xFFFFFFFF80000000, which is massively different.  Again, omitting the
cast gives the correct result if the wire insists on its values being
unsigned.

The result is only backwards compatible with relatively recent FreeBSD
nfs clients.  All FreeBSD clients are completely broken if bit31 is
set, and compatibility with this brokenness is not useful (but as I
pointed out in another reply, we would never have seen the broken case
when the old clients weren't old, since it takes a server file system
size of about 32TB for bit 31 to be set).  The details of the brokenness
vary:

Net/2, FreeBSD-1, 4.4BSD-Lite, FreeBSD-[2-4]:
    f_blocks was plain long:
       if long is 32 bits, then sfp->sf_blocks = 0x80000000 becomes
         sbp->f_blocks = -0x7fffffff - 1 (LONG_MIN)
       if long is 64 bits, then sfp->sf_blocks = 0x80000000 becomes
         sbp->f_blocks = -0x80000000L (INT32_MIN (same as 32-bit LONG_MIN)

FreeBSD-current after 2003/11/12, FreeBSD-[5-9]:
    f_blocks is now uint64_t:
      changing it (and others from a signed type to an unsigned type mainly
      gave lots of sign extension bugs, including here.  The bugs remain
      mostly unfixed.
        sfp->sf_blocks = 0x80000000 becomes
          sbp->f_blocks = 0xFFFFFFFF80000000 ((uint64_t)INT32_MIN) on all arches.

Neither of the garbage values INT32_MIN, ((uint64_t)INT_MIN) gives useful
behaviour.  The former is negative, though the wire value cannot be negative
(not sure about this for v2).  Applications that are naive enough to believe
this value should assume that the the file system has a negative size and
never try to write anything.  The latter is enormous and positive.  If the
wire count really is 0x80000000, then that is already very large, so
believing that the value is 0xFFFFFFFF80000000 should make little difference.

The bugs are a little different for signed fields like f_bavail.  Now there
are no sign extension bugs or version-dependent misbehaviours.  There are
just overflow bugs in the bogus casts.  (int32_t)0x80000000 overflows to
INT32_MIN (only on 2's complement machines, but no others are supported),
and assignment to sbp->f_bavail doesn't change this garbage value.  Now
the bugs are even further off, since it takes about a 400 TB ffs server file
system to reach them.  (400 TB with 8% minfree gives a 32TB reserve for
root.  After using all 32TB of this reserve, there would be -32TB available
for non-root.  -32TB is INT32_MIN in 16K-blocks.)

Bruce