newnfs client and statfs

Bruce Evans brde at optusnet.com.au
Tue May 3 09:18:10 UTC 2011


On Mon, 2 May 2011, Rick Macklem wrote:

>>> I'll try and make my Solaris10 box get to -ve frees and then see
>>> what
>>> it puts on the wire. After that, I'll start a discussion on
>>> freebsd-fs@
>>> about how they think a FreeBSD server should behave when f_bavail
>>> and/or
>>> f_ffree are negative.
>>
>> The result on Solaris would be interesting. Does Solaris still support
>> ffs? You said later that you couldn't get it to generate negative
>> values.
>>
> Well, I just did the reverse (ran a FreeBSD FFS disk out of space so
> it reported a -ve free and mounted it on Solaris10). Here are the
> "df" outputs (I used "df -k" on Solaris, since that's a compatible format):

That is almost as good a test.

> FreeBSD-current server (nfsv4-newlap):
> Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
> /dev/ad4s3a   2026030  671492 1192456    36%    /
> devfs               1       1       0   100%    /dev
> /dev/ad4s3e   4697030 4544054 -222786   105%    /sub1
> /dev/ad4s3d   5077038  641462 4029414    14%    /usr
>
> Solaris10 client:
> Filesystem            kbytes    used   avail capacity  Mounted on
> /dev/dsk/c0d0s0      3870110 2790938 1040471    73%    /
> /devices                   0       0       0     0%    /devices
> ctfs                       0       0       0     0%    /system/contract
> proc                       0       0       0     0%    /proc
> mnttab                     0       0       0     0%    /etc/mnttab
> swap                  975736     624  975112     1%    /etc/svc/volatile
> objfs                      0       0       0     0%    /system/object
> /usr/lib/libc/libc_hwcap1.so.1 3870110 2790938 1040471    73%    /lib/libc.so.1
> fd                         0       0       0     0%    /dev/fd
> swap                  975112       0  975112     0%    /tmp
> swap                  975140      28  975112     1%    /var/run
> /dev/dsk/c0d0s7      5608190 4118091 1434018    75%    /export/home
> nfsv4-newlap:/sub1   4697030 4544054 18014398509259198     1%    /mnt
>
> as you can see, Solaris10 doesn't assume it's negative and
> reports lottsa avail.
>
> I don't have a Linux client handy, so I can't do the same test
> with Linux, rick
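
The Solaris number is what falls out if the negative count is treated
as unsigned 64 bits.  A minimal sketch of the arithmetic, assuming the
server puts the byte count on the wire as a 64-bit two's complement
value and the client's "df -k" divides by 1024 (the function name is
made up for illustration):

```c
#include <stdint.h>

/*
 * Reinterpret a signed count of 1K blocks the way an unsigned-only
 * client would see it.  Assumption: the count goes over the wire as
 * a 64-bit byte count and comes back divided by 1024.
 */
uint64_t
kb_seen_as_unsigned(int64_t kb)
{
	/* Conversion and multiplication both wrap modulo 2^64. */
	uint64_t bytes = (uint64_t)kb * 1024;

	return (bytes / 1024);
}
```

For kb = -222786 this gives 2^54 - 222786 = 18014398509259198, which
matches the avail column on the Solaris client.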

I looked at linux-2.6.10 code.  It doesn't do anything good for signed
counts, and declares f_bavail with a bad mixture of arch-dependent types
-- int, s32, u32, __u32, long, u64, __u64 (but no s64 :-).  It does 1
nearby thing better: instead of a fixed blocksize of NFS_FABLKSIZE = 512
for nfs, the blocksize is a parameter, and in scaling by this it is
careful to round up.
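
Rounding up when scaling a byte count to blocks can be sketched as
follows.  This is not the Linux code, just an illustration of the
idea; the helper name is hypothetical and the block size is assumed
to be nonzero:

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a byte count to a block count,
 * rounding up so that a partial block is not silently dropped.
 * Written as quotient plus remainder test so that it cannot
 * overflow the way (bytes + bsize - 1) / bsize can.
 */
uint64_t
bytes_to_blocks_roundup(uint64_t bytes, uint64_t bsize)
{
	return (bytes / bsize + (bytes % bsize != 0));
}
```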

NetBSD is best.  Its statvfs at least has full support for handling this
problem.  From a 2004 version of NetBSD statvfs.h:

% struct statvfs {
% 	unsigned long	f_flag;		/* copy of mount exported flags */
% 	unsigned long	f_bsize;	/* file system block size */
% 	unsigned long	f_frsize;	/* fundamental file system block size */
% 	unsigned long	f_iosize;	/* optimal file system block size */
% 
% 	fsblkcnt_t	f_blocks;	/* number of blocks in file system, */
% 					/*   (in units of f_frsize) */
% 	fsblkcnt_t	f_bfree;	/* free blocks avail in file system */
% 	fsblkcnt_t	f_bavail;	/* free blocks avail to non-root */
% 	fsblkcnt_t	f_bresvd;	/* blocks reserved for root */

statvfs is specified by POSIX, and I previously mentioned that POSIX is
quite broken in this area.  One of the bugs is that all the POSIX block
count types like fsblkcnt_t in the above are specified to be unsigned.
Thus negative block counts cannot be supported directly using these types,
even if the OS has negative block counts.  In the above, NetBSD works
around this by having an extension giving a nonnegative block count for
the blocks reserved for root.  statfs should have used this instead of
a hack involving negative counts, but presumably didn't to avoid changing
the ABI.  Even NetBSD doesn't have this extension for statfs, at least
in 2004.  statfs(2) was apparently deprecated in NetBSD before 2004, with
newer features only going into statvfs(2).
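
To illustrate how a separate reserve field avoids negative counts,
here is a rough sketch.  It assumes that f_bresvd means "the part of
root's reserve that is still free", which is my reading and not
something taken from the NetBSD sources; the struct and function
names are hypothetical:

```c
#include <stdint.h>

struct blk_counts {		/* hypothetical, not a kernel struct */
	uint64_t bavail;	/* free blocks available to non-root */
	uint64_t bresvd;	/* free blocks still reserved for root */
};

/*
 * Split the free blocks into an unsigned (avail, reserved) pair
 * instead of reporting bfree - reserve as a signed value.
 */
struct blk_counts
split_reserve(uint64_t bfree, uint64_t reserve)
{
	struct blk_counts c;

	if (bfree > reserve) {
		c.bavail = bfree - reserve;
		c.bresvd = reserve;
	} else {
		/* Non-root is out of space; root is eating the reserve. */
		c.bavail = 0;
		c.bresvd = bfree;
	}
	return (c);
}
```

With the /sub1 numbers above (free = 4697030 - 4544054 = 152976K,
reserve = 152976 + 222786 = 375762K), this gives bavail = 0 and
bresvd = 152976K where statfs reports -222786K.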

% 
% 	fsfilcnt_t	f_files;	/* total file nodes in file system */
% 	fsfilcnt_t	f_ffree;	/* free file nodes in file system */
% 	fsfilcnt_t	f_favail;	/* free file nodes avail to non-root */
% 	fsfilcnt_t	f_fresvd;	/* file nodes reserved for root */

Similarly.

% 
% 	uint64_t  	f_syncreads;	/* count of sync reads since mount */
% 	uint64_t  	f_syncwrites;	/* count of sync writes since mount */
% 
% 	uint64_t  	f_asyncreads;	/* count of async reads since mount */
% 	uint64_t  	f_asyncwrites;	/* count of async writes since mount */
% 
% 	fsid_t		f_fsidx;	/* NetBSD compatible fsid */
% 	unsigned long	f_fsid;		/* Posix compatible fsid */
% 	unsigned long	f_namemax;	/* maximum filename length */
% 	uid_t		f_owner;	/* user that mounted the file system */
% 
% 	uint32_t	f_spare[4];	/* spare space */
% 
% 	char	f_fstypename[_VFS_NAMELEN]; /* fs type name */
% 	char	f_mntonname[_VFS_MNAMELEN];  /* directory on which mounted */
% 	char	f_mntfromname[_VFS_MNAMELEN];  /* mounted file system */
% 
% };

As I said before, NetBSD's nfs tries to make this work for nfs, but I
couldn't see how this worked in NetBSD or in anything I could think of,
since the extension is not in the nfs protocol.  Now I think it does work,
but still can't see how.  Details: NetBSD puts f_bavail on the wire without
clamping it (it just scales it).  Now I think f_bavail is never negative
in NetBSD, so this scaling doesn't involve the usual sign extension
and overflow bugs, or abuse of the top bit.  The client zaps negative
values for v3 f_bavail but not for other things, and initializes f_bresvd.
From a 2005 version of nfs_vfsops.c:

% 	if (v3) {
% 		sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE;
% 		tquad = fxdr_hyper(&sfp->sf_tbytes);
% 		sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
% 		tquad = fxdr_hyper(&sfp->sf_fbytes);
% 		sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
% 		tquad = fxdr_hyper(&sfp->sf_abytes);
% 		tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
% 		sbp->f_bresvd = sbp->f_bfree - tquad;

I still can't see how this initialization works.  f_bresvd has to end
up as nonzero if root has a reserve, and drop to zero as the reserve
is used up.  sf_fbytes - sf_abytes must give this reserve.

% 		sbp->f_bavail = tquad;
% #ifdef COMPAT_20
% 		/* Handle older NFS servers returning negative values */
% 		if ((quad_t)sbp->f_bavail < 0)
% 			sbp->f_bavail = 0;
% #endif

NetBSD's own server puts f_bavail on the wire unchanged except for scaling,
so it is now clear that f_bavail is never negative in NetBSD.
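
For comparison, here is a user-space sketch of what the v3 client
code above computes when the server does send a negative avail count.
It assumes a FreeBSD-style server that puts the negative byte count
on the wire as two's complement (as the Solaris output suggests), uses
int64_t in place of quad_t, and applies the COMPAT_20 clamp
unconditionally.  The struct and function names are made up:

```c
#include <stdint.h>

#define FABLKSIZE 512		/* stands in for NFS_FABLKSIZE */

struct v3_blocks {		/* hypothetical container for results */
	uint64_t bfree;
	uint64_t bavail;
	uint64_t bresvd;
};

struct v3_blocks
v3_block_counts(uint64_t fbytes, uint64_t abytes)
{
	struct v3_blocks r;
	int64_t avail;

	r.bfree = (uint64_t)((int64_t)fbytes / FABLKSIZE);
	/*
	 * A wire value with the top bit set becomes negative here.
	 * (The unsigned-to-signed conversion is implementation-defined
	 * in C, but is two's complement on the usual compilers.)
	 */
	avail = (int64_t)abytes / FABLKSIZE;
	/* Unsigned subtraction wraps, so this is bfree + |avail|. */
	r.bresvd = r.bfree - (uint64_t)avail;
	/* The clamp that the real code has under COMPAT_20. */
	r.bavail = avail < 0 ? 0 : (uint64_t)avail;
	return (r);
}
```

Taking the /sub1 numbers as exact (free = 152976K, avail = -222786K),
this gives f_bfree = 305952, f_bavail = 0 and f_bresvd = 751524
512-byte blocks.  751524 blocks is 375762K, i.e., the whole reserve
and not just the unused part of it.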

% 		tquad = fxdr_hyper(&sfp->sf_tfiles);
% 		sbp->f_files = tquad;
% 		tquad = fxdr_hyper(&sfp->sf_ffiles);
% 		sbp->f_ffree = tquad;
% 		sbp->f_favail = tquad;

"Negative" values for this are not zapped.

% 		sbp->f_fresvd = 0;

This reserve is not really supported.  Supporting it is impossible since
there is not as much redundancy in the wire values for the file counts
as for the block counts.

% 		sbp->f_namemax = MAXNAMLEN;
% 	} else {
% 		sbp->f_bsize = NFS_FABLKSIZE;
% 		sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize);
% 		sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks);
% 		sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree);
% 		sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail);

Still has old bugs.

% 		sbp->f_fresvd = 0;
% 		sbp->f_files = 0;
% 		sbp->f_ffree = 0;
% 		sbp->f_favail = 0;
% 		sbp->f_fresvd = 0;
% 		sbp->f_namemax = MAXNAMLEN;
% 	}
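
One candidate for the old v2 bugs is sign extension: the casts above
go through int32_t before the value is stored in an unsigned count.
A sketch of just that step, assuming the destination is a 64-bit
unsigned type and that the value is already in host byte order (the
function name is hypothetical):

```c
#include <stdint.h>

/*
 * Model of storing a 32-bit wire count through an int32_t cast into
 * a 64-bit unsigned field, as fxdr_unsigned(int32_t, ...) followed
 * by assignment would do.
 */
uint64_t
v2_count_via_int32(uint32_t host_order_value)
{
	/* Top bit set turns into a negative int32_t ... */
	int32_t tmp = (int32_t)host_order_value;

	/* ... which sign-extends to a huge unsigned value. */
	return ((uint64_t)tmp);
}
```

A count of 0x80000000 blocks then shows up as 0xffffffff80000000
instead of 2^31.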

Next steps: someone should look at why there are 3 nfsv3 protocol
fields for the block counts when only 2 are strictly needed.

Bruce

