newnfs client and statfs

Rick Macklem rmacklem at uoguelph.ca
Wed May 4 00:27:54 UTC 2011


> On Mon, 2 May 2011, Rick Macklem wrote:
> 
> >>> I'll try and make my Solaris10 box get to -ve frees and then see
> >>> what it puts on the wire. After that, I'll start a discussion on
> >>> freebsd-fs@ about how they think a FreeBSD server should behave
> >>> when f_bavail and/or f_ffree are negative.
> >>
> >> The result on Solaris would be interesting. Does Solaris still
> >> support ffs? You said later that you couldn't get it to generate
> >> negative values.
> >>
> > Well, I just did the reverse (ran a FreeBSD FFS disk out of space so
> > it reported a -ve free and mounted it on Solaris10). Here are the
> > "df" outputs (I used "df -k" on Solaris, since that's a compatible
> > format):
> 
> That is almost as good a test.
> 
> > FreeBSD-current server (nfsv4-newlap):
> > Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
> > /dev/ad4s3a   2026030  671492 1192456      36%  /
> > devfs               1       1       0     100%  /dev
> > /dev/ad4s3e   4697030 4544054 -222786     105%  /sub1
> > /dev/ad4s3d   5077038  641462 4029414      14%  /usr
> >
> > Solaris10 client:
> > Filesystem            kbytes    used   avail capacity  Mounted on
> > /dev/dsk/c0d0s0      3870110 2790938 1040471      73%  /
> > /devices                   0       0       0       0%  /devices
> > ctfs                       0       0       0       0%  /system/contract
> > proc                       0       0       0       0%  /proc
> > mnttab                     0       0       0       0%  /etc/mnttab
> > swap                  975736     624  975112       1%  /etc/svc/volatile
> > objfs                      0       0       0       0%  /system/object
> > /usr/lib/libc/libc_hwcap1.so.1
> >                      3870110 2790938 1040471      73%  /lib/libc.so.1
> > fd                         0       0       0       0%  /dev/fd
> > swap                  975112       0  975112       0%  /tmp
> > swap                  975140      28  975112       1%  /var/run
> > /dev/dsk/c0d0s7      5608190 4118091 1434018      75%  /export/home
> > nfsv4-newlap:/sub1   4697030 4544054 18014398509259198       1%  /mnt
> >
> > as you can see, Solaris10 doesn't assume it's negative and
> > reports lottsa avail.
> >
> > I don't have a Linux client handy, so I can't do the same test
> > with Linux, rick
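
The Solaris10 figure is what you get if the negative byte count is read
as an unsigned 64-bit value and then scaled to kilobytes:
2^54 - 222786 = 18014398509259198. A minimal sketch of that arithmetic
in C, assuming the available space crosses the wire as a 64-bit byte
count. The function name is made up for illustration and is not taken
from any client:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: reinterpret a signed available-space value (in
 * bytes) the way a client with unsigned counters would, then scale it
 * to 'unit'-byte blocks.
 */
static uint64_t
avail_as_unsigned(int64_t avail_bytes, uint64_t unit)
{
	assert(unit != 0);
	/* Conversion to uint64_t is modulo 2^64, so negative values wrap. */
	return ((uint64_t)avail_bytes / unit);
}
```

Called as avail_as_unsigned(-222786LL * 1024, 1024), this returns
18014398509259198, which matches the Solaris10 "df -k" line for /mnt.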
> 
> I looked at linux-2.6.10 code. It doesn't do anything good for signed
> counts, and declares f_bavail with a bad mixture of arch-dependent
> types -- int, s32, u32, __u32, long, u64, __u64 (but no s64 :-). It
> does 1 nearby thing better: instead of a fixed blocksize of
> NFS_FABLKSIZE = 512 for nfs, the blocksize is a parameter, and in
> scaling by this it is careful to round up.
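
For comparison, scaling by a variable block size with round-up could
look something like the following C sketch. This is not the Linux code;
the function name is invented here, and the only thing taken from the
paragraph above is the idea of a block size parameter plus rounding up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a byte count to blocks of 'bsize' bytes,
 * rounding up so that a partial block still counts as one block.
 */
static uint64_t
bytes_to_blocks_roundup(uint64_t bytes, uint64_t bsize)
{
	assert(bsize != 0);
	/* Avoids the overflow that (bytes + bsize - 1) / bsize can hit. */
	return (bytes / bsize + (bytes % bsize != 0));
}
```

For example, 1025 bytes with a 1024 byte block size comes out as 2
blocks, where a truncating division would give 1.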
> 
> NetBSD is best. Its statvfs at least has full support for handling
> this problem. From a 2004 version of NetBSD statvfs.h:
> 
> % struct statvfs {
> %     unsigned long f_flag;    /* copy of mount exported flags */
> %     unsigned long f_bsize;   /* file system block size */
> %     unsigned long f_frsize;  /* fundamental file system block size */
> %     unsigned long f_iosize;  /* optimal file system block size */
> %
> %     fsblkcnt_t f_blocks;     /* number of blocks in file system, */
> %                              /* (in units of f_frsize) */
> %     fsblkcnt_t f_bfree;      /* free blocks avail in file system */
> %     fsblkcnt_t f_bavail;     /* free blocks avail to non-root */
> %     fsblkcnt_t f_bresvd;     /* blocks reserved for root */
> 
> statvfs is specified by POSIX, and I previously mentioned that POSIX
> is quite broken in this area. One of the bugs is that all the POSIX
> block count types like fsblkcnt_t in the above are specified to be
> unsigned. Thus negative block counts cannot be supported directly
> using these types, even if the OS has negative block counts. In the
> above, NetBSD works around this by having an extension giving a
> nonnegative block count for the blocks reserved for root. statfs
> should have used this instead of a hack involving negative counts,
> but presumably didn't to avoid changing the ABI. Even NetBSD doesn't
> have this extension for statfs, at least in 2004. statfs(2) was
> apparently deprecated in NetBSD before 2004, with newer features only
> going into statvfs(2).
> 
> %
> %     fsfilcnt_t f_files;      /* total file nodes in file system */
> %     fsfilcnt_t f_ffree;      /* free file nodes in file system */
> %     fsfilcnt_t f_favail;     /* free file nodes avail to non-root */
> %     fsfilcnt_t f_fresvd;     /* file nodes reserved for root */
> 
> Similarly.
> 
> %
> %     uint64_t f_syncreads;    /* count of sync reads since mount */
> %     uint64_t f_syncwrites;   /* count of sync writes since mount */
> %
> %     uint64_t f_asyncreads;   /* count of async reads since mount */
> %     uint64_t f_asyncwrites;  /* count of async writes since mount */
> %
> %     fsid_t f_fsidx;          /* NetBSD compatible fsid */
> %     unsigned long f_fsid;    /* Posix compatible fsid */
> %     unsigned long f_namemax; /* maximum filename length */
> %     uid_t f_owner;           /* user that mounted the file system */
> %
> %     uint32_t f_spare[4];     /* spare space */
> %
> %     char f_fstypename[_VFS_NAMELEN]; /* fs type name */
> %     char f_mntonname[_VFS_MNAMELEN]; /* directory on which mounted */
> %     char f_mntfromname[_VFS_MNAMELEN]; /* mounted file system */
> %
> % };
> 
> As I said before, NetBSD's nfs tries to make this work for nfs, but I
> couldn't see how this worked in NetBSD or anything I could think of,
> since the extension is not in the nfs protocol. Now I think it does
> work, but still can't see how. Details: NetBSD puts f_bavail on the
> wire without clamping it (it just scales it). Now I think f_bavail is
> never negative in NetBSD, so this scaling doesn't involve the usual
> sign extension and overflow bugs, or abuse of the top bit. The client
> zaps negative values for v3 f_bavail but not for other things, and
> initializes f_bresvd: from a 2005 version of nfs_vfsops.c:
> 
> % if (v3) {
> %     sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE;
> %     tquad = fxdr_hyper(&sfp->sf_tbytes);
> %     sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
> %     tquad = fxdr_hyper(&sfp->sf_fbytes);
> %     sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
> %     tquad = fxdr_hyper(&sfp->sf_abytes);
> %     tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
> %     sbp->f_bresvd = sbp->f_bfree - tquad;
> 
> I still can't see how this initialization works. f_bresvd has to end
> up as nonzero if root has a reserve, and drop to zero as the reserve
> is used up. sf_fbytes - sf_abytes must give this reserve.
> 
> %     sbp->f_bavail = tquad;
> % #ifdef COMPAT_20
> %     /* Handle older NFS servers returning negative values */
> %     if ((quad_t)sbp->f_bavail < 0)
> %         sbp->f_bavail = 0;
> % #endif
> 
> NetBSD's own server puts f_bavail on the wire unchanged except for
> scaling, so it is now clear that f_bavail is never negative in NetBSD.
> 
> %     tquad = fxdr_hyper(&sfp->sf_tfiles);
> %     sbp->f_files = tquad;
> %     tquad = fxdr_hyper(&sfp->sf_ffiles);
> %     sbp->f_ffree = tquad;
> %     sbp->f_favail = tquad;
> 
> "Negative" values for this are not zapped.
> 
> %     sbp->f_fresvd = 0;
> 
> This reserve is not really supported. Supporting it is impossible since
> there is not as much redundancy in the wire values for the file counts
> as for the block counts.
> 
> %     sbp->f_namemax = MAXNAMLEN;
> % } else {
> %     sbp->f_bsize = NFS_FABLKSIZE;
> %     sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize);
> %     sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks);
> %     sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree);
> %     sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail);
> 
> Still has old bugs.
> 
> %     sbp->f_fresvd = 0;
> %     sbp->f_files = 0;
> %     sbp->f_ffree = 0;
> %     sbp->f_favail = 0;
> %     sbp->f_fresvd = 0;
> %     sbp->f_namemax = MAXNAMLEN;
> % }
> 
> Next steps: someone should look at why there are 3 nfsv3 protocol
> fields for the block counts when only 2 are strictly needed.
> 
> Bruce
Here is the RFC 1813 definition of the 3 fields:
      tbytes
         The total size, in bytes, of the file system.

      fbytes
         The amount of free space, in bytes, in the file
         system.

      abytes
         The amount of free space, in bytes, available to the
         user identified by the authentication information in
         the RPC.  (This reflects space that is reserved by the
         file system; it does not reflect any quota system
         implemented by the server.)

I suspect that most systems running FFS (mis)use abytes to represent
the non-root value, even when "root" does the RPC. If they didn't
do that, then abytes would be different when root did statfs and that
would be confusing to a typical client.

Since you don't know if the server's file system is one like FFS that
has a "minfree" (and you don't know what "minfree" is), you can't
reliably calculate a negative f_bavail from the above, from what I
can see.
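
To illustrate why the wire values are not enough, here is a rough
sketch in C. It assumes a server that clamps abytes at zero before
sending it, and a client that derives the root reserve as
fbytes - abytes the way the NetBSD code quoted above does. The struct
and function names are made up for the example and BLKSIZE just mirrors
NFS_FABLKSIZE; none of this is taken from an actual implementation.

```c
#include <stdint.h>

#define BLKSIZE 512	/* assumed; mirrors NFS_FABLKSIZE in the quoted code */

/* Hypothetical client-side view of the block counts. */
struct blkcounts {
	int64_t bfree;		/* free blocks, including root's reserve */
	int64_t bavail;		/* free blocks for non-root, as received */
	int64_t bresvd;		/* reserve as the client derives it */
};

/* Hypothetical server side: clamp a negative avail count to 0 bytes. */
static uint64_t
encode_abytes(int64_t bavail_blocks)
{
	if (bavail_blocks < 0)
		return (0);
	return ((uint64_t)bavail_blocks * BLKSIZE);
}

/* Hypothetical client side: rebuild the counts from fbytes and abytes. */
static struct blkcounts
decode_counts(uint64_t fbytes, uint64_t abytes)
{
	struct blkcounts c;

	c.bfree = (int64_t)(fbytes / BLKSIZE);
	c.bavail = (int64_t)(abytes / BLKSIZE);
	c.bresvd = c.bfree - c.bavail;
	return (c);
}
```

With a 1000 block reserve and 400 blocks free, the true f_bavail is
-600, but encode_abytes(-600) and encode_abytes(0) both put 0 on the
wire. decode_counts(400 * BLKSIZE, 0) then gives bavail = 0 and
bresvd = 400, so the client can recover neither the -600 nor the 1000.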

rick

