>0x7fffffff blocksize filesystem reporting

Thu Nov 6 03:44:51 PST 2003

On Wed, 5 Nov 2003, Kirk McKusick wrote:

> I have gone back and resurrected the changes for the updated statfs
> structure that were discussed on arch some months ago. Because they
> ...

> Index: sys/sys/mount.h
> ===================================================================
> RCS file: /usr/ncvs/src/sys/sys/mount.h,v
> retrieving revision 1.148
> diff -c -r1.148 mount.h
> *** sys/sys/mount.h	1 Jul 2003 17:40:23 -0000	1.148
> --- sys/sys/mount.h	5 Nov 2003 05:10:54 -0000
> ***************
> *** 63,75 ****
>   /*
>    * filesystem statistics
>    */
>
> ! #define	MFSNAMELEN	16	/* length of fs type name, including null */
> ! #define	MNAMELEN	(88 - 2 * sizeof(long))	/* size of on/from name bufs */
>
>   /* XXX getfsstat.2 is out of date with write and read counter changes here. */
>   /* XXX statfs.2 is out of date with read counter changes here. */
> ! struct statfs {
>   	long	f_spare2;		/* placeholder */
>   	long	f_bsize;		/* fundamental filesystem block size */
>   	long	f_iosize;		/* optimal transfer block size */
> --- 63,103 ----
>   /*
>    * filesystem statistics
>    */
> + #define	MFSNAMELEN	16		/* length of type name including null */
> + #define	MNAMELEN	80		/* size of on/from name bufs */

As pointed out by tjr, there are buffer overflows from bcopy() MNAMELEN
bytes.  This is because the above MNAMELEN Is smaller than the old MNAMELEN
on systems with sizeof(long) > 4.  It should be either the historical
MNAMELEN of 90 or 88, or larger than that since we have room for expansion,
or no larger than the previous smallest MNAMELEN for all machines, depending
on whether the overflow bug is fixed by truncation on conversion or avoided
by truncation when the name is copied in.

> + #define STATFS_VERSION	0x20030518	/* current version number */

Style bug (space instead of tab).

> + struct statfs {
> + 	u_int32_t f_version;		/* structure version number */
> + 	u_int32_t f_type;		/* type of filesystem */
> + 	u_int64_t f_flags;		/* copy of mount exported flags */
> + 	u_int64_t f_bsize;		/* filesystem fragment size */
> + 	u_int64_t f_iosize;		/* optimal transfer block size */
> + 	u_int64_t f_blocks;		/* total data blocks in filesystem */
> + 	u_int64_t f_bfree;		/* free blocks in filesystem */
> + 	int64_t	  f_bavail;		/* free blocks avail to non-superuser */
> + 	u_int64_t f_files;		/* total file nodes in filesystem */
> + 	int64_t	  f_ffree;		/* free nodes avail to non-superuser */
> + 	u_int64_t f_syncwrites;		/* count of sync writes since mount */
> + 	u_int64_t f_asyncwrites;	/* count of async writes since mount */
> + 	u_int64_t f_syncreads;		/* count of sync reads since mount */
> + 	u_int64_t f_asyncreads;		/* count of async reads since mount */
> + 	u_int64_t f_spare[10];		/* unused spare */
> + 	u_int32_t f_namemax;		/* maximum filename length */

I disklike all these unsigned types, and to a lesser extent, typedefed types
and some of the 64-bit types.

Unsigned types limit possibilties for trapping overflow if overflow
actually occurs.  They should not be used to get an extra bit of
precision unless the extra bit is very important.  The old code got
this right by using only signed types.  f_bavail and f_ffree need to
go negative so they are correctly signed types, but this means that
the extra bit for the 64-bit unsigned block and node counts cannot
actually be used (since values that use it would cause overflow if
most of the blocks or nodes are free).

Typedefed types are difficult to print, as shown by printf format errors
in most of the formats changed in this patch.  But forwards compatibility
at the source level requires them.

Why 64-bit types for f_bsize and f_iosize?

If unsigned fixed-width types are used, then they should be named in
a standard way (uintN_t, not u_intN_t).

There are also some standard POSIX types for block counts, etc.  These
are used in POSIX's variant of statfs().  E.g., there is blkcnt_t, which
is a signed integral type not yet implemented in FreeBSD.  In the XSI
extension there is also fsblkcnt_t, which is an _unsigned_ integral
type implemented as uint64_t in FreeBSD.  Its unsignedness is apparently
derived from the old XSI type of unsigned long for block counts.

> + 	uid_t	  f_owner;		/* user that mounted the filesystem */
> + 	fsid_t	  f_fsid;		/* filesystem id */
> + 	char	  f_charspare[76];	    /* spare string space */
> + 	char	  f_fstypename[MFSNAMELEN]; /* filesystem type name */
> + 	char	  f_mntfromname[MNAMELEN];  /* mounted filesystem */
> + 	char	  f_mntonname[MNAMELEN];    /* directory on which mounted */
> + };

f_charspare would be better at the end.  If explicit padding is needed
after f_fsid_t, then it should be in terms of fsid_t spares.  Similarly
for spares after f_namemax and f_owner.  These are only packed because
uid_t happens to be 32-bit.  fsid_t is 2*32-bit so it doesn't need
padding either.  76 is an odd amount of padding.  It gives 252 bytes
of char arrays altogether and a struct size of 452 on i386's.  452 is
4 mod 8, so there are 4 bytes of unnamed padding at the end of the
struct on most 64-bit arches (if not internal padding).

> Index: sys/kern/vfs_syscalls.c
> ===================================================================
> RCS file: /usr/ncvs/src/sys/kern/vfs_syscalls.c,v
> retrieving revision 1.332
> diff -c -r1.332 vfs_syscalls.c
> *** sys/kern/vfs_syscalls.c	19 Oct 2003 20:41:07 -0000	1.332
> --- sys/kern/vfs_syscalls.c	5 Nov 2003 05:10:54 -0000
> ...
> + /*
> +  * Convert a new format statfs structure to an old format statfs structure.
> +  */
> + static void
> + cvtstatfs(td, nsp, osp)
> + 	struct thread *td;
> + 	struct statfs *nsp;
> + 	struct ostatfs *osp;
> + {
> +
> + 	bzero(osp, sizeof(*osp));
> + 	osp->f_bsize = nsp->f_bsize;
> + 	osp->f_iosize = nsp->f_iosize;
> + 	osp->f_blocks = nsp->f_blocks;
> + 	osp->f_bfree = nsp->f_bfree;
> + 	osp->f_bavail = nsp->f_bavail;
> + 	osp->f_files = nsp->f_files;
> + 	osp->f_ffree = nsp->f_ffree;

tjr suggested setting the values to LONG_MAX instead of blindly truncating
them  I think POSIX has some methods for dealing with such overflows
(from LFS).  I thing they would reduce to returning -1/EOVERFLOW here,
which is not very useful.

tjr also suggested using a fake block size.  This is already done in nfs,
but the implementation is broken (nfs3 has similar 32-bit limits in the
client-server interface).

> Index: sys/kern/vfs_bio.c
> ===================================================================
> RCS file: /usr/ncvs/src/sys/kern/vfs_bio.c,v
> retrieving revision 1.420
> diff -c -r1.420 vfs_bio.c
> *** sys/kern/vfs_bio.c	4 Nov 2003 06:30:00 -0000	1.420
> --- sys/kern/vfs_bio.c	5 Nov 2003 05:10:54 -0000
> ***************
> *** 3239,3245 ****
>   				    (int) m->pindex, (int)(foff >> 32),
>   						(int) foff & 0xffffffff, resid, i);
>   				if (!vn_isdisk(vp, NULL))
> ! 					printf(" iosize: %ld, lblkno: %jd, flags: 0x%x, npages: %d\n",
>   					    bp->b_vp->v_mount->mnt_stat.f_iosize,
>   					    (intmax_t) bp->b_lblkno,
>   					    bp->b_flags, bp->b_npages);
> --- 3239,3245 ----
>   				    (int) m->pindex, (int)(foff >> 32),
>   						(int) foff & 0xffffffff, resid, i);
>   				if (!vn_isdisk(vp, NULL))
> ! 					printf(" iosize: %jd, lblkno: %jd, flags: 0x%x, npages: %d\n",
>   					    bp->b_vp->v_mount->mnt_stat.f_iosize,
>   					    (intmax_t) bp->b_lblkno,
>   					    bp->b_flags, bp->b_npages);

Example of a printf format error.  The long was easy to print using %ld,
but now there is a a u_int64_t.  Using %jd gives a sign mismatch on all
machines and a size mismatch on machines with
sizeof(u_int64_t) != sizeof(intmax_t).

> Index: bin/df/df.c
> ===================================================================
> RCS file: /usr/ncvs/src/bin/df/df.c,v
> retrieving revision 1.51
> diff -c -r1.51 df.c
> *** bin/df/df.c	13 Sep 2003 20:46:58 -0000	1.51
> --- bin/df/df.c	5 Nov 2003 19:22:11 -0000
> ...

This has many more examples of printf format errors.  The old code has
many related bugs.

Bruce