Fix MNAMELEN or reimplement struct statfs

Fri Nov 15 01:08:56 UTC 2013

On Thu, Nov 14, 2013 at 09:32:17PM +0000, Jase Thew wrote:
| On 10/06/2013 16:52, John Baldwin wrote:
| > On Saturday, June 08, 2013 9:36:27 pm mdf at freebsd.org wrote:
| >> On Sat, Jun 8, 2013 at 3:52 PM, Dirk Engling <erdgeist at erdgeist.org> wrote:
| >>
| >>> The arbitrary value
| >>>
| >>> #define MNAMELEN        88              /* size of on/from name bufs */
| >>>
| >>> struct statfs {
| >>> [...]
| >>>         char    f_mntfromname[MNAMELEN];/* mounted filesystem */
| >>>         char    f_mntonname[MNAMELEN];  /* directory on which mounted */
| >>> };
| >>>
| >>> currently bites us when trying to use poudriere with errors like
| >>>
| >>> 'mount: tmpfs: File name too long'
| >>>
| >>>
| >>> /poudriere/data/build/91_RELEASE_amd64-REALLY-REALLY-LONG-
| > JAILNAME/ref/wrkdirs
| >>>
| >>> The topic has been discussed several times since 2004 and has been
| >>> postponed each time, the last time when it hit zfs users:
| >>>
| >>> http://lists.freebsd.org/pipermail/freebsd-fs/2010-March/007974.html
| >>>
| >>> So I'd like to point to the calendar, it's 2013 already and there's
| >>> still a static arbitrary (and way too low) limit in one of the core
| >>> areas of the vfs code.
| >>>
| >>> So I'd like to bump the issue and propose either making f_mntfromname a
| >>> dynamic allocation or just increase MNAMELEN, using 10.0 as water shed.
| >>>
| >>
| >> Gleb Kurtsou did this along with the ino64 GSoC project.  Unfortunately,
| >>  both he and I hit ENOTIME due to the job that pays the bills and it's
| >> never made it back to the main repository.
| >>
| >> IIRC, though, the only reason for doing it with 64-bit ino_t is that he'd
| >> already finished changing the stat/dirent ABI so what was one more.  I
| >> think he went with 1024 bytes, which also necessitated not allocating
| >> statfs on the stack for the kernel.
| > 
| > He also fixed a few other things since changing this ABI is so invasive
| > IIRC.  This really is the right fix for this.  Is it in an svn branch 
| > that can be updated and a new patch generated?
| > 
| 
| Hi folks,
| 
| Has there been any progress on addressing this issue? With the advent of
| pkgng / poudriere, this limitation is really becoming a frustrating problem.

I looked at NetBSD and what they did with statvfs.  The mount paths
lengths are bigger in NetBSD defines so that helps.  However, when
testing it out via a script that keep on doing a nullfs mount in 
every increasing directory depth I found that NetBSD would allow the
mount to exceed the value in statvfs.  When NetBSD populates the path
in statvfs they truncate it to what fits in statvfs.  So I looked at
what that might be like in FreeBSD.  So I came up with this simple patch:

--- /sys/kern/vfs_mount.c	2013-10-01 14:27:35.000000000 -0700
+++ vfs_mount.c	2013-10-21 14:20:19.000000000 -0700
@@ -656,7 +656,7 @@ vfs_donmount(struct thread *td, uint64_t
 	 * variables will fit in our mp buffers, including the
 	 * terminating NUL.
 	 */
-	if (fstypelen >= MFSNAMELEN - 1 || fspathlen >= MNAMELEN - 1) {
+	if (fstypelen >= MFSNAMELEN - 1 || fspathlen >= MAXPATHLEN - 1) {
 		error = ENAMETOOLONG;
 		goto bail;
 	}
@@ -748,8 +748,8 @@ sys_mount(td, uap)
 		return (EOPNOTSUPP);
 	}
 
-	ma = mount_argsu(ma, "fstype", uap->type, MNAMELEN);
-	ma = mount_argsu(ma, "fspath", uap->path, MNAMELEN);
+	ma = mount_argsu(ma, "fstype", uap->type, MFSNAMELEN);
+	ma = mount_argsu(ma, "fspath", uap->path, MAXPATHLEN);
 	ma = mount_argb(ma, flags & MNT_RDONLY, "noro");
 	ma = mount_argb(ma, !(flags & MNT_NOSUID), "nosuid");
 	ma = mount_argb(ma, !(flags & MNT_NOEXEC), "noexec");
@@ -1039,7 +1039,7 @@ vfs_domount(
 	 * variables will fit in our mp buffers, including the
 	 * terminating NUL.
 	 */
-	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MNAMELEN)
+	if (strlen(fstype) >= MFSNAMELEN || strlen(fspath) >= MAXPATHLEN)
 		return (ENAMETOOLONG);
 
 	if (jailed(td->td_ucred) || usermount == 0) {
@@ -1095,9 +1095,9 @@ vfs_domount(
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	vp = nd.ni_vp;
 	if ((fsflags & MNT_UPDATE) == 0) {
-		pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK);
+		pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
 		strcpy(pathbuf, fspath);
-		error = vn_path_to_global_path(td, vp, pathbuf, MNAMELEN);
+		error = vn_path_to_global_path(td, vp, pathbuf, MAXPATHLEN);
 		/* debug.disablefullpath == 1 results in ENODEV */
 		if (error == 0 || error == ENODEV) {
 			error = vfs_domount_first(td, vfsp, pathbuf, vp,
@@ -1147,8 +1147,8 @@ sys_unmount(td, uap)
 			return (error);
 	}
 
-	pathbuf = malloc(MNAMELEN, M_TEMP, M_WAITOK);
-	error = copyinstr(uap->path, pathbuf, MNAMELEN, NULL);
+	pathbuf = malloc(MAXPATHLEN, M_TEMP, M_WAITOK);
+	error = copyinstr(uap->path, pathbuf, MAXPATHLEN, NULL);
 	if (error) {
 		free(pathbuf, M_TEMP);
 		return (error);
@@ -1181,7 +1181,7 @@ sys_unmount(td, uap)
 			vfslocked = NDHASGIANT(&nd);
 			NDFREE(&nd, NDF_ONLY_PNBUF);
 			error = vn_path_to_global_path(td, nd.ni_vp, pathbuf,
-			    MNAMELEN);
+			    MAXPATHLEN);
 			if (error == 0 || error == ENODEV)
 				vput(nd.ni_vp);
 			VFS_UNLOCK_GIANT(vfslocked);

I seemed to have found a typo bug in an instance in which MFSNAMELEN
wasn't used in the fstype when I did this change.

With this patch things in general seem to work.  You can do a
mount and umount of a long path.  The umount of the long path works
by failing on the exact match but then passing when via the FSID.
df/mount looks a little strange since it shows a truncated path 
but has valid contents (FS type, space etc.).  umount via the truncated
path works if there is only one truncated path that matches.  If there
are multiple then it fails.

This doesn't change and kernel ABI's so then it is safe to apply to the
kernel without rebuilding user-land.

Future work could be to implement statvfs to return a longer path but
only do it for df/umount etc.  The rest of the system could continue
with the existing statfs.  mount works because it passed a string into
the kernel so it can be long.

I'd propose this as a current solution to this problem.  It appears to
work well and shouldn't drastically break things.  Doing df via the
full path, stat etc. work since the associated path access the vnode.
So things that do a mount, df of the mount point etc. should continue
to work.  Scripts that try to figure out the mount points vi df and mount
displaying all mount points would fail.  That is probably good enough for
now.

Comments welcomed.

Thanks,

Doug A.