Fix MNAMELEN or reimplement struct statfs

Tue Nov 19 17:42:24 UTC 2013

I thought you were implying more then the one example that you gave.

| > | One obvious and important thing which is broken with the patch is
| > | the unmounts from jails. In other words, now it is possible to mount
| > | something from jail with appropriate privileges set up, and after that
| > | corresponding mount cannot be unmounted, since vfs_mount_alloc() copies
| > | trimmed path into f_mntonname, and sys_unmount() matches full path with
| > | pathbuf.  Hmm, this should be broken in the same way for non-jailed
| > | mounts with pathes which do not fit into f_mntonname.
| > 
| > They can be umounted since it will fall back to fsid as in the non-jail
| > case.  I just tried it sorry for the bad line wrap:
| > 
| > + mount 192.168.38.1:/data/home/ambrisko/netboot /data/jail/test
| > + jail -i -c name=test path=/data/jail/test host.hostname=test.ambrisko.com persist enforce_statfs=0 allow.mount=1 allow.mount.devfs=1 allow.mount.nullfs=1 allow.mount.tmpfs=1 allow.mount.procfs=1
| > 14
| > + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
| > + jexec test mkdir -p /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| > + jexec test mount -t procfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
| > + jexec test mount -t devfs null //1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| > procfs                                            8         8          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
| > devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
| > + jexec test mount -v
| > + egrep '^devfs|^procfs'
| > devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000)
| > procfs on /proc (procfs, local, fsid 02ff000202000000)
| > procfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (procfs, local, fsid 26ff000202000000)
| > devfs on /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901 (devfs, local, multilabel, fsid 27ff007171000000)
| > + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/proc
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| > devfs                                             2         2          0   100%    /data/jail/test/12345678901234567890123456789012345678901234567890123456789012345678901
| > + jexec test umount /1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/dev
| > + jexec test df
| > + egrep '^devfs|^procfs'
| > devfs                                             2         2          0   100%    /dev
| > procfs                                            8         8          0   100%    /proc
| 
| I.e. unmount gets EINVAL, right ? I do not like it, if going this route,
| why do we need to store the path in the kernel at all ?

For compatibility with old stuff that hasn't switch to fsid.  I'll
describe it below since it looks like umount(8) doesn't use it any more
unless fsid fails.

| At least, the
| attempt to unmount by path should consistently return EINVAL always,
| instead of failing randomly due to an implementation detail, where the
| caller can reasonably expect the syscall to succeed.

Yes, a failed match by is EINVAL, a failed match by fsid it ENOENT.

First I'm talking about the umount binary and I made a mistake describing
its behaviour.  I thought it was trying the path first when it actually
tried the fsid first returned from the stat structure.  If that fails
then it tries the path for older kernels:
        /* First try to unmount using the file system ID. */
        snprintf(fsidbuf, sizeof(fsidbuf), "FSID:%d:%d", sfs->f_fsid.val[0],
            sfs->f_fsid.val[1]);
        if (unmount(fsidbuf, fflag | MNT_BYFSID) != 0) {
                /* XXX, non-root users get a zero fsid, so don't warn. */
                if (errno != ENOENT || sfs->f_fsid.val[0] != 0 ||
                    sfs->f_fsid.val[1] != 0)
                        warn("unmount of %s failed", sfs->f_mntonname);
                if (errno != ENOENT) {
                        free(orignfsdirname);
                        return (1);
                }
                /* Compatibility for old kernels. */
                if (sfs->f_fsid.val[0] != 0 || sfs->f_fsid.val[1] != 0)
                        warnx("retrying using path instead of file system ID");
                if (unmount(sfs->f_mntonname, fflag) != 0) {
                        warn("unmount of %s failed", sfs->f_mntonname);
                        free(orignfsdirname);
                        return (1);
                }
        }

This was introduced at 1.38 in umount.c before 5.2 got released:
  When mount(8) is invoked with the `-v' flag, display the filesystem
  ID for each file system in addition to the normal information.

  In umount(8), accept filesystem IDs as well as the usual device and
  path names. This makes it possible to unambiguously specify which
  file system is to be unmounted even when two or more file systems
  share the same device and mountpoint names (e.g. NFS mounts from
  the same export into different chroots).
and refined in 1.39.

This doesn't address your concern about the system call unmount.

| > | I think that struct mount should have a const char * field where the
| > | non-trimmed path is stored and used for match at unmount. f_mntonname
| > | truncation would be only unfortunate user interface glitch.
| > 
| > Note that we are not storing the path in mount structure so no structures
| > have changed which is nice since then we haven't introduced any real
| > ABI breakage.  So we could MFC this.  The match isn't critical since
| > umount will fall back to fsid and work.  One thing that might be good to
| > do is change umount to try to umount via fsid first and then do the
| > match if the fsid failed versus the other way round that it does now.
| I do not like somtimes not storing the full path of the mount point.
| I do understand that the path can easily made invalid, but I still want
| it there.
| 
| MFC is not the problem for struct mount, which is never directly allocated
| by non-VFS.  The new member must be added to the end of the structure, which
| preserves KBI. I did such surgery more than once.

I was talking about the more general case since the system tries to keep
the path in the stat structure.  My prior approach which had more issues
was to modify the stat structure of which I was pointed to NetBSD and their
change to statvfs which doesn't really solve the problem.  They don't
have the check to see if the mount is longer then VFS_MNAMELEN (in their case)
and just truncate things.

If we are just talking about adding it to the mount structure that
would be okay since it isn't exposed to user land.  I can add that.

| > The problem I see is if someone tries to do things based on the parsed
| > output of mount/df then that will fail since the output is truncated.
| Yes, this is understandable and IMO acceptable.

I think we might be able to fix that in the future by populating our
bogus statvfs with more valid values for paths.  With your suggestion
we could populate the statvfs on the fly with a value from the mount
structure.  Then convert df and mount to use statvfs.

Thanks,

Doug A.