NFS & ZFS: how to export whole FS hierarchy to mount it with one command on client?

Rick Macklem rmacklem at uoguelph.ca
Mon Aug 3 12:25:18 UTC 2015


Julian Elischer wrote:
> On 8/1/15 7:30 PM, Lev Serebryakov wrote:
> > Hello Rick,
> >
> > Saturday, August 1, 2015, 2:21:10 PM, you wrote:
> >
> >> To mount multiple file systems as one mount, you'll need to use NFSv4. I
> >> believe you will have to have a separate export entry in the server for
> >> each of the file systems.
> >   So, /etc/exports needs to have BOTH v3-style exports & V4: root of tree
> >   line?
> 
> OR you can have a non-standard patch that pjd wrote to do recursive
> mounts of sub-filesystems.
> it is not supposed to happen according to the standard but we have
> found it useful.
> Unfortunately it is written against the old NFS server.
> 
> Rick, if I gave you the original pjd patch for the old server, could
> you integrate it into the new server as an option?
> 
A patch like this basically inserts the file system volume identifier
in the high order bits of the fileid# (inode# if you prefer), so that
duplicate fileid#s don't show up in a "consolidated file system" (for
want of a better term). It also replies with the same "fake" fsid for
all volumes involved.
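
As a rough illustration of the idea (a sketch only, not the actual pjd patch;
the 24/40 bit split and the names make_fileid/split_fileid are assumptions made
up for this example), the packing could look something like this:

```python
# Hypothetical sketch: fold a per-volume identifier into the high-order
# bits of a 64-bit NFSv3 fileid. The 24/40 split is an assumption chosen
# for illustration, not something taken from the patch.
VOLUME_BITS = 24
INODE_BITS = 64 - VOLUME_BITS        # 40 bits left for the real inode#
INODE_MASK = (1 << INODE_BITS) - 1


def make_fileid(volume_id, ino):
    """Combine a volume id and an inode# into one 64-bit fileid."""
    if not 0 <= volume_id < (1 << VOLUME_BITS):
        raise ValueError("volume id does not fit in the high-order bits")
    if not 0 <= ino <= INODE_MASK:
        raise ValueError("inode# does not fit in the low-order bits")
    return (volume_id << INODE_BITS) | ino


def split_fileid(fileid):
    """Recover (volume_id, ino) from a fileid built by make_fileid()."""
    return fileid >> INODE_BITS, fileid & INODE_MASK


if __name__ == "__main__":
    # The same inode# on two different volumes yields two distinct fileids.
    print(hex(make_fileid(1, 2)), hex(make_fileid(2, 2)))
```

Raising an error when the inode# does not fit is just one possible choice for
the sketch; a real server would have to decide what to do in that case.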

I see certain issues w.r.t. this:
1 - What happens when the exported volumes are disjoint and don't form
    one tree? (I think any such option should be restricted to volumes
    that form a tree, but I don't know an easy way to enforce that restriction.)
2 - It would be fine at this point to use the high order bits of the fileid#,
    since NFSv3 defines it as 64bits and FreeBSD's ino_t is 32bits. However,
    I believe FreeBSD is going to have to increase ino_t to 64bits soon.
    (I hope such a patch will be in FreeBSD11.)
    Once ino_t is 64bits, this option would have to assume that some # of
    the high order bits of the fileid# are always 0. Something like
    "the high order 24bits are always 0" would work ok for a while, then
    someone would build a file system large enough to overflow the 40bit
    field and cause trouble. (I know that's a lot, but some are already
    exceeding 32bits for # of fileids.)
3 - You could get weird behaviour when the tree includes exports with different
    export options. This discussion includes just that and NFSv3 clients
    don't expect things to change within a mount. (An example would be having
    part of this consolidated tree require Kerberos authentication. Another
    might be having parts of the consolidated tree use different uid mapping
    for AUTH_SYS.)
4 - Some file systems (msdosfs, i.e. FAT) have limited capabilities w.r.t. what
    the NFS server can do to the file system. If one of these was embedded in
    the consolidated tree, then it could cause confusion similar to #3.

All in all, the "hack" is relatively easy to do, if:
You use one kind of file system (for example ZFS) and make everything you are
exporting one directory tree which is all exported in a compatible way.
You also "know" that all the fileid#s in the underlying file systems will fit
in the low order K bits of the 64bit fileid#.
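
To make the "low order K bits" condition concrete, here is a small sketch of
the check (the function names and the choice of K = 40 are assumptions for
illustration only, nothing here comes from an existing implementation):

```python
# Hypothetical check of whether fileids fit in the low-order K bits of a
# 64-bit fileid#, leaving the high-order bits free for a volume identifier.

def fits_in_low_bits(max_fileid, k):
    """True if every fileid up to max_fileid fits in the low-order k bits."""
    return 0 <= max_fileid < (1 << k)


def spare_high_bits(max_fileid, width=64):
    """Number of high-order bits that are always 0 for fileids <= max_fileid."""
    return width - max_fileid.bit_length()


if __name__ == "__main__":
    K = 40                      # assumed split: 24 high bits, 40 low bits
    largest_32bit_ino = 2**32 - 1
    print(fits_in_low_bits(largest_32bit_ino, K))  # a 32bit ino_t always fits
    print(fits_in_low_bits(2**40, K))              # one past the 40bit field
    print(spare_high_bits(largest_32bit_ino))
```

With a 32bit ino_t there is plenty of room; the trouble only starts once a
64bit ino_t allows fileids at or above 2^K.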

My biggest concern is #2, once ino_t becomes 64bits.

The question is whether the collective thinks this is a good idea despite the
issues above and can propose a good way to do it. (Maybe an export flag for all
the volumes that
will participate in the "consolidated file system"? The exports(5) man page
could then try to clearly explain the limitations of its use, etc. Even with
that, I suspect some would misuse the option and cause themselves grief.)

Personally, since NFSv4 does this correctly, I don't see a need to "hack it"
for NFSv3, but I'll leave it up to the collective.

rick
ps: Julian, you might want to repost this under a separate subject line, so
    people not interested in how ZFS can export multiple volumes without
    separate entries will read it.
