mount ZFS snapshot on Linux system

Rick Macklem rmacklem at uoguelph.ca
Thu Dec 19 23:49:06 UTC 2013


Jason Keltz wrote:
> On 10/12/2013 7:21 PM, Rick Macklem wrote:
> > Jason Keltz wrote:
> >> I'm running FreeBSD 9.2 with various ZFS datasets.
> >> I export a dataset to a Linux system (RHEL64), and mount it.  It
> >> works
> >> fine...
> >> When I try to access the ZFS snapshot directory on the Linux NFS
> >> client,
> >> things go weird.
> >>
For NFSv4, I found two problems w.r.t. handling ZFS snapshots.
1 - Since a NFSv4 Readdir skips "." and "..", the check for
    VFS_VGET() returning ENOTSUPP wasn't happening, so it wasn't
    switching to use VOP_LOOKUP(). { As I understand it, that
    means that VFS_VGET() gets bogus stuff if the auto pseudo-mount
    of the snapshot hasn't happened. }
2 - Since the pseudo-mount of a snapshot doesn't set v_mountedhere
    in the mounted on vnode, I needed to add a check for a different
    vp->v_mount to recognize the "mount point" crossing.
I think this patch fixes both problems:
  http://people.freebsd.org/~rmacklem/nfsv4-zfs-snapshot.patch
Thanks goes to Jason for helping with testing this.

W.r.t. NFSv3, access to the snapshots is somewhat bogus, in that
an NFSv3 is never supposed to cross mount point boundaries. However,
I don't know how an auto pseudo-mount could be safely exported and
mounted as a separate volume, so all I can think of doing is documenting
"in man nfsd(8)?" that it doesn't quite work. The breakage will
depend on how the NFSv3 client handles st_dev. { The FreeBSD client
sets st_dev to the client NFS mount's fsid and doesn't use the fsid
returned by the server, so it doesn't change. As such, for FreeBSD,
it will see one file system, but with duplicated filenos. For example,
fts(3) might complain about a loop in the directory structure. }

I hope to commit the above patch to head soon, once I get it
reviewed and tested, rick

> >> With NFSv4:
> >>
> >> [jas at archive /]# cd /mnt/.zfs/snapshot
> >> [jas at archive snapshot]# ls
> >> 20131203  20131205  20131206  20131207  20131208  20131209
> >>  20131210
> >> [jas at archive snapshot]# cd 20131210
> >> 20131210: Not a directory.
> >>
> >> huh?
> >>
> >> [jas at archive snapshot]# ls -al
> >> total 77
> >> dr-xr-xr-x   9 root root   9 Dec 10 11:20 .
> >> dr-xr-xr-x   4 root root   4 Nov 28 15:42 ..
> >> drwxr-xr-x 380 root root 380 Dec  2 15:56 20131203
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131205
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131206
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131207
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131208
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131209
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131210
> >> [jas at archive snapshot]# stat *
> >> [jas at archive snapshot]# ls -al
> >> total 292
> >> dr-xr-xr-x 9 root      root         9 Dec 10 11:20 .
> >> dr-xr-xr-x 4 root      root         4 Nov 28 15:42 ..
> >> -rw-r--r-- 1 uax    guest   137647 Mar 17  2010 20131203
> >> -rw-r--r-- 1 uax    guest         865 Jul 31  2009 20131205
> >> -rw-r--r-- 1 uax    guest   137647 Mar 17  2010 20131206
> >> -rw-r--r-- 1 uax    guest         771 Jul 31  2009 20131207
> >> -rw-r--r-- 1 uax    guest         778 Jul 31  2009 20131208
> >> -rw-r--r-- 1 uax     guest       5281 Jul 31  2009 20131209
> >> -rw------- 1 btx      faculty      893 Jul 13 20:21 20131210
> >>
> >> But it gets even more fun..
> >>
> >> # ls -ali
> >> total 205
> >>     2 dr-xr-xr-x   9 root      root       9 Dec 10 11:20 .
> >>     1 dr-xr-xr-x   4 root      root       4 Nov 28 15:42 ..
> >> 863 -rw-r--r--   1 uax     guest 137647 Mar 17  2010 20131203
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131205
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131206
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131207
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131208
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131209
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131210
> >>
> >> This is not a user id mapping issue because all the files in /mnt
> >> have
> >> the proper owner/groups, and I can access them there fine.
> >>
> >> I also tried explicitly exporting .zfs/snapshot.  The result isn't
> >> any
> >> different.
> >>
> >> If I use nfs v3 it "works", but I'm seeing a whole lot of errors
> >> like
> >> these in syslog:
> >>
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131203: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131209: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131210: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131207: Invalid argument
> >>
> >> It's not clear to me why this doesn't just "work".
> >>
> >> Can anyone provide any advice on debugging this?
> >>
> > As I think you already know, I know nothing about ZFS and never
> > use it.
> Yup! :)
> > Having said that, I suspect that there are filenos (i-node #s)
> > that are the same in the snapshot as in the parent file system
> > tree.
> >
> > The basic assumptions are:
> > - within a file system, all i-node# are unique (represent one file
> >    object only) and all file objects have the same fsid
> > - when the fsid changes, that indicates a file system boundary and
> >    fileno (i-node#s) can be reused in the subtree with a different
> >    fsid
> >
> > For NFSv3, the server should export single volumes only (all
> > objects
> > have the same fsid and the filenos are unique). This is indicated
> > to
> > the VFS by the use of the NOCROSSMOUNT flag on VOP_LOOKUP() and
> > friends.
> >
> > For NFSv4, the server does export multiple volumes and the boundary
> > is indicated by a change in fsid value.
> >
> > I suspect ZFS snaphots don't obey the above in some way, but that
> > is
> > just a hunch.
> >
> > Now, how to narrow this down...
> > - Do the above tests (both NFSv4 and NFSv3) and capture the
> > packets,
> >    then look at them in wireshark. In particular, look at the
> >    fileid numbers
> >    and fsid values for the various directories under .zfs.
> 
> I gave this a shot, but I haven't used wireshark to capture NFS
> traffic
> before, so if I need to provide additional details, let me know..
> 
> NFSv4:
> 
> For /mnt/.zfs/snapshot/20131203:
> fileid=4
> fsid4.major=1446349656
> fsid4.minor=222
> 
> For /mnt/.zfs/snapshot/20131205:
> fileid=4
> fsid4.major=1845998066
> fsid4.minor=222
> 
> For /mnt/jas:
> fileid=144
> fsid4.major=597946950
> fsid4.minor=222
> 
> For /mnt/jas1:
> fileid=338
> fsid4.major=597946950
> fsid4.minor=222
> 
> So fsid is the same for all the different "data" directories, which
> is
> what I would expect given what you said.  I  guess each snapshot is
> seen
> as a unique filesystem...  but then a repeating inode in different
> filesystems shouldn't be a problem...
> 
> NFSv3:
> 
> For /mnt/.zfs/snapshot/20131203:
> fileid=4
> fsid=0x0000000056358b58
> 
> For /mnt/.zfs/snapshot/20131205:
> fileid=4
> fsid=0x000000006e07b1f2
> 
> For /mnt/jas
> fileid=144
> fsid=0x0000000023a3f246
> 
> For /mnt/jas1:
> fileid=338
> fsid=0x0000000023a3f246
> 
> Here, it seems it's the same, even though it's NFSv3... hmm.
> 
> 
> > - Try mounting the individual snapshot directory, like
> >     .zfs/snapshot/20131209 and see if that works (for both NFSv3
> >     and NFSv4).
> 
> Hmm .. I tried this:
> 
> /local/backup/home9/.zfs/snapshot/20131203  -ro
> archive-mrpriv.cs.yorku.ca
> V4: /
> 
> ... but syslog reports:
> 
> Dec 10 22:28:22 jungle mountd[85405]: can't export
> /local/backup/home9/.zfs/snapshot/20131203
> 
> ... and of course I can't mount from either v3/v4.
> 
> On the other hand, I kept it as:
> 
> /local/backup/home9 -ro archive-mrpriv.cs.yorku.ca
> V4:/
> 
> ... and was able to NFSv4 mount
> /local/backup/home9/.zfs/snapshot/20131203, and this does indeed
> work.
> 
> > - Try doing the mounts with a FreeBSD client and see if you get the
> > same
> >    behaviour?
> I found this:
> http://forums.freenas.org/threads/mounting-snapshot-directory-using-nfs-from-linux-broken.6060/
> .. implies it will work from FreeBSD/Nexenta, just not Linux.
> Found this as well:
> https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/lKyfYsjPMNM
> 
> Jason.
> 
> 


More information about the freebsd-fs mailing list