Mounting NFSv4 as root fs

Fri Feb 18 21:38:10 UTC 2011

> Hi Rick,
> 
> I've set up a NFS server to pxeboot a set of testbed clients from. The
> server filesystem tree the client needs to use as its root has nullfs
> mounted directories in it. Therefore, NFSv4 is the only useful way to
> mount it on the client because of the cross mount point traversing
> capabilities built into v4. I've verified that I can
> "mount_nfs -o nfsv4 ..." on the command line and see all the files in
> the tree, so I have things working fine on the server side.
> 
> I was aware our pxeboot only supports NFSv3, but hoped that by
> specifying "newnfs" and "nfsv4" in the fstype and options fields
> respectively in fstab that things might just work when the mount root
> step after the kernel boot happens. It doesn't as I found out, because
> of two issues:
> 
> 1. I believe there is a bug in the newnfs code. nfs_diskless.c wasn't
> copied from the old nfsclient and suitably modified for use with
> newnfs. As a result, during boot, the ncl_mountroot() function in
> nfs_clvfsops.c calls nfs_setup_diskless(), which calls into the old
> nfs code, and badness happens from there on in. I have a patch which
> fixes this issue, though it may be completely the wrong way to do
> things as I'm very new (as in 24 hours new) to the NFS code.
> 
Yep. I didn't see an easy way to set up the diskless root so that it would
work for both clients concurrently, so I was planning on switching it if/when
"newnfs" becomes the default client. (You can switch fairly easily. Just
crib the code across, as it sounds like you have, and then make sure the
xxx_mountroot() in "newnfs" gets called instead of nfs_mountroot() in the
other one.)

However, that will just get a "newnfs" NFSv3 root mount to work.

> 2. pxeboot stores the filehandle and filehandle length it used to grab
> the kernel via NFS in the kernel's env, and after the kernel has
> booted, it looks for these variables and reuses them, i.e. at no point
> in the process does the code attempt to upgrade to NFSv4 if the
> bootstrap uses NFSv3 to grab the kernel.
> 
> For my particular use case, I'm quite happy for the kernel to be
> pulled via NFSv3, but can't boot the client without somehow getting
> the client to switch to NFSv4 at the point where it mounts root after
> the kernel has finished booting.
> 
> I tried a very hacky test in mountnfs() in nfs_clvfsops.c to see if I
> could set the NFSV4 flag, unset the V3 flag and tell the code to
> forget about the cached file handle set by the loader, just to see if
> the code would try to renegotiate using v4... it crashed and burned.
> 
The same file handle should work for NFSv4 (at least a FreeBSD server
generates the same FH for a v3 vs v4 mount).
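
To picture the handle reuse described in point 2, here is a minimal
sketch in C of a kernel-side helper that rebuilds a binary file handle
from two loader-provided strings. The hex encoding, the decimal length
string and every name below (SKETCH_FHSIZE, hexval, decode_fh) are
assumptions made for illustration only; the actual variable names and
format used by pxeboot and the kernel are not given in this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* NFSv3 file handles are at most 64 bytes long. */
#define SKETCH_FHSIZE 64

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int
hexval(char c)
{
	if (c >= '0' && c <= '9')
		return (c - '0');
	if (c >= 'a' && c <= 'f')
		return (c - 'a' + 10);
	if (c >= 'A' && c <= 'F')
		return (c - 'A' + 10);
	return (-1);
}

/*
 * Hypothetical helper: decode a hex-encoded handle ("hex") whose byte
 * length is given as a decimal string ("lenstr"), as a loader might
 * pass them in the kernel environment. Returns the handle length in
 * bytes, or -1 if the two strings are malformed or disagree.
 */
static int
decode_fh(const char *hex, const char *lenstr, unsigned char *fh,
    size_t fhsize)
{
	char *end;
	long want;
	size_t i;
	int hi, lo;

	want = strtol(lenstr, &end, 10);
	if (end == lenstr || *end != '\0' || want <= 0 ||
	    (size_t)want > fhsize)
		return (-1);
	for (i = 0; i < (size_t)want; i++) {
		/* The short-circuit avoids reading past the final NUL. */
		if ((hi = hexval(hex[2 * i])) < 0 ||
		    (lo = hexval(hex[2 * i + 1])) < 0)
			return (-1);
		fh[i] = (unsigned char)((hi << 4) | lo);
	}
	/* Reject trailing characters beyond the advertised length. */
	if (hex[2 * want] != '\0')
		return (-1);
	return ((int)want);
}
```

Nothing in the decoded bytes records which protocol version fetched
them, which is why a handle obtained over v3 can only be reused under
v4 when the server hands out the same handle for both.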

> So, before I spend any more time on this, I hope to get your (or
> anyone else reading for that matter) thoughts on how best to proceed.
> Some questions:
> 
> - Could you guesstimate how much work is involved to get v4 support
> into libstand so that pxeboot can talk v4 natively? I spent quite some
> time poking at libstand's code last night but don't understand the
> NFSv4 RPC mechanism enough to attempt writing the basic code to do it
> yet. The RFC explains the ordering of OPs needed quite well but I
> don't quite grok how the data structures for interpreting responses
> work.
> 
Lots. It will be easier to get the kernel to use v4 after pxeboot has
loaded it via v3.

> - Can you think of a hacky simple way to force my client to
> renegotiate the mount as v4 at the time mount root happens?
> 
If you are willing to spend man weeks on this, you can probably get
something to work for your lab (useless for others, because you'll
have to hard wire a bunch of stuff into the kernel like your DNS
domain name...).

I have never intended to try and make an NFSv4 root mount work.
(Someone said NFSv4 is NFS in name only:-)

One of the most difficult parts will be the uid/gid<->name mapping.
You would have to hack this enough so that it worked without nfsuserd.
Something like hard wiring mappings into the kernel cache for enough
entries that the root works. (Note that names look like root@cis.uoguelph.ca,
so it needs to know the DNS domain as well as "root" == uid 0.)
Then hopefully you don't need other mappings to work, because it would
have to work without nfsuserd running and with nfsuserd running (in the
root fs).
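
As a rough illustration of the hard-wired mapping idea, here is a
minimal userland sketch in C. It is not the kernel's name cache and
not how nfsuserd does it; the table and the names static_idmap,
static_users, static_uid_to_name and static_name_to_uid are all
hypothetical. It only shows why the DNS domain has to be known: a uid
turns into "user@domain" on the wire, and an incoming name maps back
to a uid only if its domain part matches.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical hard-wired table: just enough entries for a root fs. */
struct static_idmap {
	const char *user;	/* part of the name before the '@' */
	uid_t uid;
};

static const struct static_idmap static_users[] = {
	{ "root", 0 },
	{ "daemon", 1 },
};

#define NSTATIC (sizeof(static_users) / sizeof(static_users[0]))

/*
 * uid -> "user@domain". Returns 0 on success, -1 if the uid is not in
 * the table or the buffer is too small.
 */
static int
static_uid_to_name(uid_t uid, const char *domain, char *buf, size_t buflen)
{
	size_t i;
	int n;

	for (i = 0; i < NSTATIC; i++) {
		if (static_users[i].uid == uid) {
			n = snprintf(buf, buflen, "%s@%s",
			    static_users[i].user, domain);
			return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
		}
	}
	return (-1);
}

/*
 * "user@domain" -> uid. The domain must match exactly (a simplification
 * for this sketch). Returns 0 on success, -1 otherwise.
 */
static int
static_name_to_uid(const char *name, const char *domain, uid_t *uidp)
{
	const char *at;
	size_t i, ulen;

	at = strrchr(name, '@');
	if (at == NULL || strcmp(at + 1, domain) != 0)
		return (-1);
	ulen = (size_t)(at - name);
	for (i = 0; i < NSTATIC; i++) {
		if (strlen(static_users[i].user) == ulen &&
		    strncmp(static_users[i].user, name, ulen) == 0) {
			*uidp = static_users[i].uid;
			return (0);
		}
	}
	return (-1);
}
```

Any uid or name outside the table simply fails to map here, which is
the "hopefully you don't need other mappings" caveat in practice.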

Short answer: a severely hacked kernel might work for your lab, but a
generic solution for FreeBSD would be very difficult.

If you could move the "nullfs" mounts down a level, so the NFSv4 mount
was below an NFSv3 root fs, that would be much easier.

rick