Mounting NFSv4 as root fs

Sat Feb 19 00:00:40 UTC 2011

On 02/19/11 08:38, Rick Macklem wrote:
>> Hi Rick,
>>
>> I've set up an NFS server to pxeboot a set of testbed clients from. The
>> server filesystem tree the client needs to use as its root has nullfs
>> mounted directories in it. Therefore, NFSv4 is the only useful way to
>> mount it on the client because of the cross mount point traversing
>> capabilities built into v4. I've verified that I can
>> "mount_nfs -o nfsv4 ..." on the command line and see all the files in
>> the tree, so I have things working fine on the server side.
>>
>> I was aware our pxeboot only supports NFSv3, but hoped that by
>> specifying "newnfs" and "nfsv4" in the fstype and options fields
>> respectively in fstab, things might just work when the mount root
>> step after the kernel boot happens. It doesn't, as I found out, because
>> of two issues:
>>
>> 1. I believe there is a bug in the newnfs code. nfs_diskless.c wasn't
>> copied from the old nfsclient and suitably modified for use with
>> newnfs. As a result, during boot, the ncl_mountroot() function in
>> nfs_clvfsops.c calls nfs_setup_diskless(), which calls into the old
>> nfs code, and badness happens from there on in. I have a patch which
>> fixes this issue, though it may be completely the wrong way to do
>> things as I'm very new (as in 24 hours new) to the NFS code.
>>
> Yep. I didn't see an easy way to set up the diskless root so that it would
> work for both clients concurrently, so I was planning on switching it if/when
> "newnfs" becomes the default client. (You can switch fairly easily. Just
> crib the code across, as it sounds like you have and then make sure the
> xxx_mountroot() in "newnfs" gets called instead of nfs_mountroot() in the
> other one.)

Yes that's exactly what I did.

> However, that will just get a "newnfs" NFSv3 root mount to work.

Yup, confirmed working as expected (mount output shows "newnfs" for /,
whereas before it would fall back to "nfs" after the newnfs code crapped
out).

>> 2. pxeboot stores the filehandle and filehandle length it used to grab
>> the kernel via NFS in the kernel's env, and after the kernel has
>> booted, it looks for these variables and reuses them, i.e. at no point
>> in the process does the code attempt to upgrade to NFSv4 if the
>> bootstrap uses NFSv3 to grab the kernel.
>>
>> For my particular use case, I'm quite happy for the kernel to be
>> pulled via NFSv3, but can't boot the client without somehow getting
>> the client to switch to NFSv4 at the point where it mounts root after
>> the kernel has finished booting.
>>
>> I tried a very hacky test in mountnfs() in nfs_clvfsops.c to see if I
>> could set the NFSV4 flag, unset the V3 flag and tell the code to
>> forget about the cached file handle set by the loader, just to see if
>> the code would try to renegotiate using v4... it crashed and burned.
>>
> The same file handle should work for NFSv4 (at least a FreeBSD server
> generates the same FH for a v3 vs v4 mount).

Ah, interesting and good to know, thanks. So assuming the server is v4
capable, you can just start issuing v4 RPCs to the handle established by
pxeboot and things should keep working?
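
For illustration only, here is a minimal user-space sketch in C of how a
file handle passed from the loader as an env string could be decoded before
being reused. The "X<hex digits>X" encoding, the decode_fh() name and the
FH_MAX constant are assumptions made for this sketch, not something taken
from the code discussed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define FH_MAX 64	/* NFSv3 file handles are at most 64 bytes */

/* Convert one hex digit to its value; caller has checked isxdigit(). */
static int
hexval(int c)
{
	if (isdigit(c))
		return (c - '0');
	return (tolower(c) - 'a' + 10);
}

/*
 * Decode a handle encoded as "X<hex digits>X" (assumed format) into fh.
 * Returns the handle length in bytes, or 0 if the string is malformed,
 * unterminated, or longer than maxfh bytes.
 */
static int
decode_fh(const char *s, unsigned char *fh, int maxfh)
{
	int len = 0;

	assert(fh != NULL && maxfh > 0);
	if (s == NULL || strlen(s) < 2 || s[0] != 'X')
		return (0);
	for (s++; *s != 'X'; s += 2) {
		if (len >= maxfh)
			return (0);
		if (!isxdigit((unsigned char)s[0]) ||
		    !isxdigit((unsigned char)s[1]))
			return (0);
		fh[len++] = (unsigned char)(hexval(s[0]) << 4 | hexval(s[1]));
	}
	return (len);
}
```

With this sketch, decode_fh("X0a1bffX", fh, FH_MAX) fills fh with the three
bytes 0a 1b ff and returns 3, while a string missing either delimiter
returns 0.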

>> So, before I spend any more time on this, I hope to get your (or
>> anyone else reading, for that matter) thoughts on how best to proceed.
>> Some questions:
>>
>> - Could you guesstimate how much work is involved to get v4 support
>> into libstand so that pxeboot can talk v4 natively? I spent quite some
>> time poking at libstand's code last night but don't understand the
>> NFSv4 RPC mechanism enough to attempt writing the basic code to do it
>> yet. The RFC explains the ordering of OPs needed quite well, but I
>> don't quite grok how the data structures for interpreting responses
>> work.
>>
> Lots. It will be easier to get the kernel to use v4 after pxeboot has
> loaded it via v3.

ACK.

>> - Can you think of a hacky simple way to force my client to
>> renegotiate the mount as v4 at the time mount root happens?
>>
> If you are willing to spend man weeks on this, you can probably get
> something to work for your lab (useless for others, because you'll
> have to hard wire a bunch of stuff into the kernel like your DNS
> domain name...).
> 
> I have never intended to try and make an NFSv4 root mount work.
> (Someone said NFSv4 is NFS in name only:-)
> 
> One of the most difficult parts will be the uid/gid<->name mapping.
> You would have to hack this enough so that it worked without nfsuserd.
> Something like hard wiring mappings into the kernel cache for enough
> entries that the root works. (Note that names look like root@cis.uoguelph.ca,
> so it needs to know the DNS domain as well as "root" == uid 0.)
> Then hopefully you don't need other mappings to work, because it would
> have to work without nfsuserd running and with nfsuserd running (in the
> root fs).
> 
> Short answer. A severely hacked kernel might work for your lab, but a
> generic solution for FreeBSD would be very difficult.

Thanks heaps for the brain dump, it really helps put things in
perspective. It's sounding like a much bigger job than I thought it
would be, even for a hacked up lab-only solution.
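
To make the hard-wired mapping idea concrete, here is a rough user-space
sketch in C of a static name<->uid table that works without a daemon. The
domain "example.org", the table contents and the idmap_* names are all
hypothetical and only illustrate the shape of the hack; a real kernel
change would also need gid mappings and would have to coexist with
nfsuserd once it starts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

struct idmap_entry {
	const char *name;
	uid_t uid;
};

/* Hypothetical hard-wired NFSv4 domain and users for a lab setup. */
static const char idmap_domain[] = "example.org";
static const struct idmap_entry idmap_users[] = {
	{ "root", 0 },
	{ "daemon", 1 },
	{ "nobody", 65534 },
};
#define IDMAP_NUSERS (sizeof(idmap_users) / sizeof(idmap_users[0]))

/* Map "name@domain" to a uid. Returns 0 on success, -1 if unknown. */
static int
idmap_name_to_uid(const char *owner, uid_t *uidp)
{
	const char *at = strrchr(owner, '@');
	size_t i, nlen;

	/* Exact match on the domain; a real mapper may ignore case. */
	if (at == NULL || strcmp(at + 1, idmap_domain) != 0)
		return (-1);
	nlen = (size_t)(at - owner);
	for (i = 0; i < IDMAP_NUSERS; i++) {
		if (strlen(idmap_users[i].name) == nlen &&
		    strncmp(idmap_users[i].name, owner, nlen) == 0) {
			*uidp = idmap_users[i].uid;
			return (0);
		}
	}
	return (-1);
}

/* Map a uid to "name@domain". Returns 0 on success, -1 otherwise. */
static int
idmap_uid_to_name(uid_t uid, char *buf, size_t buflen)
{
	size_t i;
	int n;

	for (i = 0; i < IDMAP_NUSERS; i++) {
		if (idmap_users[i].uid == uid) {
			n = snprintf(buf, buflen, "%s@%s",
			    idmap_users[i].name, idmap_domain);
			return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
		}
	}
	return (-1);
}
```

In this sketch, "root@example.org" maps to uid 0 and back again, while any
name from another domain or any uid outside the table is rejected, which is
roughly the "enough entries that the root works" behaviour described above.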

> If you could move the "nullfs" mounts down a level, so the NFSv4 mount
> was below an NFSv3 root fs, that would be much easier.

Agreed. The issue is we're using the ezjail management script from ports
to manage the bootable client filesystems on the server, and it uses
nullfs mounts between a base filesystem and the client filesystems to
avoid duplicating all the utilities/libs in /bin, /sbin, /lib and
/libexec multiple times. Works well but not for this use case... oh well.

I guess it will be significantly easier to hack ezjail to just copy the
dirs from the basejail into each client rather than try to get the
all-singing, all-dancing NFSv4 option going.

Thanks again for your insights.

Cheers,
Lawrence