Re: support for pNFS with Linux as Data Servers

From: David Chen <david.chen_at_peakaio.com>
Date: Thu, 22 May 2025 04:53:36 UTC
> I don't think it will be a lot of work. You'll notice that the RPC functions
> in nfs_clrpcops.c mostly handle NFSv2, NFSv3 and NFSv4. Changing
> these to handle NFSv3 shouldn't be a big deal. (Admittedly a lot easier
> for me to do, since I know how the code needs to be written.)
> The one that will look quite different is NFSv3 Create instead of NFSv4
> Open/Create.

OK, thanks, I'll look into this!
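For reference, here is a small Python sketch of how the NFSv4.1 OPEN create modes (createmode4, RFC 8881) could map onto the NFSv3 CREATE modes (createmode3, RFC 1813). The helper name v3_create_plan is hypothetical and this is not the FreeBSD code. It only captures that NFSv3 EXCLUSIVE carries a verifier but no attributes, so attributes sent with EXCLUSIVE4_1 would need a follow-up SETATTR. The Open state itself has no NFSv3 counterpart, since NFSv3 has no OPEN.

```python
from enum import Enum


class CreateMode4(Enum):
    # createmode4 values used by the NFSv4.1 OPEN operation (RFC 8881)
    UNCHECKED4 = 0
    GUARDED4 = 1
    EXCLUSIVE4 = 2
    EXCLUSIVE4_1 = 3


class CreateMode3(Enum):
    # createmode3 values used by the NFSv3 CREATE procedure (RFC 1813)
    UNCHECKED = 0
    GUARDED = 1
    EXCLUSIVE = 2


def v3_create_plan(mode4):
    """Hypothetical helper: return (createmode3, needs_setattr).

    NFSv3 EXCLUSIVE carries only a verifier, so any attributes the
    NFSv4.1 request supplied must be applied with a later SETATTR.
    """
    if mode4 is CreateMode4.UNCHECKED4:
        return CreateMode3.UNCHECKED, False
    if mode4 is CreateMode4.GUARDED4:
        return CreateMode3.GUARDED, False
    if mode4 is CreateMode4.EXCLUSIVE4:
        # NFSv4 EXCLUSIVE4 also sends no attributes, only a verifier
        return CreateMode3.EXCLUSIVE, False
    # EXCLUSIVE4_1 sends a verifier and attributes together
    return CreateMode3.EXCLUSIVE, True
```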

> I do see trying to do loosely coupled to Linux DS servers as a lot of
> work since, as I noted before, the MDS needs to manage Open stateids
> for all the DS files.

Sorry for being dense, but to lay out my understanding so far: today,
with tightly coupled, we use NFSv4 RPCs to talk to the DS(s) from the
MDS, and since it's tightly coupled, we can use a 0x5555 stateid and
avoid managing stateids. With loosely coupled, we can't use a 0x5555
stateid and would need to manage stateids if we continued to use NFSv4
for this communication. So, as you said, we should just use NFSv3
instead, since it has no stateids to manage.
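The same distinction as a toy Python model. Stateid4 mirrors the NFSv4 stateid4 layout (a 32-bit seqid plus a 12-byte "other" field). SPECIAL_5555 is a hypothetical stand-in for the 0x5555 stateid, and its seqid value is an assumption. The KeyError raised for an un-opened file in the loosely coupled NFSv4 case marks where the stateid bookkeeping would come in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stateid4:
    # NFSv4 stateid4: 32-bit seqid plus a 12-byte opaque "other" field
    seqid: int
    other: bytes


# Hypothetical stand-in for the "0x5555" special stateid; the seqid
# value is an assumption, not taken from the FreeBSD sources.
SPECIAL_5555 = Stateid4(seqid=0xFFFFFFFF, other=b"\x55" * 12)


def mds_to_ds_stateid(coupling: str, ds_vers: int,
                      open_stateids: dict, fname: str) -> Optional[Stateid4]:
    """Pick the stateid the MDS would put in a READ/WRITE to a DS file."""
    if ds_vers == 3:
        # NFSv3 READ/WRITE carry no stateid at all
        return None
    if coupling == "tight":
        # The DS trusts the MDS, so no per-file Open state is needed
        return SPECIAL_5555
    # Loosely coupled NFSv4: the MDS must have OPENed the DS file itself
    # and must track that Open stateid (KeyError if it never did)
    return open_stateids[fname]
```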

Here I think you're talking about the case where an NFS client talks
to a DS directly, e.g. to write to a file. If that communication is
NFSv4, then the MDS must first have told the client which stateid to
use (sent as part of the layout), and managing that stateid (which
originally came from the DS) is complicated. If that's what you're
saying, then that makes sense to me too. If the MDS uses NFSv3 instead
of NFSv4 for its RPCs to the DS(s), and GETDEVICEINFO specifies only
NFSv3 and not NFSv4, would we then avoid managing any stateids at all?
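A sketch of the GETDEVICEINFO side, assuming the flex files layout (RFC 8435), whose device address carries a list of ff_device_versions4 entries (version, minor version, rsize, wsize, tightly coupled flag). The function names and the 64 KiB sizes are hypothetical. The idea is only that a client can select just the versions the MDS advertised, so listing NFSv3 alone keeps all client-to-DS I/O on NFSv3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FfDeviceVersion:
    # Mirrors the fields of ff_device_versions4 (RFC 8435)
    version: int
    minorversion: int
    rsize: int
    wsize: int
    tightly_coupled: bool


def advertised_versions(loosely_coupled: bool, rsize=65536, wsize=65536):
    """Versions a hypothetical MDS would list in a GETDEVICEINFO reply."""
    if loosely_coupled:
        # Only NFSv3, so the client never does stateful NFSv4 I/O to the DS
        return [FfDeviceVersion(3, 0, rsize, wsize, False)]
    return [FfDeviceVersion(4, 1, rsize, wsize, True)]


def client_choice(versions, supported):
    """Return the first advertised entry whose (version, minor) is supported."""
    for v in versions:
        if (v.version, v.minorversion) in supported:
            return v
    return None
```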

> The problem is when the client has already open'd the file and acquired
> a rw layout for it (unlike a POSIX file system, NFS servers check permissions
> on every I/O operation).

Ahh, OK, thanks! I didn't realize that with NFS the permissions are
checked on every I/O; I assumed the POSIX behavior, where they are
only checked at open time.
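The difference can be modelled in a few lines of Python (hypothetical names; ACLs and client caching ignored). A POSIX-style handle checks once at open time, while an NFS-style server re-evaluates the check on every write, so a permission change made after open takes effect mid-stream. The helper revoked_after simulates a permission that is revoked after a given number of checks.

```python
from typing import Callable, List


def revoked_after(n: int) -> Callable[[], bool]:
    """Permission check that succeeds for the first n calls, then fails."""
    calls = {"n": 0}

    def check() -> bool:
        calls["n"] += 1
        return calls["n"] <= n

    return check


def posix_style(may_write: Callable[[], bool], writes: int) -> List[str]:
    """Check permission once at open(), then let every write through."""
    if not may_write():
        raise PermissionError("open denied")
    return ["ok"] * writes


def nfs_style(may_write: Callable[[], bool], writes: int) -> List[str]:
    """Check at open, then re-check for every write, as an NFS server does."""
    if not may_write():
        raise PermissionError("open denied")
    return ["ok" if may_write() else "NFS4ERR_ACCESS" for _ in range(writes)]
```

With revoked_after(2) and three writes, nfs_style returns one "ok" followed by two NFS4ERR_ACCESS results, while posix_style keeps returning "ok".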

> I can't recall if the CB_LAYOUTRECALL exercise is already done
> for the tightly coupled case?

I don't see that it's been done, but I could easily be missing it.

I tried changing permissions while a client already had a file open
and got the following bad(?) behavior, using a completely stock
FreeBSD pNFS server and a completely stock Linux client. Probably
I made a mistake somewhere in the pNFS configuration or in my testing:

I configured pNFS using the instructions in pnfsserver(4). From a
Linux client with two users ("userone" and "usertwo") both in the
group "users", I did:

1) Create a file "testfile" with mode 664, ownership userone:users.
2) Open the file for writing as usertwo.
3) Change the file's mode to 644 (removing group write permission).
4) Write to the opened file.
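The mode-bit arithmetic behind these steps, as a minimal Python check (owner/group/other bits only; no ACLs, no root override). The numeric IDs for userone, usertwo and the "users" group are hypothetical.

```python
import stat


def may_write(mode: int, uid: int, gids: set, owner: int, group: int) -> bool:
    """Owner/group/other write check from the mode bits alone."""
    if uid == owner:
        return bool(mode & stat.S_IWUSR)
    if group in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical IDs for the two test accounts and the shared group
USERONE, USERTWO, USERS = 1001, 1002, 100
```

Under mode 664, usertwo gets write access through the group bits; under 644 the group write bit is gone, so the write in step 4 should be refused, while userone (the owner) can still write.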

After step 4, the Linux NFS client gets stuck in a loop of WRITE
(NFS4ERR_ACCESS), LAYOUTERROR (OK), LAYOUTRETURN (OK), a 5-second
pause, LAYOUTGET (OK), then the cycle repeats. The client seems to be
in a bad state at this point; e.g. if I unmount and remount the NFS
share, the mount hangs.
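A toy replay of this loop (hypothetical function, not Linux client code). Because the WRITE fails on a permission check rather than on a layout problem, getting a fresh layout cannot change the outcome, so the cycle only ends when max_rounds runs out. The fallback_to_mds branch is an assumption about one way a client might break out, by retrying the I/O through the MDS, which would then return the access error itself; it does not describe what either client actually does.

```python
def simulate(ds_write, max_rounds, fallback_to_mds=None):
    """Replay WRITE -> LAYOUTERROR -> LAYOUTRETURN -> pause -> LAYOUTGET."""
    trace = []
    for _ in range(max_rounds):
        status = ds_write()
        trace.append(("WRITE", status))
        if status == "NFS4_OK":
            return trace
        trace += [("LAYOUTERROR", "NFS4_OK"), ("LAYOUTRETURN", "NFS4_OK")]
        if fallback_to_mds is not None:
            # Hypothetical: send the I/O through the MDS, which re-checks access
            trace.append(("WRITE via MDS", fallback_to_mds()))
            return trace
        # A new layout does not change the file's mode, so the next
        # DS WRITE fails the same way
        trace += [("pause", "5s"), ("LAYOUTGET", "NFS4_OK")]
    return trace
```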

If I do the same steps with a FreeBSD client instead of a Linux one, I
get the expected behavior: the write() returns success even though the
data is not written, close() then returns an error, and the NFS client
stays in a good state.
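The FreeBSD client result is consistent with ordinary write-behind caching: write() only fills the client cache, and the server's error surfaces when the dirty data is flushed at close(). A minimal Python model of that, with hypothetical names:

```python
class WriteBehindFile:
    """Toy model of a client that caches writes and flushes them at close()."""

    def __init__(self, server_write):
        # server_write: callable(bytes) -> NFS status string
        self.server_write = server_write
        self.dirty = []

    def write(self, data: bytes) -> int:
        # Data only lands in the client cache, so write() reports success
        self.dirty.append(data)
        return len(data)

    def close(self) -> None:
        # The flush is where the server's NFS4ERR_ACCESS finally shows up
        errors = [s for s in map(self.server_write, self.dirty) if s != "NFS4_OK"]
        self.dirty.clear()
        if errors:
            raise PermissionError(errors[0])
```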

Presumably the Linux client is supposed to behave the same way as the
FreeBSD client rather than getting stuck in a loop, so have I done
something wrong?

More generally, I'm confused about the following. Assuming a client is
allowed to use the same layout for both userone and usertwo in the
example above, then even if the layout is recalled and a new one is
presumably issued, I don't see how a single layout can allow write
access for userone but deny it for usertwo. I can see that if all
writes are directed through the MDS, then the MDS can enforce access on
each write, but I assume that would be a transient situation.
Basically, fencing makes sense to me at the granularity of clients, but
I don't see how it works when the issue at hand is controlling access
at the granularity of users. I'm probably making more bad assumptions;
I just wish I knew what they were. Thanks!!
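The question above, restated as a toy Python model under the same assumption (one layout per client, shared by all of its users; all names hypothetical). The DS-side decision has no user input to work with, while a write routed through the MDS is checked against the caller's credential.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    # Under the stated assumption, a layout is granted to a client,
    # not to an individual user
    client: str
    iomode: str  # "rw" or "read"


def ds_allows(layout: Layout, client: str, user: str) -> bool:
    """What a DS can decide from the layout alone.

    The user argument is deliberately unused: nothing in this model of
    the layout lets the DS tell userone from usertwo.
    """
    return layout.client == client and layout.iomode == "rw"


def mds_allows(user: str, writers: set) -> bool:
    """A write sent through the MDS, checked against the caller's credential."""
    return user in writers
```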