[RFC] Should vfs.nfsrv.async be implemented for new NFS server?

Sat Nov 19 16:55:57 UTC 2011

On Sat, 5 Nov 2011, Rick Macklem wrote:

I was away on 5 Nov...

> Josh Paetzel pointed out that vfs.nfsrv.async doesn't exist
> for the new NFS server.

Excellent!  It's existence for the old nfs server is a bug.  The correct
way to enable async is to mount the client async.  nfsclient only has
a tiny amount of support for this, but it seems to be suficient:

% int
% ncl_writerpc(struct vnode *vp, struct uio *uiop, struct ucred *cred,
%     int *iomode, int *must_commit, int called_from_strategy)
% {
% 	...
% 	if (vp->v_mount->mnt_kern_flag & MNTK_ASYNC)
% 		*iomode = NFSWRITE_FILESYNC;
% 	if (error && NFS_ISV4(vp))
% 		error = nfscl_maperr(uiop->uio_td, error, (uid_t)0, (gid_t)0);
% 	return (error);
% }

(The old nfs server is similar.)  As far as I understand this (not very
far), this "works" by pretending that the server completed the sync
even when the server said otherwise.  This is far from being correct,
but a correct version seems too hard since it would require servers
(not just FreeBSD ones) to actually understand asyncness.

The sysctl shouldn't exist because it just causes the same pretense
to happen on the server.  This just applies the hack too globally
-- it now affects all mount instances on all clients.  However,
there may be a problem for nfsv2.  I used to think that v3 clients
were much less hackish than the above, and implemented async mounts
by not asking servers to do a full NFS[V3]WRITE_FILESYNC for async
mounts.  nfsv2 clients can't do that since the nvfs2 protocol is too
simple to be able to request different types and/or levels of syncing.
Now I think that the above hack works for v2 too, but is just a hack.

> I don't think I had spotted this before, but when I looked I
> saw that, when vfs.nfsrv.async is set non-zero in the old server,
> it returns FILESYNC (which means the write has been committed to
> non-volatile storage) even when it hasn't actually done that.
>
> This can improve performance, but has some negative implications:
> - If the server crashes before the write is committed to
>  non-volatile storage, the file modification will be lost.
>  (When a server replies UNSTABLE to a write, the client holds
>   onto the data in its cache and does the write again if the
>   server crashes/reboots before the client does a Commit RPC
>   for the file. However, a reply of FILESYNC tells the client
>   it can forget about the write, because it is done.)
> - Because of the above, replying FILESYNC when the data is not
>  yet committed to non-volatile (also referred to as stable)
>  storage, this is a violation of RFC1813.
>
> I wouldn't want this to be the default, but am willing to
> patch head based on the "backwards compatibility" argument.
> My concern with these types of patches is that some people
> will enable them without realizing the risk of data loss
> that they introduce.
>
> So, how do others feel with respect to whether or not this
> patch should be committed to head?
>
> Thanks in advance for any comments, rick
> ps: Here's the patch, just in case anyone is interested.
> --- fs/nfsserver/nfs_nfsdserv.c.sav	2011-11-05 10:29:38.000000000 -0400
> +++ fs/nfsserver/nfs_nfsdserv.c	2011-11-05 10:37:40.000000000 -0400
> @@ -55,6 +55,11 @@ extern int nfs_rootfhset;
> extern int nfsrv_enable_crossmntpt;
> #endif	/* !APPLEKEXT */
>
> +static int	nfs_async = 0;
> +SYSCTL_DECL(_vfs_nfsd);
> +SYSCTL_INT(_vfs_nfsd, OID_AUTO, async, CTLFLAG_RW, &nfs_async, 0,
> +    "Tell client that writes were synced even though they were not");
> +
> /*
>  * This list defines the GSS mechanisms supported.
>  * (Don't ask me how you get these strings from the RFC stuff like
> @@ -912,7 +917,13 @@ nfsrvd_write(struct nfsrv_descript *nd,
> 			goto out;
> 		NFSM_BUILD(tl, u_int32_t *, 4 * NFSX_UNSIGNED);
> 		*tl++ = txdr_unsigned(retlen);
> -		if (stable == NFSWRITE_UNSTABLE)
> +		/*
> +		 * If nfs_async is set, then pretend the write was FILESYNC.
> +		 * Warning: Doing this violates RFC1813 and runs a risk
> +		 * of data written by a client being lost when the server
> +		 * crashes/reboots.
> +		 */
> +		if (stable == NFSWRITE_UNSTABLE && nfs_async == 0)
> 			*tl++ = txdr_unsigned(stable);
> 		else
> 			*tl++ = txdr_unsigned(NFSWRITE_FILESYNC);

This is for the v3 and v4 cases, the same as in the old nfs server where its
existence is clearly just a bug.  This is missing support for the v2 case
where its existence is not so clearly just a bug.  Here is the code for the
v2 case in the old nfs server:

% int
% nfsrv_write(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
%     struct mbuf **mrq)
% {
% 	...
% 	if (v3) {
% 		tl = nfsm_dissect_nonblock(u_int32_t *, 5 * NFSX_UNSIGNED);
% 		off = fxdr_hyper(tl);
% 		tl += 3;
% 		stable = fxdr_unsigned(int, *tl++);
% 	} else {
% 		tl = nfsm_dissect_nonblock(u_int32_t *, 4 * NFSX_UNSIGNED);
% 		off = (off_t)fxdr_unsigned(u_int32_t, *++tl);
% 		tl += 2;
% 		if (nfs_async)
% 	    		stable = NFSV3WRITE_UNSTABLE;

The v2 case can't tell what the client asked for, so it frobs `stable'
unconditionally here instead of later depending on the value of `stable'.
But perhaps doing nothing here works OK, since the client does the
same frobbing (just as unconditionally even for v3-4).

% 	}
% 	...
% 	if (v3) {
% 		...
% 		/*
% 		 * If nfs_async is set, then pretend the write was FILESYNC.
% 		 */
% 		if (stable == NFSV3WRITE_UNSTABLE && !nfs_async)
% 			*tl++ = txdr_unsigned(stable);
% 		else
% 			*tl++ = txdr_unsigned(NFSV3WRITE_FILESYNC);

Perhaps the hack in the client can be improved slightly by using the same
logic as here -- only converting to FILESYNC if the server said UNSTABLE.
I don't know if other states can actually happen.

Bruce