make the experimental NFS subsystem the default one

Bruce Evans brde at optusnet.com.au
Thu Apr 21 23:55:53 UTC 2011


On Thu, 21 Apr 2011, Rick Macklem wrote:

>> I've hacked up the old NFS server a bit to better serve my ESX/NFS
>> needs (forcing it to always write async so it works faster with ZFS
>> and a ZIL). I guess I could do the same to the new code as well...
>>
>> ..However, if you didn't mind adding switch to force the NFS server to
>> quietly only do an async VOP_WRITE, that would be really handy. Very
>> dangerous for some people, but then again, so are most switches if you
>> don't understand what they do. ESX only opens its files with
>> O_SYNC, and it's not an editable option, so people like me running
>> ESX+NFS over a ZFS share that has a ZIL are making the system do a lot
>> of unnecessary sync writes. Yes, I know about SSD/RAM for the ZIL, but
>> it still slows it down quite a bit over an async write.
>
> Have you tried the "async" mount option on the client side?
> (or did you say you couldn't do that above?)

The async mount option should work quite well for this, except in my
version.  Elsewhere it almost completely breaks O_SYNC (and fsync()), so
applications and nfs have no way to force a sync.  This is mostly fixed
in my version -- O_SYNC and fsync() sync everything related to the
file except possibly its directory entry (and its parent's directory
entry...).  Thus applications and nfs can force a not-quite-complete
sync, but doing so gratuitously is unnecessarily slow.

> I would be very hesitant to do it on the server side, since it would
> violate the RFC (and you do run a risk of losing data, which many might
> not realize until it is too late).

As you know, nfs has always had problems with syncing things, especially
if the local file system is mounted async.  In [Free]BSD there is just
no way for the server to sync metadata independently of syncing data,
although at least the v3 protocol supports this.  Syncing everything
using IO_SYNC should work, but is too slow in general, and even IO_SYNC
is defeated by the above brokenness.

In the old nfs server, the code for this is:

From nfs.h:
% /*
%  * The IO_METASYNC flag should be implemented for local filesystems.
%  * (Until then, it is nothin at all.)
%  */
% #ifndef IO_METASYNC
% #define IO_METASYNC	0
% #endif

From nfs_serv.c:
% 
% 	    /*
% 	     * XXX
% 	     * The IO_METASYNC flag indicates that all metadata (and not just
% 	     * enough to ensure data integrity) mus be written to stable storage
% 	     * synchronously.
% 	     * (IO_METASYNC is not yet implemented in 4.4BSD-Lite.)
% 	     */
% 	    if (stable == NFSV3WRITE_UNSTABLE)
% 		ioflags = IO_NODELOCKED;
% 	    else if (stable == NFSV3WRITE_DATASYNC)
% 		ioflags = (IO_SYNC | IO_NODELOCKED);
% 	    else
% 		ioflags = (IO_METASYNC | IO_SYNC | IO_NODELOCKED);

The experimental nfs server seems to have even less support for this:

From nfs_nfsdport.c:
% 	if (stable == NFSWRITE_UNSTABLE)
% 		ioflags = IO_NODELOCKED;
% 	else
% 		ioflags = (IO_SYNC | IO_NODELOCKED);

Since IO_METASYNC is 0, the new code is equivalent to the old code, but the
old code would work better if IO_METASYNC actually worked.  OTOH, IO_SYNC
implies syncing metadata in FreeBSD, so IO_METASYNC is needed for something
quite different -- to sync metadata when data is _not_ being synced.  This
should be done by a state between completely unstable and unstable-data.
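
A minimal sketch of what the mapping might look like if IO_METASYNC were
nonzero and the protocol had such an intermediate level.  Everything here
is an assumption made for illustration: the flag values are made up (they
are not the kernel's), WRITE_METASYNC is an invented stability level that
NFSv3 does not define, and stable_to_ioflags() is a hypothetical helper,
not code from either server.

```c
#include <assert.h>

/* Made-up flag values for illustration; the kernel's real values differ. */
enum { IO_NODELOCKED = 0x01, IO_SYNC = 0x02, IO_METASYNC = 0x04 };

/*
 * Stability levels.  WRITE_METASYNC is hypothetical: it stands for the
 * missing level where metadata is synced but data is not.
 */
enum stable_how {
	WRITE_UNSTABLE,
	WRITE_METASYNC,		/* hypothetical, not in NFSv3 */
	WRITE_DATASYNC,
	WRITE_FILESYNC
};

/* Map a requested stability level to write ioflags. */
int
stable_to_ioflags(enum stable_how stable)
{
	int ioflags = IO_NODELOCKED;

	assert(stable <= WRITE_FILESYNC);
	/* Metadata-only sync is the case that IO_SYNC cannot express. */
	if (stable == WRITE_METASYNC || stable == WRITE_FILESYNC)
		ioflags |= IO_METASYNC;
	/* Same as the old server for DATASYNC and FILESYNC. */
	if (stable == WRITE_DATASYNC || stable == WRITE_FILESYNC)
		ioflags |= IO_SYNC;
	return (ioflags);
}
```

With IO_METASYNC defined as 0, the WRITE_METASYNC case would collapse into
WRITE_UNSTABLE and the FILESYNC case into DATASYNC, which is the
equivalence described above.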

nfs is basically assuming that metadata is sync by default.  This is the
case with old ffs, but not always:
- ffs with soft updates.  Sync of metadata is delayed, and might not happen
   due to a crash.  Only integrity of the file system is guaranteed.  This
   rarely matters.  It might matter if an application or nfs client thinks
   it has completely written a critical chown().
- ffs with async mounts.  Now nothing is synced by default, and unless you
   have fixed it, almost nothing is synced by IO_SYNC.  It would be useful
   for IO_METASYNC to sync all the metadata.  This would give the same
   stability as the default for old ffs (sync metadata, async data).
- msdosfs with defaults.  Now the most critical metadata (the FAT)
   is async, while less critical metadata (directory entries and pseudo-
   inodes) are sync.  Sync FAT would be too slow, so it takes a sync mount
   to get it.  Again IO_METASYNC (applied to the file and enough of its
   surrounding metadata) would be useful for giving almost the same
   stability as old ffs.

The nfs protocol seems to be inadequate for handling various combinations
of asyncness on the client and the server.  E.g., suppose that the server
file system is mounted async.  You probably want most operations to be
async.  But clients know nothing of asyncness on servers, and the spec
requires some operations to be sync, so clients will issue sync operations
which break the default of asyncness unless the server dishonors IO_SYNC.
But you want some operations like fsync() on the client to be sync, so
you don't want the server to dishonor all IO_SYNCs.  One way to control
this would be to make async mounts on the client work (async mounts on
clients are now handled bogusly by accepting them in mount options
although they have no effect):

     async client, async server:
         client only asks for sync operations when the application asks for
         them; server uses default which keeps other operations async;
         server should honor IO_SYNC otherwise

     default client, async server:
         same as now -- client gets server's defaults even when they involve
         dishonoring IO_SYNC

     async client, default or sync server:
         client should have precedence, giving async for everything.
         Need another protocol stability value and more server support for it,
         so that the server doesn't use its configuration of sync for at
         least metadata, but uses the client's configuration of async
         everything.  The NFSWRITE_UNSTABLE value permits but doesn't request
         the server to be unstable, but with an async client we want to
         request the server to be unstable :-) (really just to optimize for
         speed instead of stability)

     default client, default server:
         some combination of the client and server's defaults.  Must include
         honoring the protocol and IO_SYNC and fsync() on the client

     sync client, async server:
         client should have precedence, giving sync for everything.  Same as
         now, but quite broken if the server dishonors IO_SYNC

     sync client or server, default other side:
         as for async, explicitly asking for sync has precedence over
         defaults
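
One possible reading of the cases above, as a small sketch.  It
assumes the cases reduce to "an explicit client setting wins, otherwise
the server's setting applies", which loses some of the nuance in the list.
The names mount_mode, effective_mode() and server_may_ignore_sync() are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* How each side was mounted; illustrative names, not kernel identifiers. */
enum mount_mode { MODE_ASYNC, MODE_DEFAULT, MODE_SYNC };

/*
 * Which side's [a]syncness applies.  An explicit async or sync on the
 * client has precedence; a default client gets whatever the server does.
 */
enum mount_mode
effective_mode(enum mount_mode client, enum mount_mode server)
{
	assert(client <= MODE_SYNC && server <= MODE_SYNC);
	return (client != MODE_DEFAULT ? client : server);
}

/*
 * Whether the server may go on dishonoring IO_SYNC.  In the list above
 * this is tolerated only for a default client on an async server; an
 * async or sync client expects its explicit sync requests to be honored.
 */
bool
server_may_ignore_sync(enum mount_mode client, enum mount_mode server)
{
	assert(client <= MODE_SYNC && server <= MODE_SYNC);
	return (client == MODE_DEFAULT && server == MODE_ASYNC);
}
```

For example, effective_mode(MODE_ASYNC, MODE_SYNC) gives MODE_ASYNC, which
is the case that needs the extra protocol stability value, since the
server has to be asked to drop its own sync configuration.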

When I started writing the above (consolidating old ideas), I thought
that the client would need to negotiate the [a]syncness with the server,
but I now see that several useful cases can be handled by the client
just expressing its preferences for async by not asking for FILESYNC or
DATASYNC.
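
A sketch of how a client might express that preference when choosing the
stability level for a write.  This is an assumption-laden illustration,
not the client's actual code: client_stable_how() and both enums are
hypothetical, and what a default-mounted client does is left to the caller
as default_how rather than guessed at here.

```c
#include <assert.h>
#include <stdbool.h>

/* Client mount mode and write stability; illustrative names only. */
enum mount_mode { MODE_ASYNC, MODE_DEFAULT, MODE_SYNC };
enum stable_how { WRITE_UNSTABLE, WRITE_DATASYNC, WRITE_FILESYNC };

/*
 * Pick the stability level to request for a write.
 * app_wants_sync: the application used O_SYNC or is in fsync().
 * default_how: whatever the client would request on a default mount.
 */
enum stable_how
client_stable_how(enum mount_mode mode, bool app_wants_sync,
    enum stable_how default_how)
{
	assert(mode <= MODE_SYNC);
	/* An explicit request for sync always has precedence. */
	if (mode == MODE_SYNC || app_wants_sync)
		return (WRITE_FILESYNC);
	/* Async client: never ask for FILESYNC or DATASYNC unprompted. */
	if (mode == MODE_ASYNC)
		return (WRITE_UNSTABLE);
	return (default_how);
}
```

No negotiation is involved: the server only sees the requested level, so
this covers the cases where the client's preference is for less syncing
than the protocol would otherwise make it ask for.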

Grepping for NFS.*WRITE_UNSTABLE in the new nfs client and server shows that
it is now spelled without V3, except 3 comments have the old spelling.

Bruce

