Under what circumstances does the new NFS client return EAGAIN?
Andrey Simonenko
simon at comsys.ntu-kpi.kiev.ua
Tue Apr 24 12:17:02 UTC 2012

On Thu, Mar 01, 2012 at 12:36:15PM -0500, Garrett Wollman wrote:
> <<On Wed, 29 Feb 2012 22:13:47 -0500 (EST), Rick Macklem <rmacklem at uoguelph.ca> said:
> 
> > Unfortunately it is a well known issue that updating exports
> > is not done atomically. (I had a patch that suspended the nfsd
> > threads while exports were being updated, but it was felt to
> > be risky and zack@ was going to come up with a patch to fix this,
> > but I don't think he has committed anything.)
> 
> That might be something that we at least would need.  You don't need
> to suspend all of the nfsd threads, just delay responding to any
> request that fails access control until the filter programming is
> done.  We may actually need to do something like that, if this machine
> is to be usable as a file server.  (Can't have our users' jobs
> randomly breaking just because an administrator mounted a new
> filesystem.)

There are two ways in which NFS servers handle NFS export settings:

1. All NFS export settings are loaded into the NFS server, so it can
   make decisions about exports itself.  All address specifications are
   given as addresses and netmasks (it does not matter whether they were
   given as explicit addresses or as domain names in configuration
   files).
2. All NFS export settings are kept in user space.  The NFS server has
   a cache of settings for clients' addresses and asks a user space
   program if the cache does not have NFS export information for some
   client's address.  This approach allows exports to be specified as
   wildcard domain names.

When export settings are updated, in the first case it is necessary to
update them atomically, so the NFS server never sees partially loaded
settings.  In the second case the userland utility can synchronize its
own view of the NFS settings, so it just needs to flush the NFS server's
export settings cache.  FreeBSD uses the first type.
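
To make the first type concrete, here is a minimal sketch of how a
server that holds all settings itself could decide whether a client
address is covered by an export.  The names export_net and export_match
are hypothetical, only IPv4 is handled, and addresses are assumed to be
in host byte order; this is not the actual FreeBSD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical export entry: one network/netmask pair (host byte order). */
struct export_net {
	uint32_t net;
	uint32_t mask;
};

/*
 * Return 1 if the client address falls inside any network in the table,
 * 0 otherwise.  Domain names from the configuration file are assumed to
 * have been resolved to addresses before the table was loaded.
 */
int
export_match(const struct export_net *tbl, size_t n, uint32_t client)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((client & tbl[i].mask) == (tbl[i].net & tbl[i].mask))
			return (1);
	}
	return (0);
}
```

With a table holding 192.168.1.0/24, a client at 192.168.1.5 matches
and a client at 192.168.2.5 does not.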

I have already heard about suspending NFS server threads in the kernel
while NFS export settings are being parsed and loaded.  This approach
has a few drawbacks: 1) the userland program that loads the settings can
crash, 2) the time needed to load settings into the NFS server is
unbounded, since the data may not be in RAM.  So the time during which
NFS server threads are suspended is unbounded as well.  There are other
problems; I will not describe them here, just to be brief.

Now I want to describe how NFS export settings are loaded into the NFS
server in my implementation.  To load export settings into the NFS
server, the nfssvc(NFSSVC_EXPORT) system call is used.  Settings are not
passed in a single system call, so it is not necessary to create one
buffer holding all settings (settings can be given as a linked list as
well).

All communication with the NFS server through the nfssvc(NFSSVC_EXPORT)
system call follows a transaction model.  A process asks the NFS server
to start a new transaction; the NFS server creates the transaction and
returns its transaction ID to the process.  Each transaction is
identified by PID, UID and transaction ID, so several processes can
modify NFS server export settings at the same time.
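
A rough userland sketch of this bookkeeping follows.  The table size,
the structure layout and the names txn_begin and txn_lookup are all
hypothetical; the sketch only shows that a transaction is found when
PID, UID and transaction ID all match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define TXN_MAX	8		/* hypothetical table size */

struct txn {
	int	 in_use;
	pid_t	 pid;
	uid_t	 uid;
	uint32_t id;
};

static struct txn txn_table[TXN_MAX];
static uint32_t txn_next_id = 1;	/* wraparound is not handled here */

/* Start a transaction; return its ID, or 0 if the table is full. */
uint32_t
txn_begin(pid_t pid, uid_t uid)
{
	for (size_t i = 0; i < TXN_MAX; i++) {
		if (!txn_table[i].in_use) {
			txn_table[i].in_use = 1;
			txn_table[i].pid = pid;
			txn_table[i].uid = uid;
			txn_table[i].id = txn_next_id++;
			return (txn_table[i].id);
		}
	}
	return (0);
}

/* Find a transaction; PID, UID and transaction ID must all match. */
struct txn *
txn_lookup(pid_t pid, uid_t uid, uint32_t id)
{
	assert(id != 0);
	for (size_t i = 0; i < TXN_MAX; i++) {
		struct txn *t = &txn_table[i];

		if (t->in_use && t->pid == pid && t->uid == uid && t->id == id)
			return (t);
	}
	return (NULL);
}
```

Because the PID and UID are part of the key, a process that guesses
another process's transaction ID still cannot touch that transaction.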

Each transaction has a timeout.  To simplify the implementation (because
only one transaction is expected), any transaction can be in the BUSY,
ACTIVE or INACTIVE state, and one callout with one timeout is used for
all transactions.  If a transaction is inactive for some period of time,
its context is released.  If a process is still using that transaction,
it will be notified that the transaction has disappeared and will start
a new one.
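
One way a single periodic timer can age every transaction is sketched
below.  The meaning given to the three states here is an assumption for
illustration only: BUSY means a system call is working on the
transaction right now, ACTIVE means it was used since the last tick, and
INACTIVE means it was not.  The names txn_slot, txn_enter, txn_leave and
txn_tick are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum txn_state { TXN_FREE, TXN_INACTIVE, TXN_ACTIVE, TXN_BUSY };

struct txn_slot {
	enum txn_state state;
};

/* A system call starts working on the transaction. */
void
txn_enter(struct txn_slot *t)
{
	assert(t->state != TXN_FREE);
	t->state = TXN_BUSY;
}

/* The system call is done; the transaction was recently used. */
void
txn_leave(struct txn_slot *t)
{
	assert(t->state == TXN_BUSY);
	t->state = TXN_ACTIVE;
}

/*
 * One tick of the shared timer: ACTIVE slots get a second chance,
 * INACTIVE slots are released, BUSY and FREE slots are left alone.
 * Returns the number of slots released by this tick.
 */
size_t
txn_tick(struct txn_slot *tbl, size_t n)
{
	size_t released = 0;

	for (size_t i = 0; i < n; i++) {
		switch (tbl[i].state) {
		case TXN_ACTIVE:
			tbl[i].state = TXN_INACTIVE;
			break;
		case TXN_INACTIVE:
			tbl[i].state = TXN_FREE;	/* context released */
			released++;
			break;
		default:
			break;
		}
	}
	return (released);
}
```

Under these assumptions an idle transaction survives between one and two
timer periods, and a process that comes back later finds its slot free,
which is how it learns that it has to start a new transaction.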

When a process has loaded all settings into the NFS server, it issues
the transaction commit command, and all settings saved in the
transaction context are applied atomically to the NFS server export
settings.  NFS export settings are protected by r/w locks (rmlock, for
example).  While NFS export settings are being loaded, the NFS server
verifies whether they can be applied to the current configuration, so
by the time the transaction is committed all data structures are ready
and only need to be applied to the current configuration.
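
The split between validating while loading and applying at commit can
be sketched as follows.  Note the substitution: instead of an r/w lock,
this userland sketch publishes the staged configuration with a C11
atomic pointer exchange.  The structure and the names stage_add,
cfg_commit and cfg_current are hypothetical, and the only validation
shown is a duplicate check inside the staged data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_EXPORTS	16	/* hypothetical limit */

struct export_cfg {
	size_t	 n;
	unsigned fsid[MAX_EXPORTS];
};

static struct export_cfg initial_cfg;		/* empty configuration */
static _Atomic(struct export_cfg *) live = &initial_cfg;

/*
 * Validate one export and add it to the staged configuration.
 * Returns 0 on success, -1 if the entry cannot be applied.
 * All checking happens here, so commit has nothing left to verify.
 */
int
stage_add(struct export_cfg *staged, unsigned fsid)
{
	assert(staged != NULL);
	if (staged->n == MAX_EXPORTS)
		return (-1);
	for (size_t i = 0; i < staged->n; i++) {
		if (staged->fsid[i] == fsid)
			return (-1);
	}
	staged->fsid[staged->n++] = fsid;
	return (0);
}

/*
 * Publish the staged configuration in one step and return the previous
 * one.  Reclaiming the old configuration safely requires knowing that
 * no reader still uses it; that part is not shown.
 */
struct export_cfg *
cfg_commit(struct export_cfg *staged)
{
	return (atomic_exchange(&live, staged));
}

/* Readers always see either the old or the new configuration. */
const struct export_cfg *
cfg_current(void)
{
	return (atomic_load(&live));
}
```

Until cfg_commit() runs, readers keep seeing the old configuration, so
a loader that crashes halfway leaves the live settings untouched.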

To minimize the number of nfssvc() system calls, a process can combine
transaction flags and can choose how many different settings are sent
in one nfssvc() system call.  To make the ABI with the NFS server more
flexible for future changes, all settings are passed in so-called
command structures, and export specifications are not hard-coded in
those structures; instead they are passed as variable-sized arrays, so
the number and types of specifications can be changed without changing
the ABI.
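
One common way to get this kind of flexibility is a type/length header
in front of every specification, so a receiver can skip types it does
not know.  The sketch below assumes such a layout; spec_hdr, the type
codes and both functions are hypothetical and do not describe the real
command structures (a real ABI would also have to define alignment and
padding).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header that precedes every specification. */
struct spec_hdr {
	uint16_t type;	/* kind of specification */
	uint16_t len;	/* payload length in bytes, header excluded */
};

enum { SPEC_NET4 = 1, SPEC_FLAGS = 2 };	/* hypothetical type codes */

/* Append one specification at offset off; return the new offset or 0. */
size_t
spec_put(uint8_t *buf, size_t cap, size_t off, uint16_t type,
    const void *data, uint16_t len)
{
	struct spec_hdr h = { type, len };

	assert(buf != NULL);
	if (off > cap || cap - off < sizeof(h) + len)
		return (0);
	memcpy(buf + off, &h, sizeof(h));
	if (len != 0)
		memcpy(buf + off + sizeof(h), data, len);
	return (off + sizeof(h) + len);
}

/*
 * Walk the buffer and count specifications of known types.  Unknown
 * types are skipped by their length.  Returns -1 if the buffer is
 * truncated or a length field runs past the end.
 */
int
spec_count_known(const uint8_t *buf, size_t size)
{
	size_t off = 0;
	int known = 0;

	while (off < size) {
		struct spec_hdr h;

		if (size - off < sizeof(h))
			return (-1);
		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (size - off < h.len)
			return (-1);
		if (h.type == SPEC_NET4 || h.type == SPEC_FLAGS)
			known++;
		off += h.len;
	}
	return (known);
}
```

A newer userland tool can then add a specification type that an older
receiver simply steps over, and several specifications can travel in
one buffer and therefore in one system call.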

When a file system is mounted, it is not necessary to flush and then
reload all settings; only the settings for the newly mounted file
system need to be loaded.  The same logic applies to a file system that
is about to be unmounted.  Using a SIGHUP signal from mount(8) is wrong,
since it works only in some cases and does not work at all if file
systems are mounted by another program.  I used the EVFILT_FS VQ_MOUNT
and VQ_UNMOUNT kevents for the userland part and new vfs_mount_event and
vfs_unmount_event EVENTHANDLERs for the kernel part.  The kernel part
never relies on information about file system exports from user space.
Consider the cases where a file system that shadows NFS-exported file
systems is mounted or unmounted.