testing/review of atomic export update patch

Konstantin Belousov kostikbel at gmail.com
Wed Sep 19 06:17:05 UTC 2012


On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > Konstantin Belousov wrote:
> > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > > Hi,
> > > > >
> > > > > There is a simple patch at:
> > > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > that can be applied to a kernel + mountd, so that the new
> > > > > nfsd can be suspended by mountd while the exports are being
> > > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > > (This avoids the long standing bug where clients receive ESTALE
> > > > >  replies to RPCs while mountd is reloading exports.)
> > > >
> > > > This looks simple, but also somewhat worrisome. What would happen
> > > > if the mountd crashes after nfsd suspension is requested, but
> > > > before
> > > > resume was performed ?
> > > >
> > > > Might be, mountd should check for suspended nfsd on start and
> > > > unsuspend
> > > > it, if some flag is specified ?
> > > Well, I think that happens with the patch as it stands.
> > >
> > > suspend is done if the "-S" option is specified, but that is a no op
> > > if it is already suspended. The resume is done no matter what flags
> > > are provided, so mountd will always try and do a "resume".
> > > --> get_exportlist() is always called when mountd is started up and
> > >     it does the resume unconditionally when it completes.
> > >     If mountd repeatedly crashes before completing get_exportlist()
> > >     when it is started up, the exports will be all messed up, so
> > >     having the nfsd threads suspended doesn't seem so bad for this
> > >     case (which hopefully never happens;-).
> > >
> > > Both suspend and resume are just no ops for unpatched kernels.
> > >
> > > Maybe the comment in front of "resume" should explicitly explain
> > > this, instead of saying resume is harmless to do under all
> > > conditions?
> > >
> > > Thanks for looking at it, rick
> > I see.
> > 
> > My another note is that there is no any protection against parallel
> > instances of suspend/resume happen. For instance, one thread could set
> > suspend_nfsd = 1 and be descheduled, while another executes resume
> > code sequence meantime. Then it would see suspend_nfsd != 0, while
> > nfsv4rootfs_lock not held, and tries to unlock it. It seems that
> > nfsv4_unlock would silently exit. The suspending thread resumes,
> > and obtains the lock. You end up with suspend_nfsd == 0 but lock held.
> Yes. I had assumed that mountd would be the only thing using these syscalls
> and it is single threaded. (The syscalls can only be done by root for the
> obvious reasons.;-)
> 
> Maybe the following untested version of the syscalls would be better, since
> they would allow multiple concurrent calls to either suspend or resume.
> (There would still be an indeterminate case if one thread called resume
>  concurrently with another few calling suspend, but that is unavoidable,
>  I think?)
> 
> Again, thanks for the comments, rick
> --- untested version of syscalls ---
> 	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> 		NFSLOCKV4ROOTMUTEX();
> 		if (suspend_nfsd == 0) {
> 			/* Lock out all nfsd threads */
> 			igotlock = 0;
> 			while (igotlock == 0 && suspend_nfsd == 0) {
> 				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> 				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> 			}
> 			suspend_nfsd = 1;
> 		}
> 		NFSUNLOCKV4ROOTMUTEX();
> 		error = 0;
> 	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> 		NFSLOCKV4ROOTMUTEX();
> 		if (suspend_nfsd != 0) {
> 			nfsv4_unlock(&nfsv4rootfs_lock, 0);
> 			suspend_nfsd = 0;
> 		}
> 		NFSUNLOCKV4ROOTMUTEX();
> 		error = 0;
> 	}

From the cursory look, this variant is an improvement, mostly by taking
the interlock before testing suspend_nfsd, and using the while loop.

Is it possible to also make the sleep for the lock interruptible ?
So that blocked mountd could be killed by a signal ?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-fs/attachments/20120919/07c923fd/attachment.pgp


More information about the freebsd-fs mailing list