testing/review of atomic export update patch

Rick Macklem rmacklem at uoguelph.ca
Fri Sep 21 21:11:46 UTC 2012


Konstantin Belousov wrote:
> On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote:
> > Konstantin Belousov wrote:
> > > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> > > > Konstantin Belousov wrote:
> > > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > > > > Konstantin Belousov wrote:
> > > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem
> > > > > > > wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > There is a simple patch at:
> > > > > > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > > > > that can be applied to a kernel + mountd, so that the
> > > > > > > > new
> > > > > > > > nfsd can be suspended by mountd while the exports are
> > > > > > > > being
> > > > > > > > reloaded. It adds a new "-S" flag to mountd to enable
> > > > > > > > this.
> > > > > > > > (This avoids the long standing bug where clients receive
> > > > > > > > ESTALE
> > > > > > > >  replies to RPCs while mountd is reloading exports.)
> > > > > > >
> > > > > > > This looks simple, but also somewhat worrisome. What would
> > > > > > > happen
> > > > > > > if the mountd crashes after nfsd suspension is requested,
> > > > > > > but
> > > > > > > before
> > > > > > > resume was performed ?
> > > > > > >
> > > > > > > Might be, mountd should check for suspended nfsd on start
> > > > > > > and
> > > > > > > unsuspend
> > > > > > > it, if some flag is specified ?
> > > > > > Well, I think that happens with the patch as it stands.
> > > > > >
> > > > > > suspend is done if the "-S" option is specified, but that is
> > > > > > a
> > > > > > no op
> > > > > > if it is already suspended. The resume is done no matter
> > > > > > what
> > > > > > flags
> > > > > > are provided, so mountd will always try and do a "resume".
> > > > > > --> get_exportlist() is always called when mountd is started
> > > > > > up
> > > > > > and
> > > > > >     it does the resume unconditionally when it completes.
> > > > > >     If mountd repeatedly crashes before completing
> > > > > >     get_exportlist()
> > > > > >     when it is started up, the exports will be all messed
> > > > > >     up, so
> > > > > >     having the nfsd threads suspended doesn't seem so bad
> > > > > >     for
> > > > > >     this
> > > > > >     case (which hopefully never happens;-).
> > > > > >
> > > > > > Both suspend and resume are just no ops for unpatched
> > > > > > kernels.
> > > > > >
> > > > > > Maybe the comment in front of "resume" should explicitly
> > > > > > explain
> > > > > > this, instead of saying resume is harmless to do under all
> > > > > > conditions?
> > > > > >
> > > > > > Thanks for looking at it, rick
> > > > > I see.
> > > > >
> > > > > My another note is that there is no any protection against
> > > > > parallel
> > > > > instances of suspend/resume happen. For instance, one thread
> > > > > could
> > > > > set
> > > > > suspend_nfsd = 1 and be descheduled, while another executes
> > > > > resume
> > > > > code sequence meantime. Then it would see suspend_nfsd != 0,
> > > > > while
> > > > > nfsv4rootfs_lock not held, and tries to unlock it. It seems
> > > > > that
> > > > > nfsv4_unlock would silently exit. The suspending thread
> > > > > resumes,
> > > > > and obtains the lock. You end up with suspend_nfsd == 0 but
> > > > > lock
> > > > > held.
> > > > Yes. I had assumed that mountd would be the only thing using
> > > > these
> > > > syscalls
> > > > and it is single threaded. (The syscalls can only be done by
> > > > root
> > > > for the
> > > > obvious reasons.;-)
> > > >
> > > > Maybe the following untested version of the syscalls would be
> > > > better, since
> > > > they would allow multiple concurrent calls to either suspend or
> > > > resume.
> > > > (There would still be an indeterminate case if one thread called
> > > > resume
> > > >  concurrently with another few calling suspend, but that is
> > > >  unavoidable,
> > > >  I think?)
> > > >
> > > > Again, thanks for the comments, rick
> > > > --- untested version of syscalls ---
> > > > 	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> > > > 		NFSLOCKV4ROOTMUTEX();
> > > > 		if (suspend_nfsd == 0) {
> > > > 			/* Lock out all nfsd threads */
> > > > 			igotlock = 0;
> > > > 			while (igotlock == 0 && suspend_nfsd == 0) {
> > > > 				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> > > > 				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> > > > 			}
> > > > 			suspend_nfsd = 1;
> > > > 		}
> > > > 		NFSUNLOCKV4ROOTMUTEX();
> > > > 		error = 0;
> > > > 	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> > > > 		NFSLOCKV4ROOTMUTEX();
> > > > 		if (suspend_nfsd != 0) {
> > > > 			nfsv4_unlock(&nfsv4rootfs_lock, 0);
> > > > 			suspend_nfsd = 0;
> > > > 		}
> > > > 		NFSUNLOCKV4ROOTMUTEX();
> > > > 		error = 0;
> > > > 	}
> > >
> > > From the cursory look, this variant is an improvement, mostly by
> > > taking
> > > the interlock before testing suspend_nfsd, and using the while
> > > loop.
> > >
> > > Is it possible to also make the sleep for the lock interruptible ?
> > > So that blocked mountd could be killed by a signal ?
> > Well, it would require some coding. An extra argument to
> > nfsv4_lock()
> > to indicate to do so and then either the caller would have to check
> > for a pending termination signal when it returns 0 (indicates didn't
> > get
> > lock) or a new return value to indicate EINTR. The latter would
> > require
> > all the calls to it to be changed to recognize the new 3rd return
> > case.
> > Because there are a lot of these calls, I'd tend towards just having
> > the
> > caller check for a pending signal.
> >
> > Not sure if it would make much difference though. The only time it
> > would get stuck in nfsv4_lock() is if the nfsd threads are all
> > wedged
> > and in that case having mountd wedged too probably doesn't make much
> > difference, since the NFS service is toast in that case anyhow.
> >
> > If you think it is worth doing, I can add that. I basically see this
> > as a "stop-gap" fix until such time as something like nfse is done,
> > but since I haven't the time to look at nfse right now, I have no
> > idea when/if that might happen.
> 
> Ok, please go ahead with the patch. Having the patch even in its
> current
> form is obviously better then not to have it. If the wedged mountd
> appears to be annoying enough for me, I would do the change.
> 
Ah, that's ok, I'll do it. I'll do it as a separate commit first,
since I can't see it being controversial. I'll cobble to-gether a
version of the atomic-export patch using it after that.

Have a good weekend, rick

> Thanks.


More information about the freebsd-fs mailing list