Unstable NFS on recent CURRENT

Rick Macklem rmacklem at uoguelph.ca
Thu Mar 10 02:00:28 UTC 2016


Paul Mather wrote:
> On Mar 8, 2016, at 7:49 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> 
> > Paul Mather wrote:
> >> On Mar 7, 2016, at 9:55 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> >> 
> >>> Paul Mather (forwarded by Ronald Klop) wrote:
> >>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather
> >>>> <paul at gromit.dlib.vt.edu>
> >>>> wrote:
> >>>> 
> >>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
> >>>>> having trouble with NFS.  I have been doing a buildworld and buildkernel
> >>>>> with /usr/src and /usr/obj mounted via NFS.  Recently, this process has
> >>>>> resulted in the buildworld failing at some point, with a variety of
> >>>>> errors (Segmentation fault; Permission denied; etc.).  Even a "ls -alR"
> >>>>> of /usr/src doesn't manage to complete.  It errors out thus:
> >>>>> 
> >>>>> =====
> >>>>> [[...]]
> >>>>> total 0
> >>>>> ls: ./.svn/pristine/fe: Permission denied
> >>>>> 
> >>>>> ./.svn/pristine/ff:
> >>>>> total 0
> >>>>> ls: ./.svn/pristine/ff: Permission denied
> >>>>> ls: fts_read: Permission denied
> >>>>> =====
> >>>>> 
> >>>>> On the console, I get the following:
> >>>>> 
> >>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> >>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
> >>>>> MIDDLEWARE)
> >>>>> 
> > Oh, I had forgotten this. Here's the comment related to this error.
> > (about line#445 in sys/fs/nfsclient/nfs_clport.c):
> >  * BROKEN NFS SERVER OR MIDDLEWARE
> >  *
> >  * Certain NFS servers (certain old proprietary filers ca.
> >  * 2006) or broken middleboxes (e.g. WAN accelerator products)
> >  * will respond to GETATTR requests with results for a
> >  * different fileid.
> >  *
> >  * The WAN accelerator we've observed not only serves stale
> >  * cache results for a given file, it also occasionally serves
> >  * results for wholly different files.  This causes surprising
> >  * problems; for example the cached size attribute of a file
> >  * may truncate down and then back up, resulting in zero
> >  * regions in file contents read by applications.  We observed
> >  * this reliably with Clang and .c files during parallel build.
> >  * A pcap revealed packet fragmentation and GETATTR RPC
> >  * responses with wholly wrong fileids.
> > 
> > If you can connect the client->server with a simple switch (or just an
> > RJ45 cable), it might be worth testing that way. (I don't recall the name
> > of the middleware product, but I think it was shipped by one of the major
> > switch vendors. I also don't know if the product supports NFSv4?)
> > 
> > rick
> 
> 
> Currently, the client is connected to the server via a dumb gigabit switch,
> so it is already fairly direct.
> 
> As for the above error, it appeared on the console only once.  (Sorry if I
> made it sound like it appears every time.)
> 
> I just tried another buildworld attempt via NFS and it failed again.  This
> time, I get this on the BeagleBone Black console:
> 
> 	nfs_getpages: error 13
> 	vm_fault: pager read error, pid 5401 (install)
> 
Error 13 is EACCES, and it could be caused by what I mention below: any mount
of a file system on the server can trigger it, unless "-S" is specified as a
flag for mountd.

> 
> The other thing I have noticed is that if I induce heavy load on the NFS
> server---e.g., by starting a Poudriere bulk build---then that provokes the
> client to crash much more readily.  For example, I started an NFS buildworld
> on the BeagleBone Black, and it seemed to be chugging along nicely.  The
> moment I kicked off a Poudriere build update of my packages on the NFS
> server, it crashed the buildworld on the NFS client.
> 
Try adding "-S" to mountd_flags on the server. Any time file systems are mounted
(and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd to
reload /etc/exports. While /etc/exports is being reloaded, there will be access
errors for mounts (that are temporarily not exported) unless you specify "-S"
(which makes mountd suspend the nfsd threads during the reload of /etc/exports).
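
As a rough sketch only, the change could look like the following in /etc/rc.conf
on the server. It assumes mountd_flags is still at the stock value of "-r"; that
is an assumption, so check /etc/defaults/rc.conf and any existing mountd_flags
line first and keep whatever other flags are already set there.

```shell
# /etc/rc.conf on the NFS server (sketch; the "-r" is an assumed existing default)
# "-S" asks mountd to suspend the nfsd threads while /etc/exports is reloaded.
mountd_flags="-r -S"
# mountd has to be restarted for the new flags to take effect, e.g.:
#   service mountd restart
```

After the restart, mounts made by Poudriere should still cause mountd to reload
/etc/exports, but with the nfsd threads suspended during the reload.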

rick

> I have had problems with swap on FreeBSD/arm before.  Swapping to a file does
> not appear to work for me.  As a result, I switched to swapping to a
> partition on the SD card.  Maybe this is unreliable, too?
> 
> Cheers,
> 
> Paul.
> 
> 