Unstable NFS on recent CURRENT
Rick Macklem
rmacklem at uoguelph.ca
Thu Mar 10 02:00:28 UTC 2016
Paul Mather wrote:
> On Mar 8, 2016, at 7:49 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>
> > Paul Mather wrote:
> >> On Mar 7, 2016, at 9:55 PM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> >>
> >>> Paul Mather (forwarded by Ronald Klop) wrote:
> >>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather <paul at gromit.dlib.vt.edu> wrote:
> >>>>
> >>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
> >>>>> having trouble with NFS. I have been doing a buildworld and buildkernel
> >>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has
> >>>>> resulted in the buildworld failing at some point, with a variety of
> >>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
> >>>>> of /usr/src doesn't manage to complete. It errors out thus:
> >>>>>
> >>>>> =====
> >>>>> [[...]]
> >>>>> total 0
> >>>>> ls: ./.svn/pristine/fe: Permission denied
> >>>>>
> >>>>> ./.svn/pristine/ff:
> >>>>> total 0
> >>>>> ls: ./.svn/pristine/ff: Permission denied
> >>>>> ls: fts_read: Permission denied
> >>>>> =====
> >>>>>
> >>>>> On the console, I get the following:
> >>>>>
> >>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> >>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
> >>>>> MIDDLEWARE)
> >>>>>
> > Oh, I had forgotten this. Here's the comment related to this error.
> > (about line#445 in sys/fs/nfsclient/nfs_clport.c):
> >  * BROKEN NFS SERVER OR MIDDLEWARE
> >  *
> >  * Certain NFS servers (certain old proprietary filers ca.
> >  * 2006) or broken middleboxes (e.g. WAN accelerator products)
> >  * will respond to GETATTR requests with results for a
> >  * different fileid.
> >  *
> >  * The WAN accelerator we've observed not only serves stale
> >  * cache results for a given file, it also occasionally serves
> >  * results for wholly different files. This causes surprising
> >  * problems; for example the cached size attribute of a file
> >  * may truncate down and then back up, resulting in zero
> >  * regions in file contents read by applications. We observed
> >  * this reliably with Clang and .c files during parallel build.
> >  * A pcap revealed packet fragmentation and GETATTR RPC
> >  * responses with wholly wrong fileids.
> >
> > If you can connect the client to the server with a simple switch (or just
> > an RJ45 cable), it might be worth testing that way. (I don't recall the
> > name of the middleware product, but I think it was shipped by one of the
> > major switch vendors. I also don't know if the product supports NFSv4.)
> >
> > rick
>
>
> Currently, the client is connected to the server via a dumb gigabit switch,
> so it is already fairly direct.
>
> As for the above error, it appeared on the console only once. (Sorry if I
> made it sound like it appears every time.)
>
> I just tried another buildworld attempt via NFS and it failed again. This
> time, I get this on the BeagleBone Black console:
>
> nfs_getpages: error 13
> vm_fault: pager read error, pid 5401 (install)
>
Error 13 is EACCES, and it could be caused by what I describe below: any mount of
a file system on the server can produce it unless "-S" is specified as a flag for
mountd.
>
> The other thing I have noticed is that if I induce heavy load on the NFS
> server---e.g., by starting a Poudriere bulk build---then that provokes the
> client to crash much more readily. For example, I started an NFS buildworld
> on the BeagleBone Black, and it seemed to be chugging along nicely. The
> moment I kicked off a Poudriere build update of my packages on the NFS
> server, it crashed the buildworld on the NFS client.
>
Try adding "-S" to mountd_flags on the server. Any time a file system is mounted
(and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd to
reload /etc/exports. While /etc/exports is being reloaded, there will be access
errors for mounts (that are temporarily not exported) unless you specify "-S"
(which makes mountd suspend the nfsd threads during the reload of /etc/exports).
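
As a rough sketch of that change, assuming the server's existing mountd_flags is
just "-r" (check /etc/rc.conf and /etc/defaults/rc.conf first, since your flags
may well differ), the line in /etc/rc.conf on the server would look something like:

```shell
# /etc/rc.conf on the NFS server (rc.conf entries are sh variable assignments).
# ASSUMPTION: "-r" stands in for whatever flags were already set; "-S" is the
# only addition, so that mountd suspends the nfsd threads while it reloads
# /etc/exports.
mountd_flags="-r -S"
```

mountd has to be restarted (for example with "service mountd restart") before
the new flag takes effect.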
rick
> I have had problems with swap on FreeBSD/arm before. Swapping to a file does
> not appear to work for me. As a result, I switched to swapping to a
> partition on the SD card. Maybe this is unreliable, too?
>
> Cheers,
>
> Paul.
>
>