FreeBSD 10.1 can't "make -j5 buildworld" over NFS?

Rick Macklem rmacklem at uoguelph.ca
Sun Apr 19 12:29:28 UTC 2015


J David wrote:
> On identical hardware, against the exact same NFS server, FreeBSD 9.3
> can do a parallel buildworld, but FreeBSD 10.1 dies in cleandir with a
> bunch of "stale NFS file handle" errors.
> 
> The mount options are the same on both clients:
> 
> 192.168.20.161:/data/software/freebsd/releng-9.3/src /usr/src nfs
> rw,tcp,nfsv3,noauto 0 0
> 192.168.20.161:/data/software/freebsd/releng-9.3/amd64/obj /usr/obj
> nfs rw,tcp,nfsv3,noauto 0 0
> 
> 
> 192.168.20.161:/data/software/freebsd/releng-10.1/src /usr/src nfs
> rw,tcp,nfsv3,noauto 0 0
> 192.168.20.161:/data/software/freebsd/releng-10.1/amd64/obj /usr/obj
> nfs rw,tcp,nfsv3,noauto 0 0
[rest clipped for brevity]

I checked and I was incorrect w.r.t. "make" changing. One thing you could
try (although you said in your last post that you weren't going to do anything)
is disabling lookups that use shared vnode locks:
# sysctl vfs.lookup_shared=0
and see if that stops it from failing with ESTALE.

Here's a comment from nfs_remove() in the NFS client code (it has been there for quite a while):
	/*
	 * Purge the name cache so that the chance of a lookup for
	 * the name succeeding while the remove is in progress is
	 * minimized. Without node locking it can still happen, such
	 * that an I/O op returns ESTALE, but since you get this if
	 * another host removes the file..
	 */
I don't believe I wrote this comment, but my understanding is that a second thread
may succeed in looking up the file (a hit on the name cache) while the remove is in
progress and then attempt the remove again. Disabling shared vnode locking (forcing
the lookup that precedes the remove to acquire an exclusive lock on the directory)
might avoid the race.
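
For illustration only, here is a minimal user-space sketch of how a program
that removes files over NFS could tolerate losing this race. The helper name
remove_tolerant() is hypothetical (it is not part of make, FreeBSD, or the NFS
client), and treating ESTALE as "already removed" is an assumption that only
makes sense when concurrent removes of the same file are expected.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove a file, treating "already gone" as success.
 * Returns 0 when unlink(2) succeeds or fails with ENOENT/ESTALE,
 * and -1 otherwise, with errno left as set by unlink(2).
 */
static int
remove_tolerant(const char *path)
{
	if (unlink(path) == 0)
		return (0);

	/*
	 * ENOENT: the name was already removed by someone else.
	 * ESTALE: assumed here to mean the file handle went stale because
	 * another thread or host removed the file while we held a cached name.
	 */
	if (errno == ENOENT || errno == ESTALE)
		return (0);

	return (-1);
}
```

This only papers over the symptom in the caller; it does not change the
locking behaviour in the NFS client that the comment above describes.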

My comment w.r.t. NFS not being POSIX compliant wasn't meant to say that this
problem wasn't fixable or shouldn't be fixed; it was meant to imply that something
working on a POSIX file system doesn't imply that it works over NFS.

Since FreeBSD 9.3 also has shared vnode locking enabled for lookups (unless you
disabled it), I don't know why 10.1 would break when 9.3 doesn't.

rick

