NFS problem: file doesn't appear in file listing, but can be accessed directly

Attila Nagy bra at fsn.hu
Thu Sep 16 22:15:21 UTC 2010


  Hi,

Sorry for the delay, there was a long vacation and then the usual 
catch-up problems...

On 08/11/2010 05:15 AM, Rick Macklem wrote:
> Ok, so it seems to be a server issue. Since there are quite a few files
> missing, I'm surprised others aren't seeing problems?
>
> I suspect it is some interaction between ZFS and the NFS server that
> is causing this, but that's just a hunch. Possibly a problem w.r.t.
> how the directory offset cookies are handled.
>
> Can you conveniently move the directory to a UFS2 volume and export
> that, to see if the problem then goes away?
Only to an md or zvol backed one, but that shouldn't affect the result, 
I think.
BTW, all of this is happening while moving maildir directories
between servers over NFS (using rsync; yeah, I know this sounds silly,
but we have our reasons for doing that) and checking the results on both
sides with find and diff.
The source is on UFS, where I can't see this problem.
> Also, what architecture is the server running? (I'm wondering if
> it might be an endianness or 32/64 bit issue related to the
> directory offset cookies? Just wild guesses at this point.)
amd64 on all machines.
> If you can't move the directory to UFS2 or that doesn't fix
> the problem, all I can think to do is write/run a little program
> locally on the server that does getdirentries() on the directory,
> to try and spot something that might confuse the NFS server. (I can
> write such a program for you, but I'd like to hear if it is a ZFS
> specific problem first.)
>
>
Before doing a UFS2 copy, I've copied the problematic directory tree to
a new location on the same ZFS volume.
Surprise: it's OK! I get the same answer for "find . -type f | wc -l" on 
both the NFS client and directly on the server.
I've also tried copying the directory via NFS with rsync (the way this
problematic directory and the others were produced), but no luck: the
problem did not show up again.
I'm confused.
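
For what it's worth, the count from "find . -type f | wc -l" only says
that two trees differ, not which names are missing. A minimal sketch of
a name-level comparison follows, assuming Python is available on the
machine doing the check. The function names are made up for
illustration, and the two roots would be the server-local path and the
NFS-mounted path of the same directory.

```python
import os
import stat


def list_regular_files(root):
    """Return the set of regular-file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat() does not follow symlinks, so only real regular files
            # are counted, like "find . -type f".
            if stat.S_ISREG(os.lstat(full).st_mode):
                found.add(os.path.relpath(full, root))
    return found


def compare_trees(server_root, client_root):
    """Return (only_on_server, only_on_client) as sorted lists of paths."""
    server = list_regular_files(server_root)
    client = list_regular_files(client_root)
    return sorted(server - client), sorted(client - server)
```

Anything in the first list is a file the server's own listing shows but
the listing through the second path does not.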

Maybe it would be best to try to reproduce it with a similar usage
pattern, but that seems hard to do on non-production machines. The
machine itself does heavy IO on the local file system (this is the hard
thing to reproduce closely), while another machine does rsync copies
over NFS (I've also tried direct rsync over ssh copies, with the same
result) and then checks the two directories with find (file listing)
and diff. This process runs in many "threads" (up to 32), and there are
occasional differences, where the copy stops because the source file
listing is not the same as the one on the destination.
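
The symptom from the subject line (a name that is missing from the
listing but can still be accessed directly) can be checked for
explicitly. A rough sketch, again assuming Python, with a made-up
function name: given the names expected in a directory (for example,
taken from the source side's listing), it reports those that lstat()
can reach but the directory listing does not return.

```python
import os


def hidden_entries(directory, expected_names):
    """Return names that lstat() finds but os.listdir() does not report."""
    listed = set(os.listdir(directory))
    hidden = []
    for name in expected_names:
        if name in listed:
            continue  # visible in the listing, nothing wrong
        try:
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            continue  # really absent, not merely missing from the listing
        hidden.append(name)
    return hidden
```

On a healthy file system this always returns an empty list, so any name
it reports is a candidate for the listing problem described here.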

The ZFS pool itself has 35 disks, one log device and 3 cache devices.


More information about the freebsd-fs mailing list