[Bug 249871] NFSv4 faulty directory listings under heavy load
bugzilla-noreply at freebsd.org
Fri Sep 25 02:51:45 UTC 2020
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249871
Bug ID: 249871
Summary: NFSv4 faulty directory listings under heavy load
Product: Base System
Version: 12.1-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: jwb at freebsd.org
I think I've discovered a peculiar bug in NFSv4. When the server is under
heavy load, directory listings sometimes show duplicate filenames and other
times omit filenames.
This was discovered when running parallel jobs on a small HPC cluster, each
running xzcat on an NFS-served file, dumping the uncompressed output to a local
disk on the client, followed by some brief heavy computation and writing
several small output files to the NFS server. As shown below, 11,031 files
were processed. The number of parallel jobs was capped at between 50 and 150 at
a time, and the problem occurred with every cap.
All of the list-*.txt files shown below were produced by
ls | grep 'combined.*-ad\.vcf\.xz'
or
find . -maxdepth 1 -name 'combined.*-ad.vcf.xz'
The file list-1.txt contains the correct directory listing.
list-100.txt, however, contains duplicate filenames, and list-1000.txt has both
duplicate and missing filenames.
# sort list-1.txt | uniq -d
# sort list-100.txt | uniq -d
combined.NWD297242-ad.vcf.xz
combined.NWD745320-ad.vcf.xz
combined.NWD787696-ad.vcf.xz
# wc -l list-1.txt list-100.txt list-1000.txt
11031 list-1.txt
11034 list-100.txt
11027 list-1000.txt
33092 total
# diff list-1.txt list-100.txt
2404a2405
> combined.NWD297242-ad.vcf.xz
7856a7858
> combined.NWD745320-ad.vcf.xz
8391a8394
> combined.NWD787696-ad.vcf.xz
# diff list-1.txt list-1000.txt
153a154
> combined.NWD111306-ad.vcf.xz
170d170
< combined.NWD113182-ad.vcf.xz
512d511
[snip]
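The same checks can be bundled into one helper. This is only a sketch and not part of the commands actually run for this report: check_listing is a hypothetical name, and it assumes its first argument is a known-good listing such as list-1.txt.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Usage: check_listing REFERENCE SUSPECT
# Prints names that are duplicated in SUSPECT, missing from SUSPECT,
# or present in SUSPECT but absent from REFERENCE.
check_listing() {
    ref=$1
    sus=$2
    tmpdir=$(mktemp -d) || return 1

    # comm(1) needs sorted input; -u drops duplicates so that only
    # the sets of names are compared.
    sort -u "$ref" > "$tmpdir/ref"
    sort -u "$sus" > "$tmpdir/sus"

    echo "duplicated:"
    sort "$sus" | uniq -d
    echo "missing:"
    comm -23 "$tmpdir/ref" "$tmpdir/sus"
    echo "unexpected:"
    comm -13 "$tmpdir/ref" "$tmpdir/sus"

    rm -rf "$tmpdir"
}
```

For example, "check_listing list-1.txt list-1000.txt" would summarize the duplicate and missing names in one pass instead of reading them out of the diff output.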
If I revert the mounts to NFSv3, the problem goes away (but performance
suffers).
There are no apparent problems delivering file content, just directory
listings. Using this fact, I can work around the problem by writing the
directory listing to a file beforehand, when the server is not under load:
ls | grep 'combined.*-ad\.vcf\.xz' > VCF-list.txt
Reading this file under heavy load does not pose any problems. The problem
appears only when I generate a new directory listing with "ls" or "find".
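A job can then take its file names from the saved listing rather than from a fresh "ls". The loop below is a minimal sketch of that idea: process_saved_list is a hypothetical name, and the command passed to it stands in for whatever each job actually runs.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Usage: process_saved_list LISTFILE COMMAND [ARGS...]
# Runs COMMAND once per name in LISTFILE, so no directory listing
# is requested from the server while it is under load.
process_saved_list() {
    list=$1
    shift
    while IFS= read -r name; do
        # Skip blank lines in the saved listing.
        [ -n "$name" ] || continue
        # Give the command /dev/null as stdin so it cannot consume
        # the rest of the list.
        "$@" "$name" < /dev/null
    done < "$list"
}
```

For example, "process_saved_list VCF-list.txt xzcat" would decompress each listed file to standard output in turn.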
The problem is consistently reproducible under heavy load and does not occur
under light load.
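One way to quantify "consistently reproducible" is to take the listing repeatedly while the jobs are running and count how many attempts differ from a reference taken when the server is idle. This is a sketch under those assumptions: count_bad_listings is a hypothetical name, and the reference file is assumed to have been produced by the same ls | grep pipeline.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Usage: count_bad_listings DIR REFERENCE N
# Lists DIR N times and prints how many listings differ from REFERENCE.
count_bad_listings() {
    dir=$1
    ref=$2
    n=$3
    bad=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # cmp -s is silent and exits non-zero if the fresh listing
        # differs from the reference in any way.
        if ! ls "$dir" | grep 'combined.*-ad\.vcf\.xz' | cmp -s - "$ref"; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad"
}
```

A result of 0 under light load and a non-zero result under heavy load would match the behavior described above.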
/etc/exports:
V4: /
/etc/zfs/exports:
# !!! DO NOT EDIT THIS FILE MANUALLY !!!
/pxeserver/images -alldirs -ro -network 192.168.0.0 -mask 255.255.128.0
/raid-00 -maproot=root -network 192.168.0.0 -mask 255.255.128.0
/sharedapps -maproot=root -network 192.168.0.0 -mask 255.255.128.0
/usr/home -maproot=root -network 192.168.0.0 -mask 255.255.128.0
/var/cache/pkg -maproot=root -network 192.168.0.0 -mask 255.255.128.0
/etc/fstab on the clients:
login:/usr/home /usr/home nfs rw,bg,intr,noatime 0 0
login:/raid-00 /raid-00 nfs rw,bg,intr,noatime 0 0
login:/sharedapps /sharedapps nfs rw,bg,intr,noatime 0 0
login:/var/cache/pkg /var/cache/pkg nfs rw,bg,intr,noatime 0 0
--
You are receiving this mail because:
You are the assignee for the bug.