Major issues with nfsv4

Thu Dec 10 16:16:04 UTC 2020

Recently, we attempted to get with the 2000's and try switching from
NFSv3 to NFSv4 on our 12.2 servers.  This has not gone well.

Any system we switch to NFSv4 mounts is functionally unusable, pegged
at 100% system CPU usage, load average 70+, largely from nfscl threads
and client processes using NFS.

Dmesg shows NFS-related messages:

$ dmesg | fgrep -i nfs | sort | uniq -c | sort -n
   1 nfsv4 err=10010
   4 nfsv4 client/server protocol prob err=10026
  29 nfscl: never fnd open

Nfsstat shows no client activity; "nfsstat -e -c 1" and "nfsstat -c 1"
both report:

 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0

Meanwhile, tcpdump on the client shows an endless stream of getattr
requests at the exact same time nfsstat -c says nothing is happening:

$ sudo tcpdump -n -i net1 -c 10 port 2049 and src 172.20.200.39
14:47:27.037974 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [.],
ack 72561, win 545, options [nop,nop,TS val 234259249 ecr 4155804100],
length 0
14:47:27.046282 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 139940:140092, ack 72561, win 545, options [nop,nop,TS val
234259259 ecr 4155804100], length 152: NFS request xid 1544756021 148
getattr fh 0,5/0
14:47:27.051260 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140092:140248, ack 72641, win 545, options [nop,nop,TS val
234259269 ecr 4155804104], length 156: NFS request xid 1544756022 152
getattr fh 0,5/0
14:47:27.063372 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140248:140404, ack 72721, win 545, options [nop,nop,TS val
234259279 ecr 4155804106], length 156: NFS request xid 1544756023 152
getattr fh 0,5/0
14:47:27.068646 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140404:140556, ack 72801, win 545, options [nop,nop,TS val
234259279 ecr 4155804108], length 152: NFS request xid 1544756024 148
getattr fh 0,5/0
14:47:27.080627 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140556:140712, ack 72881, win 545, options [nop,nop,TS val
234259299 ecr 4155804110], length 156: NFS request xid 1544756025 152
getattr fh 0,5/0
14:47:27.085224 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140712:140868, ack 72961, win 545, options [nop,nop,TS val
234259299 ecr 4155804112], length 156: NFS request xid 1544756026 152
getattr fh 0,5/0
14:47:27.096802 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 140868:141024, ack 73041, win 545, options [nop,nop,TS val
234259309 ecr 4155804114], length 156: NFS request xid 1544756027 152
getattr fh 0,5/0
14:47:27.101849 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 141024:141180, ack 73121, win 545, options [nop,nop,TS val
234259319 ecr 4155804116], length 156: NFS request xid 1544756028 152
getattr fh 0,5/0
14:47:27.112905 IP 172.20.200.39.727 > 172.20.20.161.2049: Flags [P.],
seq 141180:141336, ack 73201, win 545, options [nop,nop,TS val
234259329 ecr 4155804118], length 156: NFS request xid 1544756029 152
getattr fh 0,5/0

Only 10 shown here for brevity, but:

$ sudo tcpdump -n -i net1 -c 10000 port 2049 and src 172.20.200.39 |
fgrep getattr | wc -l
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on net1, link-type EN10MB (Ethernet), capture size 262144 bytes
10000 packets captured
20060 packets received by filter
0 packets dropped by kernel
    9759

There are no dropped packets or network problems:

$ netstat -in -I net1
Name    Mtu Network       Address              Ipkts Ierrs Idrop
Opkts Oerrs  Coll
net1   1500 <Link#2>      12:33:df:5f:79:d7 40988832     0     0
48760307     0     0
net1      - 172.20.0.0/16 172.20.200.39     40942065     -     -
48756241     -     -

The mount flags in fstab are:

ro,nfsv4,nosuid

The mount flags as reported by "nfsstat -m" are:

nfsv4,minorversion=0,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2147483647

Today, I managed to kill everything down to one user process that was
exhibiting this behavior.  After a kill -9 on that process, it went to
"REsJ" but continued to burn the same amount of CPU (all system).
Oddly the run state / wait channel was just "CPU1."  Running "ktrace"
did not produce any trace records.  Probably that is predictable for a
process in E state; if the process had crossed the user/kernel
boundary in a way ktrace could detect, it would have exited.

At that point, I started unmounting filesystems.  Everything but the
NFS filesystem used by that process unmounted cleanly.  The umount for
that filesystem went to D state for about a minute and then kicked
back "Device busy."  That's fair, if awfully slow.

Meanwhile, that user process continued burning system CPU with the E
flag set, not doing anything whatsoever in userspace, still producing
300+ "getattr fh 0,5/0" per second according to tcpdump and 0
according to nfsstat.

Eventually, I rebooted with fstab set back to nfsv3.

This feels like the user process is in a system call that is stuck in
an endless loop repeating some operation that generates that getattr
request.  But that is a feeling, not a fact.

This is fairly easy to reproduce; it seems pretty consistent within a
few hours (a day at most) any time I switch the relevant mounts to
nfsv4.  Reverting to nfsv3 makes this issue completely disappear.

What on earth could be going on here?  What other information can I
provide that would help track this down?

Thanks for any advice!