[Bug 260664] ZFS/NFS: Intermittent hangs and crashes after a period of time in: nfscl_hasexpired || dbuf_write_done || zio_execute

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 27 Dec 2021 17:20:04 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260664

--- Comment #11 from Rick Macklem <rmacklem@FreeBSD.org> ---
Ok, it does look like the commit in stable/13 would fix this hang.
The hang happens because a read calls nfscl_hasexpired(), which tries
to acquire the exclusive lock (similar to delegation return cases).

However, calling nfscl_hasexpired() should *almost never* happen.
It happens when the client has been partitioned from the NFSv4
server for at least a minute.
For the FreeBSD NFSv4 server (which is what is called a courteous
server), expiry only happens when a conflicting open/lock request
is made by another client or when open/lock resources become
exhausted.
When recovery from "expired" is done, all byte range locks are lost,
so getting the client/server into this state is to be avoided if
at all possible.

If your NFSv4 server is a Linux one, expiry will happen when the
client is network partitioned from the server for over 60 seconds.
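
To make the difference between the two server behaviours concrete, here
is a toy model in Python. It is only a sketch of the rules described
above, not the kernel logic: the names ClientState, lease_lapsed and
is_expired are made up for illustration, and the 60 second lease is an
assumption based on the one-minute figure mentioned above.

```python
from dataclasses import dataclass

# Assumed lease duration in seconds (illustrative, based on the
# one-minute figure in the text; real servers may differ).
LEASE_SECONDS = 60.0


@dataclass
class ClientState:
    """Hypothetical per-client record kept by a server."""
    last_renew: float  # time of the last successful lease renewal


def lease_lapsed(state, now, lease=LEASE_SECONDS):
    """True once the client has been silent for longer than the lease."""
    return now - state.last_renew > lease


def is_expired(state, now, courteous,
               conflicting_request=False, resources_exhausted=False):
    """Decide whether the server treats the client's state as expired.

    A non-courteous server expires the client as soon as the lease
    lapses. A courteous server keeps the state after the lease lapses
    unless another client makes a conflicting open/lock request or
    open/lock resources run out.
    """
    if not lease_lapsed(state, now):
        return False
    if not courteous:
        return True
    return conflicting_request or resources_exhausted
```

In this model a client that has been silent for 90 seconds is expired
by the non-courteous server, but the courteous server only expires it
if conflicting_request or resources_exhausted is also set.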

In other words, I think you have some sort of network connectivity
problem to the NFS server.
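
One rough way to look for such connectivity gaps is to probe the
server's NFS TCP port periodically and flag any run of failures that
lasts a minute or more. The Python sketch below assumes the server
listens on the standard NFS port 2049 over TCP; the function names and
the 60 second threshold are illustrative choices. A successful TCP
connect does not prove the NFS service is healthy, so treat this as a
coarse check only.

```python
import socket

NFS_PORT = 2049  # standard NFS port; assumed to be reachable over TCP


def reachable(host, port=NFS_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples, threshold=60.0):
    """Find runs of failed probes lasting at least `threshold` seconds.

    `samples` is a list of (timestamp, ok) pairs in time order.
    Returns a list of (start, end) pairs, where start is the first
    failed probe of a run and end is the next successful probe (or
    the last sample if the run never recovered).
    """
    outages = []
    start = None
    last_ts = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and start is None:
            start = ts          # a run of failures begins
        elif ok and start is not None:
            if ts - start >= threshold:
                outages.append((start, ts))
            start = None        # the run has ended
    # Report a run that was still failing at the last sample.
    if start is not None and last_ts - start >= threshold:
        outages.append((start, last_ts))
    return outages
```

To use it, call reachable() on the server every few seconds, record
(time.time(), result) pairs, and pass the list to find_outages().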

As an alternative to upgrading to stable/13, you could switch to
using NFSv3 mounts to avoid the hang.
