Re: 13-stable NFS server hang
- In reply to: Rick Macklem : "Re: 13-stable NFS server hang"
Date: Mon, 04 Mar 2024 02:44:20 UTC
<<On Sun, 3 Mar 2024 13:17:30 -0800, Rick Macklem <rick.macklem@gmail.com> said:
>> [I wrote:]
>> (and so is dirty), this might take several seconds. I've set
>> vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
>> and am watching to see if we have more freezes.
>>
>> If this does the trick, then I can delay deploying a new kernel until
>> April, after my upcoming vacation.
> Interesting. Please let us know how it goes.
It's been about 22 hours since I flipped the sysctl and it hasn't
happened once yet. Of course I don't know what the users are up to
right now, so I'll continue to monitor.
This is the script I ended up with to monitor for freezes:
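nfsstat -dW | awk '
    BEGIN { n = 0 }
    # first second with no progress: note the time, keep watching
    (n == 0) && ($12 == 0) { n = n + 1; system("date"); next }
    # second consecutive one: capture diagnostics and stop
    (n > 0) && ($12 == 0) { system("date; procstat -k 1184 1198; netstat -n -p tcp"); exit(0) }
    # any progress resets the count
    { n = 0 }
'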
This should (if I haven't botched it) trigger only if two consecutive
seconds show no forward progress.
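
One way to sanity-check that logic without waiting for a real hang is to
run the same state machine over canned input. What follows is only a
minimal sketch: the date/procstat/netstat actions are replaced by plain
prints, the helper names stall_filter and row are hypothetical, and the
sample rows are invented (each has twelve fields, with field 12 standing
in for the counter being watched).

```shell
#!/bin/sh
# Same pattern/action structure as the monitoring one-liner, with the
# system() calls swapped for prints so it is safe to run anywhere.
stall_filter() {
    awk '
        BEGIN { n = 0 }
        (n == 0) && ($12 == 0) { n = n + 1; print "first zero at line " NR; next }
        (n > 0)  && ($12 == 0) { print "stall at line " NR; exit(0) }
        { n = 0 }
    '
}

# Emit one made-up sample row whose twelfth field is $1.
row() { echo "1 1 1 1 1 1 1 1 1 1 1 $1"; }

# An isolated zero (line 2) followed later by two zeros in a row (lines 4-5).
out=$( { row 5; row 0; row 7; row 0; row 0; row 9; } | stall_filter )
echo "$out"

# Isolated zeros only: the stall branch should never be reached.
out2=$( { row 0; row 3; row 0; row 4; } | stall_filter )
echo "$out2"
```

In the first run the isolated zero on line 2 is noted and then forgotten
once line 3 shows progress; only the back-to-back zeros on lines 4 and 5
reach the stall branch. The second run never reaches it.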
-GAWollman