Re: NFSv4 client hung

From: Alexandre Biancalana <biancalana_at_gmail.com>
Date: Tue, 02 Sep 2025 16:46:14 UTC
On Tue, Sep 2, 2025 at 11:00 AM Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Tue, Sep 2, 2025 at 6:01 AM Alexandre Biancalana
> <biancalana@gmail.com> wrote:
> >
> > Hi Rick! Thank you for the answer.
> >
> > I also think that it has nothing to do with the server, because there is another NFS client (also running VMs with bhyve) that keeps running.
> >
> > To make sure I understood: in my setup the NFS client is a physical host that mounts the NFS shares holding the VM disks. I then run those VMs with bhyve; the VMs themselves do not mount any NFS share.
> > The hang happens when I try to access the NFS-mounted shares on the physical host, and (I think) as a consequence the VMs also freeze when trying to do I/O.
> >
> > Your suggestion is to increase the amount of memory of the VMs?
> Oops, yes, the buffer cache problem would be on the physical system,
> given that is where the mount is done.
>
> >
> > For educational purposes, can you point me to the part of the code that uses newbuf so I can try to learn something?
> sys/kern/vfs_bio.c

Thanks, I'm reading it!
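
A quick way to find the spots you mention (assuming the sources are
installed under /usr/src):

# grep -n newbuf /usr/src/sys/kern/vfs_bio.c

That should turn up the getnewbuf() allocation path, which is where the
stuck processes were sleeping.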

>
> There are some sysctls you can look at. You'll get them by:
> # sysctl -a | fgrep vfs | fgrep buffer
root@bhyve01:~ # sysctl -a | fgrep vfs | fgrep buffer
vfs.hifreebuffers: 5376
vfs.lofreebuffers: 3584
vfs.numfreebuffers: 105931
vfs.hidirtybuffers: 26502
vfs.lodirtybuffers: 13251
vfs.numdirtybuffers: 220
vfs.altbufferflushes: 0
vfs.dirtybufferflushes: 0

> # sysctl -a | fgrep bufspace
root@bhyve01:~ # sysctl -a | fgrep bufspace
vfs.bufspacethresh: 1681960548
vfs.hibufspace: 1725087744
vfs.lobufspace: 1638833353
vfs.maxmallocbufspace: 86254387
vfs.maxbufspace: 1735573504
vfs.bufspace: 502060032
vfs.runningbufspace: 0
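
For reference, in the snapshot above vfs.bufspace / vfs.maxbufspace =
502060032 / 1735573504, i.e. roughly 29%, so at that moment the client was
well below vfs.hibufspace. The three values can be watched together with:

# sysctl vfs.bufspace vfs.hibufspace vfs.maxbufspace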


> - Some of these can be adjusted. If you look in sys/kern/vfs_bio.c,
>   you can see which ones are CTLFLAG_RW.

I've instrumented a collection of those values every 10s, storing them in a
tsdb (/usr/sbin/prometheus_sysctl_exporter | grep vfs | grep buf), so we can
track the values over time (sketch of the collection loop after the list).
I still haven't figured out the mechanism, but what I think makes sense to
measure/watch is:

 - runningbufspace: if the number of outstanding write requests grows a
lot, it can be a signal of a stall
 - bufkvaspace
 - bufspace/maxbufspace = total usage of bufspace
 - bufmallocspace/maxmallocbufspace = total usage of malloced memory for buffers
 - bdwriteskip
 - numdirtybuffers: data not yet persisted to the backing store
 - numfreebuffers
 - lofreebuffers
 - getnewbufrestarts
 - mappingrestarts
 - numbufallocfails
 - notbufdflushes
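
The collection loop itself is nothing fancy, roughly this (the output file
is just a placeholder; the real samples get shipped to the tsdb):

#!/bin/sh
# dump the vfs buffer-related sysctls in Prometheus format every 10s
while :; do
        /usr/sbin/prometheus_sysctl_exporter | grep vfs | grep buf \
                >> /var/log/vfs_buf_metrics.prom
        sleep 10
done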

>
> Also, you can see exactly what the NFS mount setup is by:
> # nfsstat -m
> - If you post the output from this, I might be able to suggest
>   some mount option changes.

As I said, I have two machines and they had the same config. When I
started having the problem I removed all the tuning and rolled back
to NFSv3 on bhyve01. Sadly, bhyve01 still hangs from time to time. Here
is nfsstat from both machines.

root@bhyve01:~ # nfsstat -m
10.10.10.10:/mnt/datastore0/bhyve_instances on /vms
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=1,wcommitsize=16777216,timeout=120,retrans=2
10.10.10.10:/mnt/datastore1/iso on /vms/.iso
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=1,wcommitsize=16777216,timeout=120,retrans=2
10.10.10.10:/mnt/ds_ssd_vms_03/disks on /vms/.disks
nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=1,wcommitsize=16777216,timeout=120,retrans=2

root@bhyve02:~ # nfsstat -m
10.10.10.10:/mnt/datastore0/bhyve_instances on /vms
nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647
10.10.10.10:/mnt/datastore1/iso on /vms/.iso
nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647
10.10.10.10:/mnt/ds_ssd_vms_03/disks on /vms/.disks
nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647
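
For completeness, the bhyve02 mount options above correspond to a mount
invocation along these lines (reconstructed from the nfsstat output, not a
verbatim copy of my fstab):

# mount -t nfs -o nfsv4,minorversion=2,nconnect=16,readahead=8,wcommitsize=67108864 \
        10.10.10.10:/mnt/datastore0/bhyve_instances /vms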


>
> I do not know how bhyve reads/writes the image file?
> (That might be a hint as well, since that is probably
> what is unique about your setup.)
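
(For what it's worth, how bhyve touches the image depends on the disk
backend the VM is configured with; a raw file on virtio-blk, for example,
is attached like this (paths purely illustrative):

# bhyve ... -s 4,virtio-blk,/vms/guest0/disk0.img ...

so all of the guest's block I/O ends up as reads/writes on that one file
on the NFS mount.)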
>
> rick
>
> >
> > Ale
> >
> > On Mon, 1 Sep 2025 at 22:49 Rick Macklem <rick.macklem@gmail.com> wrote:
> >>
> >> For some reason, I cannot reply to your email
> >> (might be the size of it), so I'll post a simple
> >> comment.
> >>
> >> As you noted, processes are stuck on newbuf in
> >> the client. This probably has nothing to do with
> >> the server. It also looks like the clients are bhyve.
> >>
> >> Bump the memory size of the bhyve clients up,
> >> maybe way up.
> >> --> There are ways to tune the size of the buffer
> >>       cache, but bumping up the VM's ram should
> >>       give you more.
> >>
> >> rick