Re: NFSv4 client hung

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Tue, 02 Sep 2025 21:19:32 UTC
On Tue, Sep 2, 2025 at 9:46 AM Alexandre Biancalana
<biancalana@gmail.com> wrote:
>
> On Tue, Sep 2, 2025 at 11:00 AM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Tue, Sep 2, 2025 at 6:01 AM Alexandre Biancalana
> > <biancalana@gmail.com> wrote:
> > >
> > > Hi Rick! Thank you for the answer.
> > >
> > > I also think that it has nothing to do with the server, because there is another NFS client (also running VMs with bhyve) that keeps running.
> > >
> > > To make sure that I understood: in my setup the NFS client is a physical host that mounts NFS shares holding the VMs' disks. I then run those VMs with bhyve; the VMs themselves do not mount any NFS share.
> > > The hang happens when I try to access the NFS-mounted shares on the physical host, and (I think) as a consequence the VMs also freeze when trying to do I/O.
> > >
> > > Your suggestion is to increase the amount of memory of the VMs?
> > Oops, yes, the buffer cache problem would be on the physical system,
> > given that is where the mount is done.
> >
> > >
> > > For educational purposes, can you point me to the part of the code that uses newbuf, so I can try to learn something?
> > sys/kern/vfs_bio.c
>
> Thanks, I'm reading!
>
> >
> > There are some sysctls you can look at. You'll get them by:
> > # sysctl -a | fgrep vfs | fgrep buffer
> root@bhyve01:~ # sysctl -a | fgrep vfs | fgrep buffer
> vfs.hifreebuffers: 5376
> vfs.lofreebuffers: 3584
> vfs.numfreebuffers: 105931
> vfs.hidirtybuffers: 26502
> vfs.lodirtybuffers: 13251
> vfs.numdirtybuffers: 220
> vfs.altbufferflushes: 0
> vfs.dirtybufferflushes: 0
>
> > # sysctl -a | fgrep bufspace
> root@bhyve01:~ # sysctl -a | fgrep bufspace
> vfs.bufspacethresh: 1681960548
> vfs.hibufspace: 1725087744
> vfs.lobufspace: 1638833353
> vfs.maxmallocbufspace: 86254387
> vfs.maxbufspace: 1735573504
> vfs.bufspace: 502060032
> vfs.runningbufspace: 0
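From the numbers above, bufspace sits well below maxbufspace; a quick integer-percent check (using the values from this snapshot):

```shell
# bufspace as an integer percentage of maxbufspace,
# using the values reported above.
echo $(( 502060032 * 100 / 1735573504 ))   # prints 28
```

So just under 29% of the maximum buffer space was in use at the moment this snapshot was taken, i.e. the cache itself was not exhausted then.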
>
>
> > - Some of these can be adjusted. If you look in sys/kern/vfs_bio.c,
> >   you can see which ones are CTLFLAG_RW.
>
> I've instrumented collection of those values every 10 s, storing them in a
> TSDB (/usr/sbin/prometheus_sysctl_exporter | grep vfs | grep buf), so
> we can track the values over time.
> I still haven't fully understood the mechanism, but what I think makes
> sense to measure/watch is:
>
>  - runningbufspace: if the number of outstanding requests grows a lot,
> it can be a signal of a stall
>  - bufkvaspace
>  - bufspace/maxbufspace = total usage of bufspace
>  - bufmallocspace/maxmallocbufspace = total usage of malloced memory for buffers
>  - bdwriteskip
>  - numdirtybuffers: data not persisted to backing store
>  - numfreebuffers
>  - lofreebuffers
>  - getnewbufrestarts
>  - mappingrestarts
>  - numbufallocfails
>  - notbufdflushes
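A minimal collection loop for a subset of those counters might look like the sketch below. The sysctl names are the real vfs ones from sys/kern/vfs_bio.c, but the 10 s interval and the log path are arbitrary choices; prometheus_sysctl_exporter, as used above, is the more complete option.

```shell
#!/bin/sh
# Sketch: sample a few of the buffer-cache counters listed above
# every 10 seconds, with a Unix timestamp, appending one line per
# sample to a plain log file (path is arbitrary).
LOG=/var/log/bufstats.log
while :; do
    {
        printf '%s ' "$(date +%s)"
        sysctl -n vfs.runningbufspace vfs.numdirtybuffers \
                  vfs.numfreebuffers vfs.getnewbufrestarts \
                  vfs.mappingrestarts vfs.numbufallocfails |
            tr '\n' ' '
        echo
    } >> "$LOG"
    sleep 10
done
```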
>
> >
> > Also, you can see exactly what the NFS mount setup is by:
> > # nfsstat -m
> > - If you post the output from this, I might be able to suggest
> >   some mount option changes.
>
> As I said, I have two machines, and they had the same config. When I
> started having the problem I removed all the tuning and rolled back
> to NFSv3 on bhyve01. Sadly bhyve01 still hangs from time to time. I'm
> sharing the nfsstat output from both machines.
>
> root@bhyve01:~ # nfsstat -m
> 10.10.10.10:/mnt/datastore0/bhyve_instances on /vms
> nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=1,wcommitsize=16777216,timeout=120,retrans=2
> 10.10.10.10:/mnt/datastore1/iso on /vms/.iso
> nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=1,wcommitsize=16777216,timeout=120,retrans=2
> 10.10.10.10:/mnt/ds_ssd_vms_03/disks on /vms/.disks
> nfsv3,tcp,resvport,nconnect=1,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=1,wcommitsize=16777216,timeout=120,retrans=2
>
> root@bhyve02:~ # nfsstat -m
> 10.10.10.10:/mnt/datastore0/bhyve_instances on /vms
> nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647
> 10.10.10.10:/mnt/datastore1/iso on /vms/.iso
> nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647
> 10.10.10.10:/mnt/ds_ssd_vms_03/disks on /vms/.disks
> nfsv4,minorversion=2,tcp,resvport,nconnect=16,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=67108864,timeout=120,retrans=2147483647

The only mount option I can see that might be worth
fiddling with is "wcommitsize". It is "how much can be cached before a commit
is done". The buffer cannot be re-used until it is committed, so you might try
making it smaller?
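For example, a smaller wcommitsize could be set with a remount; the 32 MiB below is only an illustration (half the current 64 MiB on the nfsv4 mounts), not a recommended value:

```shell
# Sketch: remount /vms with a smaller wcommitsize.
# 67108864 / 2 = 33554432 bytes (32 MiB); purely illustrative.
umount /vms
mount -t nfs -o nfsv4,minorversion=2,wcommitsize=33554432 \
    10.10.10.10:/mnt/datastore0/bhyve_instances /vms
```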

rick

>
>
> >
> > I do not know how bhyve reads/writes the image file.
> > (That might be a hint as well, since that is probably
> > what is unique about your setup.)
> >
> > rick
> >
> > >
> > > Ale
> > >
> > > On Mon, 1 Sep 2025 at 22:49 Rick Macklem <rick.macklem@gmail.com> wrote:
> > >>
> > >> For some reason, I cannot reply to your email
> > >> (might be the size of it), so I'll post a simple
> > >> comment.
> > >>
> > >> As you noted, processes are stuck in newbuf in
> > >> the client. This probably has nothing to do with
> > >> the server. It also looks like the clients are bhyve.
> > >>
> > >> Bump the memory size of the bhyve clients up,
> > >> maybe way up.
> > >> --> There are ways to tune the size of the buffer
> > >>       cache, but bumping up the VM's RAM should
> > >>       give you more.
> > >>
> > >> rick