Re: git: 6468cd8e0ef9 - main - mount: add vnode usage per file system with mount -v

From: Cy Schubert <Cy.Schubert_at_cschubert.com>
Date: Wed, 15 Jun 2022 14:05:14 UTC
Can we revert this, please. It breaks NFSv4.


-- 
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

			e**(i*pi)+1=0


In message <20220615030833.79F9A9B@slippy.cwsent.com>, Cy Schubert writes:
> In message <CAGudoHHbf3PTt_TXnKAtt5n_WOYpvA0dCb9HAiDLAL+yR3VCFQ@mail.gmail.c
> om>
> , Mateusz Guzik writes:
> > On 6/13/22, Doug Ambrisko <ambrisko@ambrisko.com> wrote:
> > > On Mon, Jun 13, 2022 at 06:43:31PM +0200, Mateusz Guzik wrote:
> > > | On 6/13/22, Doug Ambrisko <ambrisko@freebsd.org> wrote:
> > > | > The branch main has been updated by ambrisko:
> > > | >
> > > | > URL:
> > > | >
> > > https://cgit.FreeBSD.org/src/commit/?id=6468cd8e0ef9d1d3331e9de26cd2be59b
> c7
> > 78494
> > > | >
> > > | > commit 6468cd8e0ef9d1d3331e9de26cd2be59bc778494
> > > | > Author:     Doug Ambrisko <ambrisko@FreeBSD.org>
> > > | > AuthorDate: 2022-06-13 14:56:38 +0000
> > > | > Commit:     Doug Ambrisko <ambrisko@FreeBSD.org>
> > > | > CommitDate: 2022-06-13 14:56:38 +0000
> > > | >
> > > | >     mount: add vnode usage per file system with mount -v
> > > | >
> > > | >     This avoids the need to drop into the ddb to figure out vnode
> > > | >     usage per file system.  It helps to see if they are or are not
> > > | >     being freed.  Suggestion to report active vnode count was from
> > > | >     kib@
> > > | >
> > > | >     Reviewed by:    kib
> > > | >     Differential Revision: https://reviews.freebsd.org/D35436
> > > | > ---
> > > | >  sbin/mount/mount.c   |  7 +++++++
> > > | >  sys/kern/vfs_mount.c | 12 ++++++++++++
> > > | >  sys/sys/mount.h      |  4 +++-
> > > | >  3 files changed, 22 insertions(+), 1 deletion(-)
> > > | >
> > > | > diff --git a/sbin/mount/mount.c b/sbin/mount/mount.c
> > > | > index 79d9d6cb0caf..bd3d0073c474 100644
> > > | > --- a/sbin/mount/mount.c
> > > | > +++ b/sbin/mount/mount.c
> > > | > @@ -692,6 +692,13 @@ prmount(struct statfs *sfp)
> > > | >  			xo_emit("{D:, }{Lw:fsid}{:fsid}", fsidbuf);
> > > | >  			free(fsidbuf);
> > > | >  		}
> > > | > +		if (sfp->f_nvnodelistsize != 0 || sfp->f_avnodecount !=
> >  0) {
> > > | > +			xo_open_container("vnodes");
> > > | > +			xo_emit("{D:,
> > > | > }{Lwc:vnodes}{Lw:count}{w:count/%ju}{Lw:active}{:active/%ju}",
> > > | > +			    (uintmax_t)sfp->f_nvnodelistsize,
> > > | > +			    (uintmax_t)sfp->f_avnodecount);
> > > | > +			xo_close_container("vnodes");
> > > | > +		}
> > > | >  	}
> > > | >  	xo_emit("{D:)}\n");
> > > | >  }
> > > | > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
> > > | > index 71a40fd97a9c..e3818b67e841 100644
> > > | > --- a/sys/kern/vfs_mount.c
> > > | > +++ b/sys/kern/vfs_mount.c
> > > | > @@ -2610,6 +2610,8 @@ vfs_copyopt(struct vfsoptlist *opts, const char
> > > *name,
> > > | > void *dest, int len)
> > > | >  int
> > > | >  __vfs_statfs(struct mount *mp, struct statfs *sbp)
> > > | >  {
> > > | > +	struct vnode *vp;
> > > | > +	uint32_t count;
> > > | >
> > > | >  	/*
> > > | >  	 * Filesystems only fill in part of the structure for updates, 
> > we
> > > | > @@ -2624,6 +2626,16 @@ __vfs_statfs(struct mount *mp, struct statfs
> > > *sbp)
> > > | >  	sbp->f_version = STATFS_VERSION;
> > > | >  	sbp->f_namemax = NAME_MAX;
> > > | >  	sbp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
> > > | > +	sbp->f_nvnodelistsize = mp->mnt_nvnodelistsize;
> > > | > +
> > > | > +	count = 0;
> > > | > +	MNT_ILOCK(mp);
> > > | > +	TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) {
> > > | > +		if (vrefcnt(vp) > 0) /* racy but does not matter */
> > > | > +			count++;
> > > | > +	}
> > > | > +	MNT_IUNLOCK(mp);
> > > | > +	sbp->f_avnodecount = count;
> > > | >
> > > |
> > > | libc uses statfs for dir walk (see gen/fts.c), most notably find
> > > | immediately runs into it. As such the linear scan by default is a
> > > | non-starter.
> > > |
> > > | I don't know if mount is the right place to dump this kind of info to
> > > | begin with, but even so, it should only happen with a dedicated flag.
> > > |
> > > | As statfs does not take any flags on its own, there is no way to
> > > | prevent it from doing the above walk. Perhaps a dedicated sysctl which
> > > | takes mount point id could do the walk instead, when asked.
> > > |
> > > | Short of making the walk optional I'm afraid this will have to be
> > > reverted.
> > >
> > > Just to be clear, this isn't breaking things but is not optimal for
> > > things that don't need this extra info.
> > >
> >
> > It's not "not optimal", it's a significant overhead which taxes
> > frequent users which don't benefit from it.
>
> Indeed this is not optimal. Since this revision NFSv4 performance has 
> tanked. An installworld over NFSv4 which used to take approximately 15 
> minutes took all night, not even finishing by morning. The NFS server, a 4 
> core machine, had a load average of 18 with three NFS clients attempting 
> installworld with nfsd using over 90% of the cycles on the machine. I was 
> able to reproduce the problem by running a series of tar cf /dev/null 
> /usr/obj in parallel, on a single NFSv4 client to essentially DoS the NFSv4 
> server.
>
> The workaround was to fall back to NFSv3, which was unaffected by this 
> revision.
>
> I reached out to our resident NFS person (rmacklem@) who suggested 
> reverting this revision, restoring the network to pre-regression state.
>
> >
> > For more data I plugged dtrace -n 'fbt::__vfs_statfs:entry {
> > @[execname] = count(); }' while package building, then i got tons of
> > hits:
> > [snip]
> >   expr                                                          13992
> >   install                                                       14090
> >   dirname                                                       14921
> >   mv                                                            17404
> >   ghc-stage1                                                    17577
> >   grep                                                          18998
> >   xgcc                                                          23832
> >   cpp                                                           29282
> >   cc1                                                           36961
> >   sh                                                            70575
> >   rm                                                            73904
> >   ld.lld                                                        87784
> >   sed                                                           88803
> >   c++                                                           98175
> >   cat                                                          115811
> >   cc                                                           449725
> >
> [...]
> > -- 
> > Mateusz Guzik <mjguzik gmail.com>
> >
>
>
> -- 
> Cheers,
> Cy Schubert <Cy.Schubert@cschubert.com>
> FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
> NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
>
> 			e**(i*pi)+1=0
>
>