Re: git: 6468cd8e0ef9 - main - mount: add vnode usage per file system with mount -v
Date: Wed, 15 Jun 2022 03:08:33 UTC
In message <CAGudoHHbf3PTt_TXnKAtt5n_WOYpvA0dCb9HAiDLAL+yR3VCFQ@mail.gmail.com>,
Mateusz Guzik writes:
> On 6/13/22, Doug Ambrisko <ambrisko@ambrisko.com> wrote:
> > On Mon, Jun 13, 2022 at 06:43:31PM +0200, Mateusz Guzik wrote:
> > | On 6/13/22, Doug Ambrisko <ambrisko@freebsd.org> wrote:
> > | > The branch main has been updated by ambrisko:
> > | >
> > | > URL: https://cgit.FreeBSD.org/src/commit/?id=6468cd8e0ef9d1d3331e9de26cd2be59bc778494
> > | >
> > | > commit 6468cd8e0ef9d1d3331e9de26cd2be59bc778494
> > | > Author:     Doug Ambrisko <ambrisko@FreeBSD.org>
> > | > AuthorDate: 2022-06-13 14:56:38 +0000
> > | > Commit:     Doug Ambrisko <ambrisko@FreeBSD.org>
> > | > CommitDate: 2022-06-13 14:56:38 +0000
> > | >
> > | >     mount: add vnode usage per file system with mount -v
> > | >
> > | >     This avoids the need to drop into the ddb to figure out vnode
> > | >     usage per file system. It helps to see if they are or are not
> > | >     being freed. Suggestion to report active vnode count was from
> > | >     kib@
> > | >
> > | >     Reviewed by:    kib
> > | >     Differential Revision:  https://reviews.freebsd.org/D35436
> > | > ---
> > | >  sbin/mount/mount.c   |  7 +++++++
> > | >  sys/kern/vfs_mount.c | 12 ++++++++++++
> > | >  sys/sys/mount.h      |  4 +++-
> > | >  3 files changed, 22 insertions(+), 1 deletion(-)
> > | >
> > | > diff --git a/sbin/mount/mount.c b/sbin/mount/mount.c
> > | > index 79d9d6cb0caf..bd3d0073c474 100644
> > | > --- a/sbin/mount/mount.c
> > | > +++ b/sbin/mount/mount.c
> > | > @@ -692,6 +692,13 @@ prmount(struct statfs *sfp)
> > | >                          xo_emit("{D:, }{Lw:fsid}{:fsid}", fsidbuf);
> > | >                          free(fsidbuf);
> > | >                  }
> > | > +                if (sfp->f_nvnodelistsize != 0 || sfp->f_avnodecount != 0) {
> > | > +                        xo_open_container("vnodes");
> > | > +                        xo_emit("{D:, }{Lwc:vnodes}{Lw:count}{w:count/%ju}{Lw:active}{:active/%ju}",
> > | > +                            (uintmax_t)sfp->f_nvnodelistsize,
> > | > +                            (uintmax_t)sfp->f_avnodecount);
> > | > +                        xo_close_container("vnodes");
> > | > +                }
> > | >          }
> > | >          xo_emit("{D:)}\n");
> > | >  }
> > | > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
> > | > index 71a40fd97a9c..e3818b67e841 100644
> > | > --- a/sys/kern/vfs_mount.c
> > | > +++ b/sys/kern/vfs_mount.c
> > | > @@ -2610,6 +2610,8 @@ vfs_copyopt(struct vfsoptlist *opts, const char *name, void *dest, int len)
> > | >  int
> > | >  __vfs_statfs(struct mount *mp, struct statfs *sbp)
> > | >  {
> > | > +        struct vnode *vp;
> > | > +        uint32_t count;
> > | >
> > | >          /*
> > | >           * Filesystems only fill in part of the structure for updates, we
> > | > @@ -2624,6 +2626,16 @@ __vfs_statfs(struct mount *mp, struct statfs *sbp)
> > | >          sbp->f_version = STATFS_VERSION;
> > | >          sbp->f_namemax = NAME_MAX;
> > | >          sbp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
> > | > +        sbp->f_nvnodelistsize = mp->mnt_nvnodelistsize;
> > | > +
> > | > +        count = 0;
> > | > +        MNT_ILOCK(mp);
> > | > +        TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) {
> > | > +                if (vrefcnt(vp) > 0) /* racy but does not matter */
> > | > +                        count++;
> > | > +        }
> > | > +        MNT_IUNLOCK(mp);
> > | > +        sbp->f_avnodecount = count;
> > | >
> > |
> > | libc uses statfs for dir walk (see gen/fts.c), most notably find
> > | immediately runs into it. As such the linear scan by default is a
> > | non-starter.
> > |
> > | I don't know if mount is the right place to dump this kind of info to
> > | begin with, but even so, it should only happen with a dedicated flag.
> > |
> > | As statfs does not take any flags on its own, there is no way to
> > | prevent it from doing the above walk. Perhaps a dedicated sysctl which
> > | takes mount point id could do the walk instead, when asked.
> > |
> > | Short of making the walk optional I'm afraid this will have to be
> > | reverted.
> >
> > Just to be clear, this isn't breaking things but is not optimal for
> > things that don't need this extra info.
> >
> It's not "not optimal", it's a significant overhead which taxes
> frequent users which don't benefit from it.

Indeed this is not optimal. Since this revision, NFSv4 performance has
tanked. An installworld over NFSv4 that used to take approximately 15
minutes took all night and had still not finished by morning. The NFS
server, a 4-core machine, had a load average of 18 with three NFS clients
attempting installworld, and nfsd was using over 90% of the CPU cycles on
the machine.

I was able to reproduce the problem by running a number of
tar cf /dev/null /usr/obj jobs in parallel on a single NFSv4 client, which
was enough to essentially DoS the NFSv4 server. The workaround was to fall
back to NFSv3, which is unaffected by this revision. I reached out to our
resident NFS person (rmacklem@), who suggested reverting this revision;
doing so restored pre-regression performance.
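Regarding the dedicated sysctl taking a mount point id suggested above,
something along these lines might work. This is only a rough, untested
sketch of the idea; the sysctl name (vfs.active_vnode_count) and the
fsid-in / count-out convention are invented for illustration, not anything
that exists in the tree:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/sysctl.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/*
 * Sketch: do the per-mount walk only on demand.  The caller writes the
 * fsid of the mount it cares about and reads back the number of vnodes
 * on that mount which currently hold a reference.
 */
static int
sysctl_active_vnode_count(SYSCTL_HANDLER_ARGS)
{
        struct mount *mp;
        struct vnode *vp;
        fsid_t fsid;
        uint32_t count;
        int error;

        error = SYSCTL_IN(req, &fsid, sizeof(fsid));
        if (error != 0)
                return (error);

        mp = vfs_getvfs(&fsid);         /* returns with a reference held */
        if (mp == NULL)
                return (ENOENT);

        count = 0;
        MNT_ILOCK(mp);
        TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) {
                if (vrefcnt(vp) > 0)    /* racy but does not matter */
                        count++;
        }
        MNT_IUNLOCK(mp);
        vfs_rel(mp);

        return (SYSCTL_OUT(req, &count, sizeof(count)));
}

SYSCTL_PROC(_vfs, OID_AUTO, active_vnode_count,
    CTLTYPE_U32 | CTLFLAG_RW | CTLFLAG_MPSAFE, NULL, 0,
    sysctl_active_vnode_count, "IU",
    "Number of referenced vnodes on the mount identified by the fsid "
    "written to this sysctl");

That would keep __vfs_statfs() cheap for the common callers while still
letting mount -v (or anything else that cares) ask for the active vnode
count explicitly.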
> For more data I plugged dtrace -n 'fbt::__vfs_statfs:entry { @[execname] = count(); }'
> while package building, then i got tons of hits:
> [snip]
>   expr           13992
>   install        14090
>   dirname        14921
>   mv             17404
>   ghc-stage1     17577
>   grep           18998
>   xgcc           23832
>   cpp            29282
>   cc1            36961
>   sh             70575
>   rm             73904
>   ld.lld         87784
>   sed            88803
>   c++            98175
>   cat           115811
>   cc            449725
> [...]
>
> --
> Mateusz Guzik <mjguzik gmail.com>
>

--
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  http://www.FreeBSD.org
NTP:           <cy@nwtime.org>    Web:  https://nwtime.org

                        e**(i*pi)+1=0