From: Cy Schubert
Reply-to: Cy Schubert
To: Doug Ambrisko
cc: Cy Schubert, Mateusz Guzik, Doug Ambrisko, src-committers@freebsd.org,
    dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org,
    Rick Macklem
Subject: Re: git: 6468cd8e0ef9 - main - mount: add vnode usage per file system with mount -v
References: <202206131457.25DEvJDU044469@gitrepo.freebsd.org>
    <20220615030833.79F9A9B@slippy.cwsent.com>
    <20220615140514.77BEDD7@slippy.cwsent.com>
Comments: In-reply-to Doug Ambrisko message dated "Wed, 15 Jun 2022 08:07:52 -0700."
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
Date: Wed, 15 Jun 2022 09:09:12 -0700
Message-Id: <20220615160912.6E86A202@slippy.cwsent.com>

Thanks. This fixes it.

16 simultaneous tar cf /dev/null /usr/obj on three separate NFS clients:

Server:
  Clients  OpenOwner      Opens  LockOwner      Locks     Delegs
        3         48         54          0          0          0
  Layouts
        0

The load on the NFS server is ~ 0.5 to 2 on a four core machine with CPU sys
busy between 2% and 30%. Prior to this CPU sys was pegged at close to 100%
with a load of 18.

The regression has been resolved. Thanks.

--
Cheers,
Cy Schubert
FreeBSD UNIX:  Web:  http://www.FreeBSD.org
NTP:           Web:  https://nwtime.org

                        e**(i*pi)+1=0


In message , Doug Ambrisko writes:
> On Wed, Jun 15, 2022 at 07:23:51AM -0700, Doug Ambrisko wrote:
> | On Wed, Jun 15, 2022 at 07:05:14AM -0700, Cy Schubert wrote:
> | | Can we revert this, please. It breaks NFSv4.
> |
> | It would be nice if you could try the proposed partial revert.
> | I'm planning to commit that shortly.
>
> It is in ce00b11940ab.
>
> Please let me know how that works.
>
> Thanks,
>
> Doug A.
> | | In message <20220615030833.79F9A9B@slippy.cwsent.com>, Cy Schubert writes:
> | | > In message , Mateusz Guzik writes:
> | | > > On 6/13/22, Doug Ambrisko wrote:
> | | > > > On Mon, Jun 13, 2022 at 06:43:31PM +0200, Mateusz Guzik wrote:
> | | > > > | On 6/13/22, Doug Ambrisko wrote:
> | | > > > | > The branch main has been updated by ambrisko:
> | | > > > | >
> | | > > > | > URL:
> | | > > > | > https://cgit.FreeBSD.org/src/commit/?id=6468cd8e0ef9d1d3331e9de26cd2be59bc778494
> | | > > > | >
> | | > > > | > commit 6468cd8e0ef9d1d3331e9de26cd2be59bc778494
> | | > > > | > Author:     Doug Ambrisko
> | | > > > | > AuthorDate: 2022-06-13 14:56:38 +0000
> | | > > > | > Commit:     Doug Ambrisko
> | | > > > | > CommitDate: 2022-06-13 14:56:38 +0000
> | | > > > | >
> | | > > > | >     mount: add vnode usage per file system with mount -v
> | | > > > | >
> | | > > > | >     This avoids the need to drop into the ddb to figure out vnode
> | | > > > | >     usage per file system.  It helps to see if they are or are not
> | | > > > | >     being freed.  Suggestion to report active vnode count was from
> | | > > > | >     kib@
> | | > > > | >
> | | > > > | >     Reviewed by:    kib
> | | > > > | >     Differential Revision:  https://reviews.freebsd.org/D35436
> | | > > > | > ---
> | | > > > | >  sbin/mount/mount.c   |  7 +++++++
> | | > > > | >  sys/kern/vfs_mount.c | 12 ++++++++++++
> | | > > > | >  sys/sys/mount.h      |  4 +++-
> | | > > > | >  3 files changed, 22 insertions(+), 1 deletion(-)
> | | > > > | >
> | | > > > | > diff --git a/sbin/mount/mount.c b/sbin/mount/mount.c
> | | > > > | > index 79d9d6cb0caf..bd3d0073c474 100644
> | | > > > | > --- a/sbin/mount/mount.c
> | | > > > | > +++ b/sbin/mount/mount.c
> | | > > > | > @@ -692,6 +692,13 @@ prmount(struct statfs *sfp)
> | | > > > | >                          xo_emit("{D:, }{Lw:fsid}{:fsid}", fsidbuf);
> | | > > > | >                          free(fsidbuf);
> | | > > > | >                  }
> | | > > > | > +                if (sfp->f_nvnodelistsize != 0 || sfp->f_avnodecount != 0) {
> | | > > > | > +                        xo_open_container("vnodes");
> | | > > > | > +                        xo_emit("{D:, }{Lwc:vnodes}{Lw:count}{w:count/%ju}{Lw:active}{:active/%ju}",
> | | > > > | > +                            (uintmax_t)sfp->f_nvnodelistsize,
> | | > > > | > +                            (uintmax_t)sfp->f_avnodecount);
> | | > > > | > +                        xo_close_container("vnodes");
> | | > > > | > +                }
> | | > > > | >          }
> | | > > > | >          xo_emit("{D:)}\n");
> | | > > > | >  }
> | | > > > | > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
> | | > > > | > index 71a40fd97a9c..e3818b67e841 100644
> | | > > > | > --- a/sys/kern/vfs_mount.c
> | | > > > | > +++ b/sys/kern/vfs_mount.c
> | | > > > | > @@ -2610,6 +2610,8 @@ vfs_copyopt(struct vfsoptlist *opts, const char *name, void *dest, int len)
> | | > > > | >  int
> | | > > > | >  __vfs_statfs(struct mount *mp, struct statfs *sbp)
> | | > > > | >  {
> | | > > > | > +        struct vnode *vp;
> | | > > > | > +        uint32_t count;
> | | > > > | >
> | | > > > | >          /*
> | | > > > | >           * Filesystems only fill in part of the structure for updates, we
> | | > > > | > @@ -2624,6 +2626,16 @@ __vfs_statfs(struct mount *mp, struct statfs *sbp)
> | | > > > | >          sbp->f_version = STATFS_VERSION;
> | | > > > | >          sbp->f_namemax = NAME_MAX;
> | | > > > | >          sbp->f_flags = mp->mnt_flag & MNT_VISFLAGMASK;
> | | > > > | > +        sbp->f_nvnodelistsize = mp->mnt_nvnodelistsize;
> | | > > > | > +
> | | > > > | > +        count = 0;
> | | > > > | > +        MNT_ILOCK(mp);
> | | > > > | > +        TAILQ_FOREACH(vp, &mp->mnt_nvnodelist, v_nmntvnodes) {
> | | > > > | > +                if (vrefcnt(vp) > 0) /* racy but does not matter */
> | | > > > | > +                        count++;
> | | > > > | > +        }
> | | > > > | > +        MNT_IUNLOCK(mp);
> | | > > > | > +        sbp->f_avnodecount = count;
> | | > > > | >
> | | > > > |
> | | > > > | libc uses statfs for dir walk (see gen/fts.c), most notably find
> | | > > > | immediately runs into it. As such the linear scan by default is a
> | | > > > | non-starter.
> | | > > > |
> | | > > > | I don't know if mount is the right place to dump this kind of info to
> | | > > > | begin with, but even so, it should only happen with a dedicated flag.
> | | > > > |
> | | > > > | As statfs does not take any flags on its own, there is no way to
> | | > > > | prevent it from doing the above walk. Perhaps a dedicated sysctl which
> | | > > > | takes mount point id could do the walk instead, when asked.
> | | > > > |
> | | > > > | Short of making the walk optional I'm afraid this will have to be
> | | > > > | reverted.
> | | > > >
> | | > > > Just to be clear, this isn't breaking things but is not optimal for
> | | > > > things that don't need this extra info.
> | | > > >
> | | > >
> | | > > It's not "not optimal", it's a significant overhead which taxes
> | | > > frequent users which don't benefit from it.
> | | >
> | | > Indeed this is not optimal. Since this revision NFSv4 performance has
> | | > tanked. An installworld over NFSv4 which used to take approximately 15
> | | > minutes took all night, not even finishing by morning. The NFS server, a 4
> | | > core machine, had a load average of 18 with three NFS clients attempting
> | | > installworld with nfsd using over 90% of the cycles on the machine. I was
> | | > able to reproduce the problem by running a series of tar cf /dev/null
> | | > /usr/obj in parallel, on a single NFSv4 client to essentially DoS the NFSv4
> | | > server.
> | | >
> | | > The workaround was to fall back to NFSv3, which was unaffected by this
> | | > revision.
> | | >
> | | > I reached out to our resident NFS person (rmacklem@) who suggested
> | | > reverting this revision, restoring the network to pre-regression state.
> | | >
> | | > >
> | | > > For more data I plugged dtrace -n 'fbt::__vfs_statfs:entry {
> | | > > @[execname] = count(); }' while package building, then i got tons of
> | | > > hits:
> | | > > [snip]
> | | > >   expr             13992
> | | > >   install          14090
> | | > >   dirname          14921
> | | > >   mv               17404
> | | > >   ghc-stage1       17577
> | | > >   grep             18998
> | | > >   xgcc             23832
> | | > >   cpp              29282
> | | > >   cc1              36961
> | | > >   sh               70575
> | | > >   rm               73904
> | | > >   ld.lld           87784
> | | > >   sed              88803
> | | > >   c++              98175
> | | > >   cat             115811
> | | > >   cc              449725
> | | > >
> | | > [...]
> | | > > --
> | | > > Mateusz Guzik
> | | > >
> | | >
> | | >
> | | > --
> | | > Cheers,
> | | > Cy Schubert
> | | > FreeBSD UNIX:  Web:  http://www.FreeBSD.org
> | | > NTP:           Web:  https://nwtime.org
> | | >
> | | >                         e**(i*pi)+1=0
> | | >
> | | >
> | |
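
For anyone reading the quoted discussion later: the overhead described there
comes from walking the whole per-mount vnode list on every statfs call, and
the suggestion in the thread was to make that walk optional. A minimal
userspace sketch of that idea in C is below. It is only a model under stated
assumptions: toy_vnode, toy_mount, toy_statfs, toy_mount_add,
toy_count_active and the want_active flag are all hypothetical names invented
for illustration, not FreeBSD kernel interfaces, and the sketch does not
claim to describe what ce00b11940ab actually changed.

```c
/*
 * Userspace model only. Every toy_* name here is hypothetical and is
 * not part of the FreeBSD kernel or libc.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vnode {
	unsigned int	  refcnt;	/* stands in for vrefcnt(vp) */
	struct toy_vnode *next;		/* next vnode on the same mount */
};

struct toy_mount {
	struct toy_vnode *head;		/* stands in for mnt_nvnodelist */
	uint32_t	  nvnodes;	/* stands in for mnt_nvnodelistsize */
};

struct toy_statfs {
	uint64_t nvnodelistsize;	/* cached list length */
	uint64_t avnodecount;		/* vnodes with refcnt > 0, if asked */
};

/* Link a caller-owned vnode onto the mount in constant time. */
void
toy_mount_add(struct toy_mount *mp, struct toy_vnode *vp, unsigned int refcnt)
{
	assert(mp != NULL && vp != NULL);
	vp->refcnt = refcnt;
	vp->next = mp->head;
	mp->head = vp;
	mp->nvnodes++;
}

/* The expensive part: visits every vnode, so cost grows with the list. */
uint32_t
toy_count_active(const struct toy_mount *mp)
{
	const struct toy_vnode *vp;
	uint32_t count = 0;

	for (vp = mp->head; vp != NULL; vp = vp->next) {
		if (vp->refcnt > 0)
			count++;
	}
	return (count);
}

/*
 * Fill in the statistics. The list size is a cached field and always
 * cheap to copy; the active count is computed only when the caller
 * explicitly asks for it, otherwise it is reported as 0.
 */
void
toy_statfs(const struct toy_mount *mp, struct toy_statfs *sbp, int want_active)
{
	assert(mp != NULL && sbp != NULL);
	sbp->nvnodelistsize = mp->nvnodes;
	sbp->avnodecount = want_active ? toy_count_active(mp) : 0;
}
```

With want_active set to 0 the sketch does constant work no matter how many
vnodes hang off the mount; only a caller that passes a nonzero flag pays for
the linear walk. In the real kernel the walk also happens under the mount
interlock, which this model leaves out.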