Date: Fri, 15 Sep 2023 12:09:29 +0200
From: Alexander Leidinger
To: Mateusz Guzik
Cc: Konstantin Belousov, current@freebsd.org
Subject: Re: Speed improvements in ZFS
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 2023-09-04 14:26, Mateusz Guzik wrote:
> On 9/4/23, Alexander Leidinger wrote:
>> On 2023-08-28 22:33, Alexander Leidinger wrote:
>>> On 2023-08-22 18:59, Mateusz Guzik wrote:
>>>> On 8/22/23, Alexander Leidinger wrote:
>>>>> On 2023-08-21 10:53, Konstantin Belousov wrote:
>>>>>> On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
>>>>>>> On 2023-08-20 23:17, Konstantin Belousov wrote:
>>>>>>> > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
>>>>>>> > > On 8/20/23, Alexander Leidinger wrote:
>>>>>>> > > > On 2023-08-20 22:02, Mateusz Guzik wrote:
>>>>>>> > > >> On 8/20/23, Alexander Leidinger wrote:
>>>>>>> > > >>> On 2023-08-20 19:10, Mateusz Guzik wrote:
>>>>>>> > > >>>> On 8/18/23, Alexander Leidinger wrote:
>>>>>>> > > >>>
>>>>>>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
>>>>>>> > > >>>>> interested to get it?
>>>>>>> > > >>>>>
>>>>>>> > > >>>>
>>>>>>> > > >>>> Your problem is not the vnode limit, but nullfs.
>>>>>>> > > >>>>
>>>>>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>>>>> > > >>>
>>>>>>> > > >>> 122 nullfs mounts on this system. And every jail I set up has
>>>>>>> > > >>> several null mounts. One basesystem mounted into every jail,
>>>>>>> > > >>> and then shared ports (packages/distfiles/ccache) across all
>>>>>>> > > >>> of them.
>>>>>>> > > >>>
>>>>>>> > > >>>> First, some of the contention is notorious VI_LOCK in order
>>>>>>> > > >>>> to do anything.
>>>>>>> > > >>>>
>>>>>>> > > >>>> But more importantly the mind-boggling off-cpu time comes
>>>>>>> > > >>>> from exclusive locking which should not be there to begin
>>>>>>> > > >>>> with -- as in that xlock in stat should be a slock.
>>>>>>> > > >>>>
>>>>>>> > > >>>> Maybe I'm going to look into it later.
>>>>>>> > > >>>
>>>>>>> > > >>> That would be fantastic.
>>>>>>> > > >>>
>>>>>>> > > >>
>>>>>>> > > >> I did a quick test, things are shared locked as expected.
>>>>>>> > > >>
>>>>>>> > > >> However, I found the following:
>>>>>>> > > >>         if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>>>>>>> > > >>                 mp->mnt_kern_flag |=
>>>>>>> > > >>                     lowerrootvp->v_mount->mnt_kern_flag &
>>>>>>> > > >>                     (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>>>>>>> > > >>                     MNTK_EXTENDED_SHARED);
>>>>>>> > > >>         }
>>>>>>> > > >>
>>>>>>> > > >> are you using the "nocache" option? it has a side effect of
>>>>>>> > > >> xlocking
>>>>>>> > > >
>>>>>>> > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > > If you don't have "nocache" on null mounts, then I don't see how
>>>>>>> > > this could happen.
>>>>>>> >
>>>>>>> > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
>>>>>>> > for fuse and nfs at least.
>>>>>>>
>>>>>>> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS
>>>>>>> exported. 6 of those nullfs mounts are also exported via Samba. The
>>>>>>> NFS exports shouldn't be needed anymore, I will remove them.
>>>>>> By nfs I meant nfs client, not nfs exports.
>>>>>
>>>>> No NFS client mounts anywhere on this system. So where is this
>>>>> exclusive lock coming from then...
>>>>> This is a ZFS system. 2 pools: one for the root, one for anything I
>>>>> need space for. Both pools reside on the same disks. The root pool is
>>>>> a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on
>>>>> the space-pool. The jails are all basejail-style jails.
>>>>>
>>>>
>>>> While I don't see why xlocking happens, you should be able to dtrace
>>>> or printf your way into finding out.
>>>
>>> dtrace looks to me like a faster approach to get to the root than
>>> printf... my first naive try is to detect exclusive locks. I'm not
>>> 100% sure I got it right, but at least dtrace doesn't complain about
>>> it:
>>> ---snip---
>>> #pragma D option dynvarsize=32m
>>>
>>> fbt:nullfs:null_lock:entry
>>> /args[0]->a_flags & 0x080000 != 0/
>>> {
>>>         stack();
>>> }
>>> ---snip---
>>>
>>> In which direction should I look with dtrace if this works in
>>> tonight's run of periodic? I don't have enough knowledge about VFS to
>>> come up with some immediate ideas.
>>
>> After your sysctl fix for maxvnodes I increased the amount of vnodes
>> 10 times compared to the initial report. This has increased the speed
>> of the operation, the find runs in all those jails finished today
>> after ~5h (@~8am) instead of in the afternoon as before. Could this
>> suggest that in parallel some null_reclaim() is running which does the
>> exclusive locks and slows down the entire operation?
>>
>
> That may be a slowdown to some extent, but the primary problem is
> exclusive vnode locking for stat lookup, which should not be
> happening.

With -current as of 2023-09-03 (and now 2023-09-11), the periodic daily
runs are down to less than an hour. This did not happen directly after
switching to 2023-09-03, though: the runtime first went down to 4h, then
down to 1h, without any update of the OS in between. The only thing I
changed was maxvnodes. First I raised it to some huge value after your
commit to the sysctl handling. Then, after noticing far more freevnodes
than configured, I lowered it to 500000000.

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
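
A note on the DTrace predicate quoted in this thread: D is documented to follow the ANSI C operator precedence rules, and in C `!=` binds tighter than `&`. If that holds here, `a_flags & 0x080000 != 0` is parsed as `a_flags & (0x080000 != 0)`, which tests bit 0 rather than the 0x080000 bit. The C sketch below only illustrates that parsing difference; the macro and function names are hypothetical, and the assumption that 0x080000 is the exclusive-lock request bit is taken from the quoted script, not verified against the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 0x080000 is the bit the quoted script means to test
 * (an exclusive lock request). EXCL_BIT is a hypothetical name. */
#define EXCL_BIT 0x080000

/* How C parses the unparenthesized form `flags & EXCL_BIT != 0`:
 * the comparison is evaluated first and yields 1, so the result
 * depends only on bit 0 of flags. */
bool predicate_as_parsed(int flags)
{
    return (flags & (EXCL_BIT != 0)) != 0;
}

/* The explicitly parenthesized form: tests the intended bit. */
bool predicate_parenthesized(int flags)
{
    return (flags & EXCL_BIT) != 0;
}
```

Under these assumptions, the corresponding predicate in the D script would read `/(args[0]->a_flags & 0x080000) != 0/`. That variant has not been run against the system described in this thread.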