Date: Tue, 22 Apr 2025 18:23:03 +0300
From: Sulev-Madis Silber
To: freebsd-current@freebsd.org
Subject: Re: zfs (?) issues?
User-Agent: K-9 Mail for Android
In-Reply-To: <1357110019.7132.1745326331870@localhost>
References: <56F52DF4-2988-4F06-9F53-90D07AF5DD02@ketas.si.pri.ee> <1357110019.7132.1745326331870@localhost>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Sender: owner-freebsd-current@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

well i don't have those errors anymore so there's nothing to give

i've tried to tune arc but it didn't do anything so i took those things off again

right now i'm looking at

ARC: 1487M Total, 1102M MFU, 128M MRU, 1544K Anon, 56M Header, 199M Other
     942M Compressed, 18G Uncompressed, 19.36:1 Ratio

and wonder wtf

i bet there's an issue somewhere and i somehow can't properly recreate it. on memory pressure it does resize arc down properly so seems like i don't need any limits

and there's no tmpfs. it would be useless at that low memory sizes

the problem is that i can't figure out what all those problems are, how to recreate those conditions and how to work around or maybe find bugs. also don't have enough hw to solely test it on. unless i can maybe try it on tiny 512m vm. and then i would need to know what to try

i also don't know why those git settings help me:

[core]
packedGitWindowSize = 32m
packedGitLimit = 128m
preloadIndex = false
[diff]
renameLimit = 16384

how to tune it from some global place. and so on. and why would it even need fiddling so much? zfs indeed has improved a lot, previously it was quite a hell to use

i don't even know if this is related to mmap. even then, i don't really get what that function even does. hence the "zfs (?) issue". it might not even be zfs at all

there are probably multiple combined issues here

i also don't really buy the idea that a ton of ram would automatically fix this

so yeah unsure what to think of this

some of the issues i found that others also have. some of them seem new

some fixes were like trial and error and nobody seemed to even know what's wrong. granted, that was a forum so maybe it's better here?

i mean i have used below average equipment my entire life and the usual way to cope with this is to just give it more time. put more swap and just wait

i think someone tested my git issues in 4g vm and found no issues at all? other things seem like only i have them

i also find it kind of confusing that if this is hw, why i don't see any other issues

this is not the first time that i have found something confusing in fbsd that later turned out to be a bug and was further tested and fixed by others

hence the current mailing list so maybe someone else has ideas. or if it already has a fix. and i hope there are people with much larger labs who could easily tell / test things

so in the end,

1) why should git on large repo cause machine to run out of memory, instead of just being as slow as it would need to be

2) why / what are fs operations that could cause low power machine to mysteriously fail on zfs, when expected results would be slow fs behaviour

i don't know what really happens and it's way too complex for me to get all memory management that happens in kernel. i only have this wild guess that any type of caching should happen in "leftover" ram and make things faster if possible. and any fs operations that have already been reported completed by kernel can't be suddenly found incomplete later. whatever that fs-related stray buildworld error was that resolved itself somehow. and what i can recreate and i'm not expert in this so how do i even know?

what's fun is how running rsync over several tb's of data doesn't seem to cause any issues at all. this is still the same machine, many would not recommend using this. different workload?

hell knows what all this is. maybe later i could figure it out or actually save some logs or. those i didn't save as i assumed it repeats itself. didn't and it went off tmux window history

oh well. yes, this is a questionable report but those are "heisenbugs" as well. at least some?

On April 22, 2025 3:52:11 PM GMT+03:00, Ronald Klop wrote:
>Hi,
>
>First, instead of writing "it gives vague errors", it really helps others on this list if you can copy-paste the errors into your email.
>
>Second, as far as I can see FreeBSD 13.4 uses OpenZFS 2.1.14. FreeBSD 14 uses OpenZFS 2.2.X which has bugfixes and improved tuning, although I cannot claim that will fix your issues.
>What you can try is to limit the growth of the ARC.
>
>Set "sysctl vfs.zfs.arc_max=1073741824" or add this to /etc/sysctl.conf to set the value at boot.
>
>This will limit the ARC to 1GB. I used similar settings on small machines without really noticing a speed difference while usability increased. You can play a bit with the value. Maybe 512MB will be even enough for your use case.
>
>NB: sysctl vfs.zfs.arc_max was renamed to vfs.zfs.arc.max with arc_max as a legacy alias, but I don't know if that already happened in 13.4.
>
>Another thing to check is the usage of tmpfs. If you don't restrict the max size of a tmpfs filesystem it will compete for memory. Although this will also show an increase in swap usage.
>
>Regards,
>Ronald.
>
>
>From: Sulev-Madis Silber
>Date: Monday, 21 April 2025 03:25
>To: freebsd-current
>Subject: zfs (?) issues?
>>
>> i have long running issue in my 13.4 box (amd64)
>>
>> others don't get it at all and only suggest adding more than 4g ram
>>
>> it manifests as some mmap or other problems i don't really get
>>
>> basically unrestricted git consumes all the memory. i had to turn watchdog on because sometimes a git pull on ports tree causes kernel to take 100% of ram. it keeps killing userland off until it's just kernel running there happily. it never panics and killing off userland obviously makes the problem disappear and nothing will do any fs operations anymore
>>
>> dovecot without tuning or with some tuning tended to do this too
>>
>> what is it?
>>
>> now i noticed another issue. if i happen to do too many src git pulls in a row, they never actually "pull" anything. and / or clean my obj tree out. i can't run buildworld anymore. it gives vague errors
>>
>> if i wait a little before starting buildworld, it always works
>>
>> what could possibly be happening here? the way the buildworld fails means there's serious issue with fs. and how could it be fixed with waiting? it means that some fs operations are still going on in background
>>
>> i have no idea what's happening here. zfs doesn't report any issues. nor does storage. nothing was killed with out of memory but arc usage somehow increased a lot. and its compression ratio went weirdly high, like ~22:1 or so
>>
>> i don't know if it's acceptable zfs behaviour if it runs low on memory or not. how to test it. etc. and if this is fixed on 14, on stable, or on current. i don't have enough hw to test it on all
>>
>> i have done other stuff on that box that might also be improper for the amount of ram i have there but then it's just slow, nothing fails like this
>>
>> unsure how this could be fixed or tuned or something else. or why does it behave like this. as opposed to usual low resource issues that just mean you need more time
>>
>> i mean it would be easy to add huge amounts of ram but people could also want to use zfs in slightly less powerful embedded systems where lack of power is expected but weird fails maybe not
>>
>> so is this a bug? a feature? something fixed? something that can't be fixed? what could be acceptable ram size? 8g? 16g? and why can't it just tune everything down and become slower as expected
>>
>> i tried to look up any openzfs related bugs but zfs is huge and i'm not fs expert either
>>
>> i also don't know what happens while i wait. it doesn't show any serious io load. no cpu is taken. load is down. system is responsive
>>
>> it all feels like bug still
>>
>> i have wondered if this is second hand hw acting up but i checked and tested it as best as i could and why would it only bug out when i try more complex things on zfs?
>>
>> i'm curious about using zfs on super low memory systems too, because it offers certain features. maybe we could fix this if whole issue is ram. or if it's elsewhere, maybe that too
>>
>> i don't know what to think of this all. esp the last issue. i'm not really alone here with earlier issues but unsure
>>
>>
>>
>
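
a minimal sketch of the arithmetic behind the arc_max advice quoted above, assuming a plain POSIX sh. the helper names mib_to_bytes and arc_max_line are made up for illustration and are not part of FreeBSD or OpenZFS. the script only prints the line one could add to /etc/sysctl.conf and changes nothing on the running system. whether such a cap helps with the problems described in this thread is not established here.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# Convert a size in MiB to the byte count that the ARC limit expects.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print a sysctl.conf style line for an ARC cap given in MiB.
# This only prints text; it does not call sysctl(8).
arc_max_line() {
    echo "vfs.zfs.arc_max=$(mib_to_bytes "$1")"
}

arc_max_line 1024   # the 1GB value from the quoted mail
arc_max_line 512    # the smaller value suggested as worth trying
```

the first printed line matches the value in the quoted mail (1073741824). as noted there, the tunable may also be spelled vfs.zfs.arc.max depending on the release, so it is worth checking which name the running system accepts before relying on either.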