From: Sulev-Madis Silber
To: freebsd-current@freebsd.org
Date: Wed, 23 Apr 2025 05:04:28 +0300
Subject: Re: zfs (?) issues?
Message-ID: <2E9A6E62-85CA-4C34-A22E-15A8EACB83EC@ketas.si.pri.ee>
References: <56F52DF4-2988-4F06-9F53-90D07AF5DD02@ketas.si.pri.ee> <1357110019.7132.1745326331870@localhost>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

yes, 2 * 8g partitions on separate disks, so i have 16g swap

but the issues i see aren't "usual" memory problems. even building rust works, it just takes 10g of swap and an entire day. nothing will fail. nothing will lag

but what i see is the kernel taking too much. as far as i understand. i can't figure out when and how

it all seems to originate from git. but what could it possibly do that makes it a perfect stress test tool? number of files in a single directory? file size? count? type of access method and speed of those calls? order?

basically don't touch git, esp. large repos

but then dovecot, also running there, also nearly caused similar issues. all i know is that maildirs are also a ton of small files

i don't know what's happening and whether it should or not

during earlier tests with git on the ports tree, which had a ton of changes, i observed that arc was quite low but wired went super high super fast. nothing was swapped out

iirc that was unswappable kernel memory of a kind

funnily, git without limits took the machine down but git with limits had no real speed reduction while not taking it down

i even tried to figure out where the kernel memory might be going but couldn't figure it out either. it seemed to have normal small usage for each part, including zfs

i wonder what's the best way to look into that?

and is this expected or not. i don't know!

the question of swap would be good if i would run out of memory in userland or so

unsure what runs out. maybe it's even fine. but i refuse to believe it's expected?

unsure if it could even be fixed without zfs performance being also affected. there are also massive servers and people would be pissed if puny little box issues affect them. but why it can't be dynamic or so?

i recall zfs being worse before. first it required ton of ram. like 4g min. 10y ago and so. then this was somehow reduced. then r/w speed was low, like 40mb/s. then it was fixed too. what's this now?

how to like even figure out what kind of memory is exhausted? and why would i need to tune any of this. as system would know what's installed ram size

i kind of have troubles imagining a system where whole ram goes to kernel. maybe there is

i tried to look what else besides arc could be limited but couldn't find any. don't even know what happens

all i know this isn't usual case of ram runs low, everything grinds to halt until things swap out and eventually get killed

i even tested that. if i specifically try to allocate ton of memory from userland, arc reduces properly, wired goes down, etc, eventually something gets killed off, usually the offending process

but this is something else i don't fully get. as i use zfs, i blame it. maybe i should not


On April 22, 2025 6:49:43 PM GMT+03:00, Rick Macklem wrote:
>I wouldn't normally top post, but all I have is a generic question.
>
>Do you have a swap partition setup?
>(I'd use something like 6-8Gbytes for a 4Gbyte system.)
>
>rick
>
>On Tue, Apr 22, 2025 at 8:23 AM Sulev-Madis Silber wrote:
>>
>> well i don't have those errors anymore so there's nothing to give
>>
>> i've tried to tune arc but it didn't do anything so i took those things off again
>>
>> right now i'm looking at
>>
>> ARC: 1487M Total, 1102M MFU, 128M MRU, 1544K Anon, 56M Header, 199M Other
>>      942M Compressed, 18G Uncompressed, 19.36:1 Ratio
>>
>> and wonder wtf
>>
>> i bet there's an issue somewhere and i somehow can't properly recreate it. on memory pressure it does resize arc down properly so it seems like i don't need any limits
>>
>> and there's no tmpfs. it would be useless at such low memory sizes
>>
>> the problem is that i can't figure out what all those problems are, how to recreate those conditions and how to work around them or maybe find bugs. also don't have enough hw to solely test it on. unless i can maybe try it on a tiny 512m vm. and then i would need to know what to try
>>
>> i also don't know why those git settings help me:
>>
>> [core]
>> packedGitWindowSize = 32m
>> packedGitLimit = 128m
>> preloadIndex = false
>> [diff]
>> renameLimit = 16384
>>
>> how to tune it from some global place. and so on. and why would it even need so much fiddling? zfs indeed has improved a lot, previously it was quite a hell to use
>>
>> i don't even know if this is related to mmap. even then, i don't really get what that function even does. hence the "zfs (?) issue". it might not even be zfs at all
>>
>> there are probably multiple combined issues here
>>
>> i also don't really buy the idea that a ton of ram would automatically fix this
>>
>> so yeah unsure what to think of this
>>
>> some of the issues i found others also have. some of them seem new
>>
>> some fixes looked like trial and error and nobody seemed to even know what's wrong. granted, that was a forum so maybe it's better here?
>>
>> i mean i have used below average equipment my entire life and the usual way to cope with this is to just give it more time. put more swap and just wait
>>
>> i think someone tested my git issues in a 4g vm and found no issues at all? other things seem as if only i have them
>>
>> i also find it kind of confusing that if this is hw, why don't i see any other issues
>>
>> this is not the first time that i have found something confusing in fbsd that later turned out to be a bug and was further tested and fixed by others
>>
>> hence the current mailing list so maybe someone else has ideas. or if it already has a fix. and i hope there are people with much larger labs who could easily tell / test things
>>
>> so in the end,
>>
>> 1) why should git on a large repo cause the machine to run out of memory, instead of just being as slow as it would need to be
>>
>> 2) why / what are the fs operations that could cause a low power machine to mysteriously fail on zfs, when the expected result would be slow fs behaviour
>>
>> i don't know what really happens and it's way too complex for me to get all the memory management that happens in the kernel. i only have this wild guess that any type of caching should happen in "leftover" ram and make things faster if possible. and any fs operations that have already been reported completed by the kernel can't be suddenly found incomplete later. whatever that fs-related stray buildworld error was that resolved itself somehow. and what i can recreate
>>
>> and i'm not an expert in this so how do i even know?
>>
>> what's fun is how running rsync over several tb's of data doesn't seem to cause any issues at all. this is still the same machine, many would not recommend using this. different workload?
>>
>> hell knows what all this is. maybe later i could figure it out or actually save some logs or so. those i didn't save as i assumed it repeats itself. it didn't and it went off the tmux window history
>>
>> oh well. yes, this is a questionable report but those are "heisenbugs" as well. at least some?
>>
>>
>>
>> On April 22, 2025 3:52:11 PM GMT+03:00, Ronald Klop wrote:
>> >Hi,
>> >
>> >First, instead of writing "it gives vague errors", it really helps others on this list if you can copy-paste the errors into your email.
>> >
>> >Second, as far as I can see FreeBSD 13.4 uses OpenZFS 2.1.14. FreeBSD 14 uses OpenZFS 2.2.X which has bugfixes and improved tuning, although I cannot claim that will fix your issues.
>> >What you can try is to limit the growth of the ARC.
>> >
>> >Set "sysctl vfs.zfs.arc_max=1073741824" or add this to /etc/sysctl.conf to set the value at boot.
>> >
>> >This will limit the ARC to 1GB. I used similar settings on small machines without really noticing a speed difference while usability increased. You can play a bit with the value. Maybe 512MB will be even enough for your use case.
>> >
>> >NB: sysctl vfs.zfs.arc_max was renamed to vfs.zfs.arc.max with arc_max as a legacy alias, but I don't know if that already happened in 13.4.
>> >
>> >Another thing to check is the usage of tmpfs. If you don't restrict the max size of a tmpfs filesystem it will compete for memory. Although this will also show an increase in swap usage.
>> >
>> >Regards,
>> >Ronald.
>> >
>> >
>> >From: Sulev-Madis Silber
>> >Date: Monday, 21 April 2025 03:25
>> >To: freebsd-current
>> >Subject: zfs (?) issues?
>> >>
>> >> i have a long running issue in my 13.4 box (amd64)
>> >>
>> >> others don't get it at all and only suggest adding more than 4g ram
>> >>
>> >> it manifests as some mmap or other problems i don't really get
>> >>
>> >> basically unrestricted git consumes all the memory. i had to turn the watchdog on because sometimes a git pull on the ports tree causes the kernel to take 100% of ram. it keeps killing userland off until it's just the kernel running there happily. it never panics and killing off userland obviously makes the problem disappear and nothing will do any fs operations anymore
>> >>
>> >> dovecot without tuning or with some tuning tended to do this too
>> >>
>> >> what is it?
>> >>
>> >> now i noticed another issue. if i happen to do too many src git pulls in a row, they never actually "pull" anything. and / or clean my obj tree out. i can't run buildworld anymore. it gives vague errors
>> >>
>> >> if i wait a little before starting buildworld, it always works
>> >>
>> >> what could possibly be happening here?
>> >> the way the buildworld fails means there's a serious issue with fs. and how could it be fixed by waiting? it means that some fs operations are still going on in the background
>> >>
>> >> i have no idea what's happening here. zfs doesn't report any issues. nor does storage. nothing was killed with out of memory but arc usage somehow increased a lot. and its compression ratio went weirdly high, like ~22:1 or so
>> >>
>> >> i don't know if it's acceptable zfs behaviour if it runs low on memory or not. how to test it. etc. and if this is fixed on 14, on stable, or on current. i don't have enough hw to test it on all
>> >>
>> >> i have done other stuff on that box that might also be improper for the amount of ram i have there but then it's just slow, nothing fails like this
>> >>
>> >> unsure how this could be fixed or tuned or something else. or why does it behave like this. as opposed to usual low resource issues that just mean you need more time
>> >>
>> >> i mean it would be easy to add huge amounts of ram but people could also want to use zfs in slightly less powerful embedded systems where lack of power is expected but weird fails maybe not
>> >>
>> >> so is this a bug? a feature? something fixed? something that can't be fixed? what could be an acceptable ram size? 8g? 16g? and why can't it just tune everything down and become slower as expected
>> >>
>> >> i tried to look up any openzfs related bugs but zfs is huge and i'm not an fs expert either
>> >>
>> >> i also don't know what happens while i wait. it doesn't show any serious io load. no cpu is taken. load is down. system is responsive
>> >>
>> >> it all feels like a bug still
>> >>
>> >> i have wondered if this is second hand hw acting up but i checked and tested it as best as i could and why would it only bug out when i try more complex things on zfs?
>> >>
>> >> i'm curious about using zfs on super low memory systems too, because it offers certain features. maybe we could fix this if whole issue is ram. or if it's elsewhere, maybe that too
>> >>
>> >> i don't know what to think of this all. esp the last issue. i'm not really alone here with earlier issues but unsure
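
The ARC cap suggested in the quoted reply (vfs.zfs.arc_max=1073741824 for 1GB, possibly 512MB) is given in bytes. A minimal sketch of that arithmetic, assuming the sizes are meant in binary units (1GB = 1024 MiB); the helper names arc_max_bytes and arc_max_line are hypothetical, and the script only prints the line one might put in /etc/sysctl.conf, it does not change any setting:

```shell
#!/bin/sh
# Hypothetical helper: convert an ARC cap in MiB to the byte value
# that vfs.zfs.arc_max takes (assumes binary units, 1 MiB = 1048576 bytes).
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: print (not apply) a candidate /etc/sysctl.conf line.
arc_max_line() {
    echo "vfs.zfs.arc_max=$(arc_max_bytes "$1")"
}

arc_max_line 1024   # the 1GB value from the quoted reply
arc_max_line 512    # the smaller 512MB value mentioned there
```

On OpenZFS versions where the sysctl was renamed, the same byte value would go under vfs.zfs.arc.max, as noted in the quoted reply.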