From: Toomas Soome
To: freebsd-current@freebsd.org
Date: Tue, 22 Apr 2025 20:34:35 +0300
Subject: Re: zfs (?) issues?

> On 22. Apr 2025, at 18:23, Sulev-Madis Silber wrote:
>
> well i don't have those errors anymore so there's nothing to give
>
> i've tried to tune arc but it didn't do anything so i took those things off again
>
> right now i'm looking at
>
> ARC: 1487M Total, 1102M MFU, 128M MRU, 1544K Anon, 56M Header, 199M Other
>      942M Compressed, 18G Uncompressed, 19.36:1 Ratio
>
> and wonder wtf
>
> i bet there's an issue somewhere and i somehow can't properly recreate it. under memory pressure it does resize arc down properly, so it seems like i don't need any limits
>
> and there's no tmpfs. it would be useless at such low memory sizes
>
> the problem is that i can't figure out what all those problems are, how to recreate those conditions, and how to work around them or maybe find the bugs. i also don't have enough hw to dedicate to testing it. unless i can maybe try it on a tiny 512m vm. and then i would need to know what to try
>
> i also don't know why these git settings help me:
>
> [core]
> packedGitWindowSize = 32m
> packedGitLimit = 128m
> preloadIndex = false
> [diff]
> renameLimit = 16384
>
> how to tune it from some global place. and so on. and why would it even need so much fiddling? zfs indeed has improved a lot, previously it was quite a hell to use
>
> i don't even know if this is related to mmap. even then, i don't really get what that function does. hence the "zfs (?) issue". it might not even be zfs at all
>
> there are probably multiple combined issues here
>
> i also don't really buy the idea that a ton of ram would automatically fix this
>
> so yeah, unsure what to think of this
>
> some of the issues i found others also have. some of them seem new
>
> some fixes looked like trial and error and nobody seemed to even know what was wrong. granted, that was a forum, so maybe it's better here?
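The ARC summary quoted above reports a ratio between the logical (uncompressed) and physical (compressed) size of cached data. The minimal Python sketch below recomputes that ratio from the displayed figures. It assumes the ratio is simply the uncompressed size divided by the compressed size and that the K/M/G/T suffixes are powers of 1024. The helper names `parse_size` and `arc_ratio` are hypothetical, invented for this illustration.

```python
import re

# Assumed multipliers for top(1)-style size suffixes (powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert an abbreviated size such as '942M' or '18G' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT])", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])


def arc_ratio(line: str) -> float:
    """Recompute uncompressed/compressed from a 'Compressed, Uncompressed' ARC line."""
    sizes = {}
    for label in ("Compressed", "Uncompressed"):
        # 'Compressed' (capital C) does not occur inside 'Uncompressed',
        # so the two searches cannot pick up each other's field.
        match = re.search(rf"(\S+) {label}\b", line)
        if match is None:
            raise ValueError(f"no {label} field in {line!r}")
        sizes[label] = parse_size(match.group(1))
    return sizes["Uncompressed"] / sizes["Compressed"]


line = "942M Compressed, 18G Uncompressed, 19.36:1 Ratio"
print(f"recomputed ratio: {arc_ratio(line):.2f}:1")
```

Because the displayed sizes are abbreviated, the recomputed value (about 19.57:1 for this line) only approximates the 19.36:1 that top prints, presumably from the exact counters.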
>
> i mean i have used below-average equipment my entire life, and the usual way to cope with this is to just give it more time. add more swap and just wait
>
> i think someone tested my git issues in a 4g vm and found no issues at all? the other things seem to happen only to me
>
> i also find it kind of confusing that, if this is hw, i don't see any other issues
>
> this is not the first time i have found something confusing in fbsd that later turned out to be a bug and was further tested and fixed by others
>
> hence the current mailing list, so maybe someone else has ideas, or knows whether there is already a fix. and i hope there are people with much larger labs who could easily tell / test things
>
> so in the end,
>
> 1) why should git on a large repo cause the machine to run out of memory, instead of just being as slow as it needs to be

um, because it is buggy? Or pick some other fun reason, because this wording does not really make much sense.

>
> 2) why / what are the fs operations that could cause a low-power machine to mysteriously fail on zfs, when the expected result would be slow fs behaviour
>

Define low power? In general, failures on systems with limited resources hint at a lack of testing and bug hunting on such systems. Over time there have been improvements, but this is an almost never-ending task.

> i don't know what really happens, and all the memory management that happens in the kernel is way too complex for me to get. i only have this wild guess that any type of caching should happen in "leftover" ram and make things faster if possible. and any fs operations that the kernel has already reported as completed can't suddenly be found incomplete later. whatever that fs-related stray buildworld error was, it resolved itself somehow.
> and what i can recreate
>

By default, fs operations are asynchronous. If you want them to be “complete”, that is, data on stable storage and a consistent state for the file system, you need synchronous IO. But as always, there is a price to pay.

> and i'm not an expert in this, so how do i even know?
>
> what's fun is how running rsync over several TBs of data doesn't seem to cause any issues at all. this is still the same machine, one many would not recommend using for this. different workload?
>

If you are comparing git with rsync, you want to make sure you have an up-to-date git. There are git versions with rather nasty bugs.

> hell knows what all this is. maybe later i could figure it out, or actually save some logs or something. i didn't save those because i assumed it would repeat itself. it didn't, and it scrolled out of the tmux window history
>
> oh well. yes, this is a questionable report, but those are "heisenbugs" as well. at least some?
>

A Heisenbug is a bug whose trigger mechanism we do not yet know; that does not mean it has no such mechanism.

rgds,
toomas

>
>
> On April 22, 2025 3:52:11 PM GMT+03:00, Ronald Klop wrote:
>> Hi,
>>
>> First, instead of writing "it gives vague errors", it really helps others on this list if you can copy-paste the errors into your email.
>>
>> Second, as far as I can see, FreeBSD 13.4 uses OpenZFS 2.1.14. FreeBSD 14 uses OpenZFS 2.2.X, which has bugfixes and improved tuning, although I cannot claim that will fix your issues.
>> What you can try is to limit the growth of the ARC.
>>
>> Run "sysctl vfs.zfs.arc_max=1073741824", or add "vfs.zfs.arc_max=1073741824" to /etc/sysctl.conf to set the value at boot.
>>
>> This will limit the ARC to 1GB. I used similar settings on small machines without really noticing a speed difference, while usability increased. You can play a bit with the value. Maybe 512MB will even be enough for your use case.
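Ronald's 1073741824 is 1024 × 1024 × 1024 bytes, i.e. 1 GiB. The sketch below uses a hypothetical helper, `arc_max_line`, to compute the byte value for other limits, such as the 512MB he mentions, and formats it as a sysctl.conf-style `name=value` line. The tunable name is a parameter because, as Ronald notes, `vfs.zfs.arc_max` may have become a legacy alias of `vfs.zfs.arc.max`. The sketch does not check which name a given release accepts.

```python
# Bytes per MiB; the 1073741824 in the thread is 1024 MiB (1 GiB).
MIB = 1024 * 1024


def arc_max_line(limit_mib: int, tunable: str = "vfs.zfs.arc_max") -> str:
    """Return a sysctl.conf-style 'name=value' line capping the ARC at limit_mib MiB.

    The default tunable name is the one quoted in the thread; some releases
    may use vfs.zfs.arc.max instead (assumption, not verified here).
    """
    if limit_mib <= 0:
        raise ValueError("ARC limit must be a positive number of MiB")
    return f"{tunable}={limit_mib * MIB}"


# 1 GiB as suggested in the thread, and the 512MB alternative.
for mib in (1024, 512):
    print(arc_max_line(mib))
```

For the two sizes mentioned in the thread this prints `vfs.zfs.arc_max=1073741824` and `vfs.zfs.arc_max=536870912`.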
>>
>> NB: the sysctl vfs.zfs.arc_max was renamed to vfs.zfs.arc.max, with arc_max kept as a legacy alias, but I don't know if that had already happened in 13.4.
>>
>> Another thing to check is tmpfs usage. If you don't restrict the maximum size of a tmpfs file system, it will compete for memory. That would also show up as increased swap usage, though.
>>
>> Regards,
>> Ronald.
>>
>>
>> From: Sulev-Madis Silber
>> Date: Monday, 21 April 2025 03:25
>> To: freebsd-current
>> Subject: zfs (?) issues?
>>>
>>> i have a long-running issue on my 13.4 box (amd64)
>>>
>>> others don't get it at all and only suggest adding more than 4g of ram
>>>
>>> it manifests as some mmap or other problems i don't really get
>>>
>>> basically, unrestricted git consumes all the memory. i had to turn the watchdog on because a git pull on the ports tree causes the kernel to take 100% of ram. it keeps killing userland off until it's just the kernel running there happily. it never panics, and killing off userland obviously makes the problem disappear, since nothing does any fs operations anymore
>>>
>>> dovecot, with or without some tuning, tended to do this too
>>>
>>> what is it?
>>>
>>> now i've noticed another issue. if i happen to do too many src git pulls in a row, they never actually "pull" anything, and / or they clean my obj tree out. then i can't run buildworld anymore. it gives vague errors
>>>
>>> if i wait a little before starting buildworld, it always works
>>>
>>> what could possibly be happening here? the way buildworld fails means there's a serious issue with the fs. and how could waiting fix it? it means that some fs operations are still going on in the background
>>>
>>> i have no idea what's happening here. zfs doesn't report any issues, nor does the storage. nothing was killed for running out of memory, but arc usage somehow increased a lot.
>>> and its compression ratio went weirdly high, like ~22:1 or so
>>>
>>> i don't know whether this is acceptable zfs behaviour when it runs low on memory, or how to test it, etc. and whether this is fixed in 14, in stable, or in current. i don't have enough hw to test it on all of them
>>>
>>> i have done other stuff on that box that might also be improper for the amount of ram i have there, but then it's just slow, nothing fails like this
>>>
>>> unsure how this could be fixed or tuned or something else, or why it behaves like this, as opposed to the usual low-resource issues that just mean you need more time
>>>
>>> i mean, it would be easy to add huge amounts of ram, but people could also want to use zfs in slightly less powerful embedded systems, where lack of power is expected but weird failures maybe not
>>>
>>> so is this a bug? a feature? something already fixed? something that can't be fixed? what would be an acceptable ram size? 8g? 16g? and why can't it just tune everything down and become slower, as expected
>>>
>>> i tried to look up any openzfs-related bugs, but zfs is huge and i'm not an fs expert either
>>>
>>> i also don't know what happens while i wait. it doesn't show any serious io load. no cpu is taken. load is down. the system is responsive
>>>
>>> it all still feels like a bug
>>>
>>> i have wondered if this is second-hand hw acting up, but i checked and tested it as best i could, and why would it only bug out when i try more complex things on zfs?
>>>
>>> i'm curious about using zfs on super low memory systems too, because it offers certain features. maybe we could fix this if the whole issue is ram. or if it's elsewhere, maybe that too
>>>
>>> i don't know what to think of all this, especially the last issue. i'm not really alone with the earlier issues, but unsure
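Toomas's point that file system operations are asynchronous by default, and that making them “complete” takes synchronous IO at a price, can be illustrated at the application level. The sketch below is a general POSIX pattern written in Python with `os.fsync`. It is not taken from the thread and is not a diagnosis of the buildworld failures. It writes to a temporary file, fsyncs it, atomically renames it over the target, then fsyncs the directory so that the rename itself is durable. The function name `write_durably` is hypothetical.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and return only after asking the OS to commit it to stable storage."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the final rename stays on one
    # file system and is atomic. Note: mkstemp creates the file with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's userspace buffer into the kernel
            os.fsync(f.fileno())  # block until the kernel reports the data as committed
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        os.unlink(tmp_path)
        raise
    # fsync the directory as well, so the rename itself survives a crash (POSIX systems).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "state.txt")
        write_durably(target, b"done\n")
        with open(target, "rb") as f:
            print(f.read())
```

The price Toomas mentions shows up as latency: each `os.fsync` call waits for the storage stack to acknowledge the write, which is why it is not the default for ordinary writes.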