Date: Sat, 26 Apr 2025 15:01:01 +0300
From: Sulev-Madis Silber
To: freebsd-current@freebsd.org
Subject: Re: zfs (?) issues?
References: <56F52DF4-2988-4F06-9F53-90D07AF5DD02@ketas.si.pri.ee>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On April 23, 2025 6:40:44 PM GMT+03:00, void wrote:
>On Mon, Apr 21, 2025 at 04:25:16AM +0300, Sulev-Madis Silber wrote:
>> i have long running issue in my 13.4 box (amd64)
>>
>> others don't get it at all and only suggest adding more than 4g ram
>>
>> it manifests as some mmap or other problems i don't really get
>>
>> basically unrestricted git consumes all the memory.
>
>I see symptoms like this on very slow media. That media might
>be inherently slow (like microSD) or it might mean the hd/ssd is
>worn out[1]. Programs like git and subversion read and write lots
>of small files, and the os/filesystem might not be able to write
>to slow media as fast as git would like.
>
>[1] observed failure mode of some hardware, where writes just get
>    slower and slower.
>
>[2] the workaround where the machine *has* to use micro sd, in my
>    example, to update ports, was to download latest ports.tar.xz and
>    unzip it, rather than use git.
>
>[3] test hd performance with fio (/usr/ports/benchmarks/fio)

That might be it!? There is an HDD in the machine that was tested before, but now it never really completes the long SMART tests, and the short ones take ages. There are no "usual" disk errors, though. That HDD is part of the 2-disk mirror that git runs on.

But there could be a fix for this that doesn't affect other people. I don't know how the internals really work, but slow I/O could fill the buffers up. Those could be checked and the fs could be limited, e.g. by simply not reporting a write as completed yet. That would make things slower when the queue is full, so git would wait. I bet there are checks for it; maybe they just don't work well?
It can't just be blindly accepting writes, hoping they can be committed to storage at some future time.

Or I could be wrong and it's some other issue.

I'm wondering why no one else seems to spot it much, though? I/O could be slow because the media is abnormally slow by design, or because it is failing. But it could also be that the influx is simply more than the storage can handle, and that could happen on fast machines too. Or it could happen by accident, or even through an attack. If I understand this correctly, OOM protection won't save any userland process here either? So it truly was all about the kernel wanting to allocate all of the RAM, which it did. I didn't see a single userland process running, IIRC, but I couldn't check either. The kernel itself kept running perfectly fine after that. The fix for that particular failure is to enable the watchdog, of course. I think I've seen it on another machine as well but never realized it. Or maybe it was hardware there and the kernel was frozen too: when I went to check, I found bulging caps.

If all of the above is correct, it could be tested easily, maybe with gnop. I don't see a speed restriction option, but I do see delays. Maybe I can even test it now, as it doesn't need huge RAM just to prove the point that it fills up completely.

And this is not fixed on current either? And the fix would be in ZFS? And UFS, as tested by others, would not be affected... why? I know ZFS does COW, but... Anyway, I can't figure it out. That's why I don't develop filesystems. Maybe even tell Kirk? :p

What's funny is how the kernel knew to stop there. Was it just because it was finally reaching the actual RAM limits, or just because the writes stopped once git was killed? I'm not sure what "kernel memory full" could mean. Always a panic, since you can't kill things in the kernel? Or could it return errors or delay things, e.g. by delaying the completion of syscalls?
I'm unsure about all this, so I'll leave it for others. But it's often said that full kernel memory leads to panics. Couldn't it just error out?

tl;dr - suspected issue of ZFS on a slow device filling up the *entire* RAM with write buffers, leaving userland killed and the system in an unusable state
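
PS: a toy model of the "don't report the write as done yet" idea from above, in case it helps the discussion. This is only a sketch and not how ZFS or any real filesystem is implemented; the function name, the parameters and the numbers are all made up for illustration. It compares a write buffer with no limit against one with a cap, where the writer (git, say) is made to wait once the cap is reached.

```python
def simulate(write_rate, drain_rate, ticks, max_dirty=None):
    """Discrete-time toy model of a write buffer in front of a slow disk.

    write_rate: units the writer (e.g. git) wants to submit per tick
    drain_rate: units the disk can actually commit per tick
    max_dirty:  cap on buffered-but-uncommitted units; None means no cap
    Returns (peak_buffered, units_accepted).
    """
    buffered = 0
    peak = 0
    accepted = 0
    for _ in range(ticks):
        want = write_rate
        if max_dirty is not None:
            # Backpressure: accept only what still fits under the cap;
            # the writer has to wait with the rest.
            want = min(want, max_dirty - buffered)
        buffered += want
        accepted += want
        peak = max(peak, buffered)
        # The disk commits what it can during this tick.
        buffered -= min(buffered, drain_rate)
    return peak, accepted


if __name__ == "__main__":
    # Writer is 10x faster than the disk in both runs (hypothetical numbers).
    print("no cap:  ", simulate(100, 10, 50))
    print("with cap:", simulate(100, 10, 50, max_dirty=200))
```

Without a cap the peak keeps growing for as long as the writer runs, which is the "fills up the entire RAM" behaviour I suspect. With a cap the peak stays at the cap and the writer is simply slowed down to the speed of the disk.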