Date: Fri, 29 Apr 2022 13:41:15 -0700
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
To: Mark Millard
Cc: freebsd-current
From: Pete Wright

On 4/29/22 11:38, Mark Millard wrote:
> On 2022-Apr-29, at 11:08, Pete Wright wrote:
>
>> On 4/23/22 19:20, Pete Wright wrote:
>>>> The developers handbook has a section debugging deadlocks that he
>>>> referenced in a response to another report (on freebsd-hackers).
>>>>
>>>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
>>> d'oh - thanks for the correction!
>>>
>>> -pete
>>>
>>>
>> hello, i just wanted to provide an update on this issue. so the good news is that by removing the file backed swap the deadlocks have indeed gone away! thanks for sorting me out on that front Mark!
> Glad it helped.

d'oh - went out for lunch and workstation locked up. i *knew* i shouldn't have said anything lol.

>> i still am seeing a memory leak with either firefox or chrome (maybe both where they create a voltron of memory leaks?). this morning firefox and chrome had been killed when i first logged in. fortunately the system has remained responsive for several hours which was not the case previously.
>>
>> when looking at my metrics i see vm.domain.0.stats.inactive take a nose dive from around 9GB to 0 over the course of 1min. the timing seems to align with around the time when firefox crashed, and is preceded by a large spike in vm.domain.0.stats.active from ~1GB to 7GB 40mins before the apps crashed. after the binaries were killed memory metrics seem to have recovered (laundry size grew, and inactive size grew by several gigs for example).
> Since the form of kill here is tied to sustained low free memory
> ("failed to reclaim memory"), you might want to report the
> vm.domain.0.stats.free_count figures from various time frames as
> well:
>
> vm.domain.0.stats.free_count: Free pages
>
> (It seems you are converting pages to byte counts in your report,
> the units I'm not really worried about so long as they are
> obvious.)
>
> There are also figures possibly tied to the handling of the kill
> activity but some being more like thresholds than usage figures,
> such as:
>
> vm.domain.0.stats.free_severe: Severe free pages
> vm.domain.0.stats.free_min: Minimum free pages
> vm.domain.0.stats.free_reserved: Reserved free pages
> vm.domain.0.stats.free_target: Target free pages
> vm.domain.0.stats.inactive_target: Target inactive pages

ok thanks Mark, based on this input and the fact i did manage to lock up my system, i'm going to get some metrics up on my website and share them publicly when i have time. i'll definitely take your input into account when sharing this info.

>
> Also, what value were you using for:
>
> vm.pageout_oom_seq

$ sysctl vm.pageout_oom_seq
vm.pageout_oom_seq: 120
$

cheers,
-pete

-- 
Pete Wright
pete@nomadlogic.org
@nomadlogicLA
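
Editorial sketch (not part of the original message): the counters discussed above are reported in pages, so one way to keep the units obvious when sharing them is to convert each sample to MiB. The snippet below is a minimal sketch under stated assumptions: the helper names (parse_sysctl, pages_to_mib, sample) are hypothetical, and the default page size of 4096 bytes is an assumption that should be checked with `sysctl hw.pagesize` on the machine in question. Only the sysctl names themselves come from the thread.

```python
import subprocess

# sysctl names taken from the thread; everything else here is illustrative.
METRICS = [
    "vm.domain.0.stats.free_count",
    "vm.domain.0.stats.free_severe",
    "vm.domain.0.stats.free_min",
    "vm.domain.0.stats.free_reserved",
    "vm.domain.0.stats.free_target",
    "vm.domain.0.stats.inactive_target",
]


def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints.

    Lines without a ': ' separator or with a non-numeric value are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def pages_to_mib(pages, page_size=4096):
    """Convert a page count to MiB. page_size is an assumed default."""
    return pages * page_size / (1024 * 1024)


def sample(page_size=4096):
    """Run sysctl once and return the page-based metrics converted to MiB."""
    out = subprocess.run(
        ["sysctl"] + METRICS, capture_output=True, text=True, check=True
    ).stdout
    return {
        name: pages_to_mib(pages, page_size)
        for name, pages in parse_sysctl(out).items()
    }
```

Calling sample() periodically (for example once a minute) and logging the result alongside a timestamp would give free_count over time next to the thresholds it is compared against. Note that vm.pageout_oom_seq is not a page count, so it should be reported as-is rather than converted.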