From: Pete Wright <pete@nomadlogic.org>
To: Mark Millard, freebsd-current
Date: Fri, 22 Apr 2022 16:42:42 -0700
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?

On 4/21/22 21:18, Mark Millard wrote:
>
> Messages in the console out would be appropriate
> to report. Messages might also be available via
> the following at appropriate times:

That is what is frustrating.
I get notified that the processes were killed:

  Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
  Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
  Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
  Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
  Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
  Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
  Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory

In this case the system killed both firefox and chrome while I was AFK. I logged back in and started them up again to do more work. The next log line is from this morning, when I had to power-cycle the system because the keyboard and network were both unresponsive:

  Apr 22 09:58:20 topanga syslogd: kernel boot file is /boot/kernel/kernel

> Do you have any swap partitions set up and in use? The
> details could be relevant. Do you have swap set up
> some other way than via swap partition use? No swap?

Yes, I have 2GB of swap that resides on an NVMe device.

> ZFS (so with ARC)? UFS? Both?

I am using ZFS and am setting vfs.zfs.arc.max to 10G. I have also experienced this crash with it left at its default (unset) value.

>
> The first block of lines from a top display could be
> relevant, particularly when it is clearly progressing
> towards having the problem. (After the problem is too
> late.) (I just picked top as a way to get a bunch of
> the information all together automatically.)

Since the initial OOM events happen when I am AFK, it is difficult to get relevant stats out of top. This is why I've started collecting more detailed metrics in Prometheus.
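As a rough sketch of what the collection side could look like, assuming the metrics reach Prometheus through node_exporter's textfile collector (the setup above does not say how they are scraped), the following Python reads a few VM page-queue counters and the ZFS ARC size with `sysctl -n` and writes them out in the Prometheus text format. The OID selection is only a guess at what might be worth watching in the run-up to a "failed to reclaim memory" kill, not a recommended or complete list. The `freebsd_*_bytes` metric names and the output path are hypothetical.

```python
import os
import subprocess
import tempfile

# Page-count OIDs (values are in pages). This selection is an assumption about
# what is useful to watch before an OOM kill, not an official or complete list.
PAGE_COUNT_OIDS = [
    "vm.stats.vm.v_free_count",
    "vm.stats.vm.v_free_target",
    "vm.stats.vm.v_active_count",
    "vm.stats.vm.v_inactive_count",
    "vm.stats.vm.v_laundry_count",
    "vm.stats.vm.v_wire_count",
]
# OIDs already reported in bytes; the ARC one only exists with ZFS loaded.
BYTE_OIDS = ["kstat.zfs.misc.arcstats.size"]


def read_sysctl(oid):
    """Return the integer value of one sysctl OID, or None if unavailable."""
    try:
        result = subprocess.run(["sysctl", "-n", oid],
                                capture_output=True, text=True)
    except OSError:  # no sysctl binary, e.g. not running on FreeBSD
        return None
    if result.returncode != 0:  # unknown OID or other error
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None


def read_many(oids):
    """Read each OID separately so one missing OID does not hide the rest."""
    values = {}
    for oid in oids:
        value = read_sysctl(oid)
        if value is not None:
            values[oid] = value
    return values


def metric_name(oid):
    # Hypothetical naming scheme: dots become underscores, suffixed with _bytes.
    return "freebsd_" + oid.replace(".", "_") + "_bytes"


def to_prometheus(page_counts, byte_values, page_size):
    """Render samples in the Prometheus text exposition format, all in bytes."""
    samples = {oid: pages * page_size for oid, pages in page_counts.items()}
    samples.update(byte_values)
    lines = []
    for oid in sorted(samples):
        name = metric_name(oid)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {samples[oid]}")
    return "\n".join(lines) + "\n"


def sample_once(out_path):
    """Read all OIDs once and atomically replace out_path with the result."""
    # Fall back to 4096 (the amd64 page size) if hw.pagesize cannot be read.
    page_size = read_sysctl("hw.pagesize") or 4096
    text = to_prometheus(read_many(PAGE_COUNT_OIDS), read_many(BYTE_OIDS),
                         page_size)
    # Write a temp file in the same directory, then rename it over the target,
    # so a scraper never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(out_path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the exporter read it
    os.replace(tmp, out_path)
    return text
```

For example, calling `sample_once("/var/tmp/node_exporter/oom.prom")` (a hypothetical path) every 10 seconds from a small loop with `time.sleep(10)` would give a time series of free, laundry and wired memory plus ARC size leading up to a kill. Converting page counts to bytes via `hw.pagesize` keeps every series in one unit, so they can be graphed directly against the 10G ARC cap.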
My hope is that I'll be able to do a better job of observing how my system behaves over time, in the run-up to the OOM event as well as right before and after. There are heaps of metrics being collected, though, so I'm hoping someone can point me in the right direction :)

-pete

-- 
Pete Wright
pete@nomadlogic.org
@nomadlogicLA