List-Archive: https://lists.freebsd.org/archives/freebsd-current
From: Alan Somers
Date: Sun, 3 Oct 2021 22:07:22 -0600
Subject: Re: witness_lock_list_get: witness exhausted
To: Mateusz Guzik
Cc: Michael Jung, John Baldwin, FreeBSD Current, owner-freebsd-current@freebsd.org

On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik wrote:
>
> On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung wrote:
>
> > On 2018-01-08 13:39, John Baldwin wrote:
> >
> >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
> >>
> >>> Hi!
> >>>
> >>> I've recently up'd my processor count on our poudriere box and have
> >>> started noticing the error "witness_lock_list_get: witness exhausted"
> >>> on the console. The kernel *DOES NOT* crash, but I thought the report
> >>> may be useful to someone.
> >>>
> >>> $ uname -a
> >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017     mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
> >>>
> >>> The machine is pretty busy running four poudriere build instances.
> >>>
> >>> last pid: 76584;  load averages: 115.07, 115.96, 98.30    up 6+07:32:59  14:44:03
> >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
> >>> CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle
> >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
> >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
> >>>      25G Compressed, 32G Uncompressed, 1.24:1 Ratio
> >>>
> >>> Let me know what additional information I might supply.
> >>>
> >>
> >> This just means that WITNESS stopped working because it ran out of
> >> pre-allocated objects. In particular, the objects used to track how
> >> many locks are held by how many threads:
> >>
> >> /*
> >>  * XXX: This is somewhat bogus, as we assume here that at most 2048 threads
> >>  * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should
> >>  * probably be safe for the most part, but it's still a SWAG.
> >>  */
> >> #define LOCK_NCHILDREN  5
> >> #define LOCK_CHILDCOUNT 2048
> >>
> >> Probably the '2048' (max number of concurrent threads) needs to scale with
> >> MAXCPU. 2048 threads is probably a bit low on big x86 boxes.
> >>
> >
> >
> > Thank you for your explanation.
> > We are expanding our ESXi cluster, and even though with standard edition I
> > can only assign 64 vCPUs to a guest (and as much RAM as I want), I do like
> > to help with edge cases if I can make them occur, pushing boundaries as far
> > as I can toward additional improvements in FreeBSD.
> >
>
> Can you apply this and re-run the test?
>
> https://people.freebsd.org/~mjg/witness.diff
>
> It bumps the counters to be "high enough" but also starts tracking usage.
> If you get the message again, bump the values even higher.
>
> Once you get a complete poudriere run which did not result in the problem,
> do:
> $ sysctl debug.witness.list_used debug.witness.list_max_used
>
> to dump the actual usage.

This is a nice little patch. Can we commit it to head? Even better would be
if LOCK_CHILDCOUNT could be a tunable. On my largish system, here's what I
get shortly after boot:

debug.witness.list_max_used: 8432
debug.witness.list_used: 8420

-Alan