From: Mateusz Guzik
Date: Mon, 4 Oct 2021 13:27:51 +0200
Subject: Re: witness_lock_list_get: witness exhausted
To: Alan Somers
Cc: Michael Jung, John Baldwin, FreeBSD Current, owner-freebsd-current@freebsd.org

Just take it and change it as you see fit; I don't have time to work on it.

On 10/4/21, Alan Somers wrote:
> On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik wrote:
>>
>> On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung wrote:
>>
>> > On 2018-01-08 13:39, John Baldwin wrote:
>> >
>> >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
>> >>
>> >>> Hi!
>> >>>
>> >>> I've recently upped my processor count on our poudriere box and have
>> >>> started noticing the error
>> >>> "witness_lock_list_get: witness exhausted" on the console.
>> >>> The kernel *DOES NOT* crash, but I thought the report may be useful
>> >>> to someone.
>> >>>
>> >>> $ uname -a
>> >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017 mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
>> >>>
>> >>> The machine is pretty busy running four poudriere build instances.
>> >>>
>> >>> last pid: 76584;  load averages: 115.07, 115.96, 98.30   up 6+07:32:59  14:44:03
>> >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
>> >>> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
>> >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
>> >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
>> >>>      25G Compressed, 32G Uncompressed, 1.24:1 Ratio
>> >>>
>> >>> Let me know what additional information I might supply.
>> >>>
>> >>
>> >> This just means that WITNESS stopped working because it ran out of
>> >> pre-allocated objects, in particular the objects used to track how
>> >> many locks are held by how many threads:
>> >>
>> >> /*
>> >>  * XXX: This is somewhat bogus, as we assume here that at most 2048 threads
>> >>  * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we should
>> >>  * probably be safe for the most part, but it's still a SWAG.
>> >>  */
>> >> #define LOCK_NCHILDREN  5
>> >> #define LOCK_CHILDCOUNT 2048
>> >>
>> >> Probably the '2048' (max number of concurrent threads) needs to scale
>> >> with MAXCPU. 2048 threads is probably a bit low on big x86 boxes.
>> >>
>> >
>> >
>> > Thank you for your explanation. We are expanding our ESXi cluster, and
>> > even though with the standard edition I can only assign 64 vCPUs (and as
>> > much RAM as I want) to a guest, I do like to help with edge cases if I
>> > can make them occur, pushing boundaries where I can towards further
>> > improvements in FreeBSD.
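A minimal userspace sketch of the scheme John describes may help: a fixed pool of LOCK_CHILDCOUNT entries, each with room for LOCK_NCHILDREN locks (2048 entries of 5 slots, 10240 slots in total with the values above), that is filled once and never grown. This is an illustration, not the WITNESS code: struct held_lock_entry, pool_init(), pool_get(), pool_put() and pool_tracking_enabled() are hypothetical names, and the kernel would guard the free list with a lock, which this single-threaded sketch does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* The two values quoted above; here they only size a static array. */
#define LOCK_NCHILDREN  5
#define LOCK_CHILDCOUNT 2048

/* Hypothetical entry with room for LOCK_NCHILDREN held locks. */
struct held_lock_entry {
	struct held_lock_entry *next;       /* free-list link while unused */
	const void *locks[LOCK_NCHILDREN];  /* locks recorded in this entry */
	int count;                          /* slots of locks[] in use */
};

/* Every entry exists up front; nothing is allocated later. */
static struct held_lock_entry pool[LOCK_CHILDCOUNT];
static struct held_lock_entry *free_list;
static bool tracking_enabled;

/* Put every pre-allocated entry on the free list and enable tracking. */
void
pool_init(void)
{
	free_list = NULL;
	for (size_t i = 0; i < LOCK_CHILDCOUNT; i++) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
	tracking_enabled = true;
}

/*
 * Take one entry. When the free list is empty, tracking is switched off
 * and NULL is returned; the caller carries on without tracking instead
 * of failing, mirroring "the kernel does not crash" in the report.
 */
struct held_lock_entry *
pool_get(void)
{
	struct held_lock_entry *e;

	if (!tracking_enabled)
		return NULL;
	e = free_list;
	if (e == NULL) {
		tracking_enabled = false;
		fprintf(stderr, "pool_get: pool exhausted, tracking disabled\n");
		return NULL;
	}
	free_list = e->next;
	e->next = NULL;
	e->count = 0;
	return e;
}

/* Hand an entry back once the locks it recorded have been released. */
void
pool_put(struct held_lock_entry *e)
{
	e->next = free_list;
	free_list = e;
}

bool
pool_tracking_enabled(void)
{
	return tracking_enabled;
}
```

In this sketch, once pool_get() has returned NULL, tracking stays off even if entries are later handed back with pool_put(); that choice is an assumption meant to mirror John's remark that WITNESS "stopped working" rather than recovering.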
>> >
>>
>> Can you apply this and re-run the test?
>>
>> https://people.freebsd.org/~mjg/witness.diff
>>
>> It bumps the counters to be "high enough" but also starts tracking usage.
>> If you get the message again, bump the values even higher.
>>
>> Once you get a complete poudriere run which did not result in the
>> problem, do:
>> $ sysctl debug.witness.list_used debug.witness.list_max_used
>>
>> to dump the actual usage.
>
> This is a nice little patch. Can we commit it to head? Even better
> would be if LOCK_CHILDCOUNT could be a tunable. On my largish system,
> here's what I get shortly after boot:
>
> debug.witness.list_max_used: 8432
> debug.witness.list_used: 8420
>
> -Alan
>

--
Mateusz Guzik
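The two ideas at the end of the thread, reporting current and peak usage and choosing the pool size at run time rather than with a #define, can be sketched the same way. Again this is a hypothetical userspace illustration, not the contents of witness.diff: struct lle_pool and its functions are made-up names, used and max_used only play the role of debug.witness.list_used and debug.witness.list_max_used, and in the kernel the capacity would presumably come from a loader tunable rather than a function argument.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool entry; only the free-list link matters here. */
struct lle {
	struct lle *next;
};

/* Pool sized at startup, with current and peak usage counters. */
struct lle_pool {
	struct lle *storage;    /* one allocation holding every entry */
	struct lle *free_list;  /* entries not currently handed out */
	size_t capacity;        /* chosen at run time, not compile time */
	size_t used;            /* entries handed out right now */
	size_t max_used;        /* highest value 'used' has reached */
};

/* Allocate 'capacity' entries up front; returns false if that fails. */
bool
lle_pool_init(struct lle_pool *p, size_t capacity)
{
	p->storage = calloc(capacity, sizeof(*p->storage));
	if (p->storage == NULL && capacity != 0)
		return false;
	p->free_list = NULL;
	for (size_t i = 0; i < capacity; i++) {
		p->storage[i].next = p->free_list;
		p->free_list = &p->storage[i];
	}
	p->capacity = capacity;
	p->used = 0;
	p->max_used = 0;
	return true;
}

/* Take an entry and update the counters; NULL means the pool is exhausted. */
struct lle *
lle_pool_get(struct lle_pool *p)
{
	struct lle *e = p->free_list;

	if (e == NULL)
		return NULL;
	p->free_list = e->next;
	p->used++;
	if (p->used > p->max_used)
		p->max_used = p->used;
	return e;
}

/* Return an entry; max_used keeps its peak value. */
void
lle_pool_put(struct lle_pool *p, struct lle *e)
{
	e->next = p->free_list;
	p->free_list = e;
	p->used--;
}

/* Release the backing storage and reset the counters. */
void
lle_pool_fini(struct lle_pool *p)
{
	free(p->storage);
	p->storage = NULL;
	p->free_list = NULL;
	p->capacity = 0;
	p->used = 0;
	p->max_used = 0;
}
```

The peak counter is what makes sizing data-driven: Alan's list_max_used of 8432 shortly after boot is already more than four times the stock LOCK_CHILDCOUNT of 2048, which a fixed guess would not reveal.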