From: Mateusz Guzik
Date: Tue, 4 Apr 2023 21:24:42 +0200
Subject: Re: ULE process to resolution
To: Jeff Roberson
Cc: freebsd-hackers@freebsd.org
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

Hello,

On 3/31/23, Jeff Roberson wrote:
> As I read these threads I can state with a high degree of confidence that
> many of these tests worked with superior results with ULE at one time.
> It may be that tradeoffs have changed or exposed weaknesses, it may also
> be that it's simply been broken over time.  I see a large number of
> commits intended to address point issues and wonder whether we adequately
> explored the consequences.  Indeed I see solutions involving tunables
> proposed here that will definitively break other cases.
>

One of the reporters claims the bug they complain about has been there
since the early days. This made me curious how many of the problems
reproduce on something like 7.1 (dated 2009). To that end I created an
8 core vm, on which I ran a bunch of tests in addition to main. All 3
problems reported below reproduced there, no X testing though :)

Bugs (one not reported in the other thread):

1. threads walking around the machine when spending little time off
cpu, all while the machine is otherwise idle

The problem with this on bare metal is that the victim cpu may be
partially powered off, so now there is latency stemming from poking it
back up, whatever other migration cost aside.

I noticed this a few years back when looking at postgres -- both the
server and pgbench would walk around everywhere, reducing perf. I
checked that this reproduces on fresh main.
The box at hand has 2 sockets * 10 cores * 2 threads.

I *suspect* this is adequately modeled with a microbenchmark from
https://github.com/antonblanchard/will-it-scale/ named
context_switch1_processes -- it too experiences an all-machine walk
unless explicitly bound (pass -n to *not* bind it). I verified the
benchmark processes walk all around on 7.1 as well, but I don't know
if postgres also would.

how to bench:
su - postgres
/usr/local/bin/pg_ctl -D /var/db/postgres/data15 -l logfile start
pgbench -i -s 10
pgbench -M prepared -S -T 800000 -c 1 -j 1 -P1 postgres

... and you are in.

2. unfairness when oversubscribing with cpu hogs

Steve Kargl claims he reported this one numerous times since the early
days of ULE. I confirmed it was a problem on 7.1 and is a problem
today.

Take an 8 core vm (making sure its vcpus are pinned to cores on the
host).

I'm going to copy paste my other message here:

[paste]
I wrote a cpu burning program (memset 1 MB in a loop, with enough
iterations to take ~20 seconds on its own).

I booted an 8 core bhyve vm, where I made sure to cpuset it to 8
distinct cores.

The test runs *9* workers, here is a sample run:

4bsd:
       23.18 real        20.81 user         0.00 sys
       23.26 real        20.81 user         0.00 sys
       23.30 real        20.81 user         0.00 sys
       23.34 real        20.82 user         0.00 sys
       23.41 real        20.81 user         0.00 sys
       23.41 real        20.80 user         0.00 sys
       23.42 real        20.80 user         0.00 sys
       23.53 real        20.81 user         0.00 sys
       23.60 real        20.80 user         0.00 sys
187.31s user 0.02s system 793% cpu 23.606 total

ule:
       20.67 real        20.04 user         0.00 sys
       20.97 real        20.00 user         0.00 sys
       21.45 real        20.29 user         0.00 sys
       21.51 real        20.22 user         0.00 sys
       22.77 real        20.04 user         0.00 sys
       22.78 real        20.26 user         0.00 sys
       23.42 real        20.04 user         0.00 sys
       24.07 real        20.30 user         0.00 sys
       24.46 real        20.16 user         0.00 sys
181.41s user 0.07s system 741% cpu 24.465 total
[/paste]

While ule spends fewer *cycles*, it takes more real time in total
(24.465 vs 23.606) and the per-worker real times are spread much wider
(20.67 to 24.46 vs 23.18 to 23.60), which is *probably* bad.

you can repro with:
https://people.freebsd.org/~mjg/.junk/cpuburner1.c

cc -O0 -o cpuburner1 cpuburner1.c

and a magic script:
#!/bin/sh

# usage: sh burn.sh <workers> <cpuburner1 args...>
ins=$1

shift

while [ $ins -ne 0 ]; do
	time ./cpuburner1 $1 $2 &
	ins=$((ins-1))
done

wait

run like this; the first argument is the number of workers (9 in the
run above), pick the last number to take 20-ish seconds on your cpu:
sh burn.sh 9 1048576 500000

3. threads struggling to get back on cpu against nice -n 20 hogs

This acutely affects buildkernel.

I once more played around, the bug was already there in 7.1, extending
total time from ~4 minutes to 30.

The problem is introduced with the machinery which attempts to provide
fairness for pri <= PRI_MAX_BATCH. I verified that by straight up
removing all of it. Then buildkernel managed to finish in sensible
time, but the cpu hogs were overly negatively affected -- little cpu
time and very unfairly distributed between them. The key point though
is that buildkernel *can* stay close to base time.

I had seen the patch from https://reviews.freebsd.org/D15985 , it does
not fix the problem but it does alleviate it to some extent. It is
weirdly hacky and seems to be targeting just the testcase you had
instead of the more general problem.

I applied it to a 2018-ish tree so that there are no woes from rebasing.

stock:			290.95 real      2048.22 user       247.967 sys
stock+hogs:		883.81 real      2111.34 user       189.42 sys
patched+hogs:		460.84 real      2055.63 user       232.00 sys

Interestingly the stock kernel from that period is less affected by
the general problem, but it is still pretty bad. With the patch things
improve markedly, but there is still a ~50% increase in real time,
which is way too much for being paired against -n 20.

https://people.freebsd.org/~mjg/.junk/cpuburner2.c

magic script:
#!/bin/sh

workers=$1
n=$2
size=$3
bkw=$4

echo workers $workers nice $n buildkernel $bkw

shift

while [ $workers -ne 0 ]; do
	time nice -n $n ./cpuburner $size &
	workers=$((workers-1))
done

time make -C /usr/src -ssss -j $bkw buildkernel > /dev/null

# XXX webdev-style
pkill cpuburner

wait

sample use: time sh burn+bk.sh 8 20 1048576 8

I figured there would be a regression test suite available, with tests
checking what happens for known cases with possibly contradictory
requirements. Got nothing, instead I found people use hackbench (:S)
or just a workload.

All that said, I'm buggering off the subject. My interest in it was
limited to the nice problem, since I have pretty good reasons to
suspect this is what is causing pathological total real time instances
for package builds.

Have fun,
-- 
Mateusz Guzik