Subject: Re: Periodic rant about SCHED_ULE
From: Ian Lepore
To: Zaphod Beeblebrox
Cc: FreeBSD Hackers
Date: Tue, 13 Jul 2021 16:22:05 -0600
References: <13445948-7804-20b4-4ae6-aaac14d11e87@m5p.com> <20210708101907.0be3a3c2@rimwks.local>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Tue, 2021-07-13 at 18:09 -0400, Zaphod Beeblebrox wrote:
> I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257160
> regarding the following:
>
> SCHED_4BSD seems subject to a bit of rot at this point.  To wit, my
> 4 core riscv64 platform recently showed this top while doing a
> make -j4 of my own code.  Note that each of the processes using more
> than 1000% CPU are single-threaded.
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME     WCPU COMMAND
>   604 dgilbert      1  45    0   109M    66M CPU3     3   0:02 1039.89% c++
>   605 dgilbert      1  45    0   109M    66M CPU1     1   0:02 1031.29% c++
>   606 dgilbert      1  45    0   109M    66M RUN      2   0:02 1020.32% c++
>   603 dgilbert      1  44    0   109M    66M CPU0     0   0:02 1011.41% c++
>   854 root          1  40    0    17M  4764K select   1   3:04    0.17% tmux
>   425 root          1  40    0    14M  4040K CPU2     2   0:03    0.15% top
>
> As I said there, I don't believe that this is RISCV64 related --- it
> seems to me that the data that top is pulling is either incorrect or
> top is interpreting it incorrectly.  The WCPU value seems to
> asymptotically approach 100%, but I'm not sure of that --- I can only
> watch it for so long.  The same behaviour is seen if you launch
> (while true; do true; done) & in the background.
>
> But OTOH, if you are running SCHED_ULE, and you launch two of those
> while true's at nice -20 for each cpu ... then launch one at nice
> '0' ... you'll find that the nice 0 process fails to get 100% cpu.
> To my mind, this is a failure of the scheduler to read my intentions
> of nice -20.  In fact, at times, the processor share of the un-nice
> process will fall below some of the nice processes for a few dozen
> samples at a time.  Here is a top displaying that brokenness...
>
>   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 36410 root          1  89    0    14M   796K RUN      3   0:18  54.31% bash
> 36370 root          1 106   20    14M   800K RUN      1   0:58  49.86% bash
> 36372 root          1 105   20    14M   800K CPU1     1   0:56  49.69% bash
> 36375 root          1 106   20    14M   800K RUN      0   0:57  46.37% bash
> 36373 root          1 103   20    14M   800K RUN      3   0:56  44.94% bash
> 36371 root          1 105   20    14M   800K CPU0     0   0:57  43.51% bash
> 36376 root          1 105   20    14M   800K RUN      2   0:59  38.76% bash
> 36369 root          1 104   20    14M   920K CPU2     2   0:57  37.61% bash
> 36374 root          1 104   20    14M   800K RUN      2   0:57  32.66% bash
>
> TBH, I think SCHED_ULE is a failure and the only reason more people
> don't think so is that processors are now largely too fast for people
> to care.  Most people don't notice the scheduler because they almost
> never have more tasks than processor threads, so even really dumb
> schedulers would work out "OK" 98% of the time.
>
> I know we don't have guiding principles for nice, but I would toss
> out the +/- five rule for it --- that any process more than 5 nice
> levels lower from a cpu-busy process shouldn't preempt the higher
> process.  I realize we have rtprio, but it's a pain to use.  Anyways,
> don't let this last comment distract.
>
> On Thu, Jul 8, 2021 at 3:20 AM Rozhuk Ivan wrote:
>
> > On Wed, 7 Jul 2021 13:47:47 -0400
> > George Mitchell wrote:
> >
> > > CPU: AMD Ryzen 5 2600X Six-Core Processor (3600.10-MHz K8-class
> > > CPU) (12 threads).
> > >
> > > FreeBSD 12.2-RELEASE-p7 r369865 GENERIC amd64 (SCHED_ULE) vs
> > > FreeBSD 12.2-RELEASE-p7 r369865 M5P amd64 (SCHED_4BSD).
> > >
> > > Comparing "make buildworld" time with misc/dnetc running vs not
> > > running.  (misc/dnetc is your basic 100% compute-bound task,
> > > running at nice 20.)
> > >
> > > Three out of the four combinations build in roughly four hours,
> > > but SCHED_ULE with dnetc running takes close to twelve!  (And
> > > that was overnight with basically nothing else running.)
> > > This is an even worse disparity than I have seen in previous
> > > releases.
> >
> > I do not use dnetc, but sched_ule on a 2700 compiles world faster
> > than 4 hours.
> > With ccache it takes ~10 minutes: world+kernel build and install
> > and update loaders.
> >
> > # Make an SMP-capable kernel by default
> > options  SMP                          #b Symmetric MultiProcessor Kernel
> > options  NUMA                         #o Non-Uniform Memory Architecture support
> > options  EARLY_AP_STARTUP             #o
> >
> > device   cpufreq                      #m for non-ACPI CPU frequency control
> > device   cpuctl                       #m Provides access to MSRs, CPUID info and microcode update feature.
> >
> > # Kernel base
> > options  SCHED_ULE                    #b 4BSD/ULE scheduler
> > options  _KPOSIX_PRIORITY_SCHEDULING  #b POSIX P1003_1B real-time extensions
> > options  PREEMPTION                   #b Enable kernel thread preemption
> >
> > and sysctl tunings on desktop only:
> > # SCHEDULER
> > kern.sched.steal_thresh=1         # Minimum load on remote CPU before we'll steal // workaround for freezes
> > kern.sched.balance=0              # Enables the long-term load balancer
> > kern.sched.balance_interval=1000  # Average period in stathz ticks to run the long-term balancer
> > kern.sched.affinity=10000         # Number of hz ticks to keep thread affinity for
> >

top has been showing bad values for CPU% with SCHED_4BSD for many
years, on all architectures.  I remember Bruce Evans once commenting
that it had something to do with changes to clock handling in the
kernel (maybe related to when eventtimers first came in, but I might be
misremembering that detail).  If you ask top to display straight cpu
instead of wcpu (top -C, or the C key in interactive mode), the results
are much more sane.

I too wish that nice made a bigger difference, but that problem isn't
limited to SCHED_ULE; nice is little more than a vague hint even when
using SCHED_4BSD.
I eventually concluded that there's just no way to run a
compute-heavy workload (such as buildworld -j) using nice and keep the
machine responsive enough for interactive use.  I switched to running
builds with idprio, which isn't really onerous if you set sysctl
security.bsd.unprivileged_idprio=1 in /etc/sysctl.conf.

-- Ian
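
For anyone who wants to try the idprio approach described above, here is a minimal sketch rather than a recommended setup. The helper name run_idle, the -j8 job count, and the fallback to nice on systems that lack idprio(1) are illustrative assumptions and not part of the original message; only the sysctl name and the use of idprio come from it.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): run a command at
# idle priority when idprio(1) is available, otherwise fall back to nice(1).
#
# For unprivileged use on FreeBSD, the setting named in the message is
# assumed to be present in /etc/sysctl.conf:
#   security.bsd.unprivileged_idprio=1
run_idle() {
    if command -v idprio >/dev/null 2>&1; then
        # idprio(1) priorities run from 0 (highest) to 31 (lowest).
        idprio 31 "$@"
    else
        # Assumption: without idprio, the weakest nice level is the
        # closest available substitute (and, per the message, a weak one).
        nice -n 20 "$@"
    fi
}

# Example, commented out so that sourcing this file has no side effects
# (the -j8 value is only an illustration):
#   run_idle make -j8 buildworld
```

The command's arguments are passed through unchanged, so anything that can be started from a shell can be wrapped the same way.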