From: Zaphod Beeblebrox
Date: Tue, 13 Jul 2021 18:09:27 -0400
Subject: Re: Periodic rant about SCHED_ULE
To: Rozhuk Ivan
Cc: George Mitchell, FreeBSD Hackers
In-Reply-To: <20210708101907.0be3a3c2@rimwks.local>
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257160 regarding the following:

SCHED_4BSD seems subject to a bit of rot at this point. To wit, my 4-core riscv64 platform recently showed this top output while doing a make -j4 of my own code. Note that each of the processes shown using more than 1000% CPU is single-threaded.

      PID USERNAME  THR PRI NICE   SIZE    RES STATE    C   TIME     WCPU COMMAND
      604 dgilbert    1  45    0   109M    66M CPU3     3   0:02 1039.89% c++
      605 dgilbert    1  45    0   109M    66M CPU1     1   0:02 1031.29% c++
      606 dgilbert    1  45    0   109M    66M RUN      2   0:02 1020.32% c++
      603 dgilbert    1  44    0   109M    66M CPU0     0   0:02 1011.41% c++
      854 root        1  40    0    17M  4764K select   1   3:04    0.17% tmux
      425 root        1  40    0    14M  4040K CPU2     2   0:03    0.15% top

As I said there, I don't believe that this is riscv64-related --- it seems to me that the data top is pulling is either incorrect or top is interpreting it incorrectly. The WCPU value seems to approach 100% asymptotically, but I'm not sure of that --- I can only watch it for so long. The same behaviour is seen if you launch

    (while true; do true; done) &

in the background.

But OTOH, if you are running SCHED_ULE and you launch two of those while-true loops at nice -20 (which gives them a nice value of 20, as the NICE column in the top output below shows) for each CPU ... then launch one at nice 0 ... you'll find that the nice 0 process fails to get 100% CPU. To my mind, this is a failure of the scheduler to read my intentions of nice -20.
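Both observations above can be reproduced with a small sh sketch. The function names (`ncpu`, `weighted_cpu`, `run_nice_experiment`) are hypothetical helpers, not part of FreeBSD. `weighted_cpu` models top's WCPU column under an assumption: that top divides the raw %CPU estimate by (1 - ccpu^swtime), where swtime is the number of seconds since the process was swapped in (ki_swtime) and ccpu = exp(-1/20) is the SCHED_4BSD per-second decay factor. That is a reading of top's weighting, not something verified here. If it holds, the divisor is only about 0.095 for a process two seconds old and tends to 1 as the process ages, so any mismatch between the raw estimate and that correction would show up as huge values that decay toward the raw figure. `run_nice_experiment` reproduces the SCHED_ULE test: two busy loops per CPU at niceness 20 plus one at niceness 0, left running for a given number of seconds and then killed.

```sh
#!/bin/sh
# Reproduction helpers: a sketch, not FreeBSD code. All names are hypothetical.

# Number of online CPUs: FreeBSD sysctl first, getconf as a fallback.
ncpu() {
    _n=$(sysctl -n hw.ncpu 2>/dev/null)
    case $_n in
        '' | *[!0-9]*) _n=$(getconf _NPROCESSORS_ONLN) ;;
    esac
    echo "$_n"
}

# Model of top's WCPU column (ASSUMED form): pct / (1 - ccpu^swtime).
# $1 = raw %CPU estimate, $2 = seconds since the process was swapped in.
# ccpu = exp(-1/20) is assumed to be the SCHED_4BSD per-second decay factor.
weighted_cpu() {
    awk -v pct="$1" -v swtime="$2" 'BEGIN {
        if (swtime == 0) { printf "%.2f\n", 0; exit }
        ccpu = exp(-1 / 20)
        printf "%.2f\n", pct / (1 - exp(swtime * log(ccpu)))
    }'
}

# The SCHED_ULE test: two busy loops per CPU at niceness 20 plus one
# at niceness 0, left running for $1 seconds (default 60), then killed.
# Watch them with top(1) from another terminal while this runs.
run_nice_experiment() {
    _secs=${1:-60}
    _total=$((2 * $(ncpu)))
    _pids=
    _i=0
    while [ "$_i" -lt "$_total" ]; do
        nice -n 20 sh -c 'while :; do :; done' &
        _pids="$_pids $!"
        _i=$((_i + 1))
    done
    sh -c 'while :; do :; done' &
    _pids="$_pids $!"
    echo "started $_total loops at nice 20 and 1 at nice 0"
    sleep "$_secs"
    kill $_pids      # unquoted on purpose: one argument per PID
    wait
    return 0
}
```

For example, `weighted_cpu 99 2` prints about 1040, close to the c++ figures above, while `weighted_cpu 99 600` prints 99.00; `run_nice_experiment 120` runs the ULE test for two minutes.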
In fact, at times, the processor share of the un-niced process will fall below that of some of the niced processes for a few dozen samples at a time. Here is top output displaying that brokenness:

      PID USERNAME  THR PRI NICE   SIZE    RES STATE    C   TIME     WCPU COMMAND
    36410 root        1  89    0    14M   796K RUN      3   0:18   54.31% bash
    36370 root        1 106   20    14M   800K RUN      1   0:58   49.86% bash
    36372 root        1 105   20    14M   800K CPU1     1   0:56   49.69% bash
    36375 root        1 106   20    14M   800K RUN      0   0:57   46.37% bash
    36373 root        1 103   20    14M   800K RUN      3   0:56   44.94% bash
    36371 root        1 105   20    14M   800K CPU0     0   0:57   43.51% bash
    36376 root        1 105   20    14M   800K RUN      2   0:59   38.76% bash
    36369 root        1 104   20    14M   920K CPU2     2   0:57   37.61% bash
    36374 root        1 104   20    14M   800K RUN      2   0:57   32.66% bash

TBH, I think SCHED_ULE is a failure, and the only reason more people don't think so is that processors are now largely too fast for people to care. Most people don't notice the scheduler because they almost never have more runnable tasks than processor threads, so even really dumb schedulers would work out "OK" 98% of the time.

I know we don't have guiding principles for nice, but I would toss out the +/- five rule for it --- that any process running more than 5 nice levels below a CPU-busy process shouldn't preempt the higher-priority process. I realize we have rtprio, but it's a pain to use. Anyways, don't let this last comment distract.

On Thu, Jul 8, 2021 at 3:20 AM Rozhuk Ivan wrote:

> On Wed, 7 Jul 2021 13:47:47 -0400
> George Mitchell wrote:
>
> > CPU: AMD Ryzen 5 2600X Six-Core Processor (3600.10-MHz K8-class CPU)
> > (12 threads).
> >
> > FreeBSD 12.2-RELEASE-p7 r369865 GENERIC amd64 (SCHED_ULE) vs
> > FreeBSD 12.2-RELEASE-p7 r369865 M5P amd64 (SCHED_4BSD).
> >
> > Comparing "make buildworld" time with misc/dnetc running vs not
> > running. (misc/dnetc is your basic 100% compute-bound task, running
> > at nice 20.)
> >
> > Three out of the four combinations build in roughly four hours, but
> > SCHED_ULE with dnetc running takes close to twelve!
> > (And that was overnight with basically nothing else running.) This is an even
> > worse disparity than I have seen in previous releases.
>
> I do not use dnetc, but SCHED_ULE on a 2700 compiles world in less than 4 hours.
> With ccache it takes ~10 minutes: world+kernel build and install and
> update loaders.
>
> # Make an SMP-capable kernel by default
> options   SMP                         #b Symmetric MultiProcessor Kernel
> options   NUMA                        #o Non-Uniform Memory Architecture support
> options   EARLY_AP_STARTUP            #o
>
> device    cpufreq                     #m for non-ACPI CPU frequency control
> device    cpuctl                      #m Provides access to MSRs, CPUID info and microcode update feature.
>
> # Kernel base
> options   SCHED_ULE                   #b 4BSD/ULE scheduler
> options   _KPOSIX_PRIORITY_SCHEDULING #b POSIX P1003_1B real-time extensions
> options   PREEMPTION                  #b Enable kernel thread preemption
>
> and sysctl tunings on desktop only:
> # SCHEDULER
> kern.sched.steal_thresh=1           # Minimum load on remote CPU before we'll steal // workaround for freezes
> kern.sched.balance=0                # Enables the long-term load balancer
> kern.sched.balance_interval=1000    # Average period in stathz ticks to run the long-term balancer
> kern.sched.affinity=10000           # Number of hz ticks to keep thread affinity for