Re: Periodic rant about SCHED_ULE

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 23 Mar 2023 04:09:22 UTC
[I added a -j32 buildworld buildkernel with SCHED_4BSD
and dnetc-in-use comparison, to the other ThreadRipper
1950X examples. SCHED_4BSD does take notably less time
than SCHED_ULE when dnetc is also active: still a good
match to the simple round-robin for this building
activity. I will note that the 1950X UEFI/firmware is
not configured to present itself as NUMA but the FreeBSD
kernels in use are NUMA capable as built.]

On Mar 22, 2023, at 19:44, Mark Millard <marklmi@yahoo.com> wrote:

> On Mar 22, 2023, at 18:08, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On Mar 22, 2023, at 18:03, Mark Millard <marklmi@yahoo.com> wrote:
>> 
>>> On Mar 22, 2023, at 16:17, Mark Millard <marklmi@yahoo.com> wrote:
>>> 
>>>> On Mar 22, 2023, at 15:39, Mark Millard <marklmi@yahoo.com> wrote:
>>>> 
>>>>> On Mar 22, 2023, at 13:34, Mark Millard <marklmi@yahoo.com> wrote:
>>>>> 
>>>>>> On Mar 22, 2023, at 12:40, George Mitchell <george+freebsd@m5p.com> wrote:
>>>>>> 
>>>>>>> On 3/22/23 15:21, Mark Millard wrote:
>>>>>>>> George Mitchell <george+freebsd@m5p.com> wrote on
>>>>>>>> Date: Wed, 22 Mar 2023 17:36:39 UTC :
>>>>>>>> [...]
>>>>>>>>> Here are the very complicated instructions for reproducing the problem:
>>>>>>>>> 1. Install and start misc/dnetc from ports.
>>>>>>>> Installing is likely easy, as likely would be building
>>>>>>>> with default options (if any). I know nothing about
>>>>>>>> starting misc/dnetc so that is research. (Possibly
>>>>>>>> trivial, although if it has alternatives to control
>>>>>>>> then I'd need to match that context too.)
>>>>>>> 
>>>>>>> service dnetc start
>>>>>> 
>>>>>> I built and installed misc/dnetc and got a binary
>>>>>> blob that clearly was not built in my environment:
>>>>>> 
>>>>>> # file /usr/local/distributed.net/dnetc
>>>>>> /usr/local/distributed.net/dnetc: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), statically linked, for FreeBSD 10.1 (1001515), FreeBSD-style, stripped
>>>>>> 
>>>>>> Way older FreeBSD vintage than the locally available toolchains
>>>>>> would normally build. Some might be cautious about such a thing.
>>>>>> 
>>>>>> The man page reported that:
>>>>>> 
>>>>>> QUOTE
>>>>>> If you have never run the client before, it will initiate the menu-driven
>>>>>> configuration. Save and quit when done, the configuration file will be
>>>>>> saved in the same directory as the client. Now, simply restart the
>>>>>> client. From that point on it will use the saved configuration.
>>>>>> END QUOTE
>>>>>> 
>>>>>> I've not seen what the configuration asks about yet.
>>>>> 
>>>>> I went through the configuration, basically just looking
>>>>> at it, other than providing an E-mail address. Then . . .
>>>>> 
>>>>> $ sudo service dnetc start
>>>>> Password:
>>>>> Cannot 'start' dnetc. Set dnetc_enable to YES in /etc/rc.conf or use 'onestart' instead of 'start'.
>>>>> 
>>>>> $ sudo service dnetc onestart
>>>>> 
>>>>> I just let it run without any extra competing activity, other
>>>>> than I had my patched version of top running. It records and
>>>>> reports various maximum-observed (MaxObs) figures, here
>>>>> the load averages being relevant.
>>>>> 
>>>>> Top showed that dnetc started 32 processes, one per hardware
>>>>> thread. Mostly I saw: 100% nice and 0% idle.
>>>>> 
>>>>> Letting it run and then looking at the load averages (and
>>>>> their matching MaxObs figures) after something like 60+ min
>>>>> (not carefully timed: was doing other things) showed:
>>>>> 
>>>>> load averages:  31.97,  31.88,  31.66 MaxObs:  32.12,  31.97,  31.66
>>>>> 
>>>>> (Note: The machine had been up for over 2.75 days before
>>>>> starting this and had not been building much of anything
>>>>> during that time.)
>>>>> 
>>>>> I've not yet experimented with having other, significant
>>>>> competing activity.
>>>>> 
>>>>>>>>> 2. Run "make buildworld".
>>>>>>>> So on the 32 hardware-thread (16 cores) amd64 machine that
>>>>>>>> I have access to, the test is to only have buildworld use
>>>>>>>> about one hardware thread, no matter what else is going on.
>>>>>>>> I never would have guessed that the steps would not involve
>>>>>>>> more like -j$(sysctl -n hw.ncpu) (so around -j32 in this
>>>>>>>> context). So it is good that you provided your note or
>>>>>>>> I'd not know if I'd done similarly or not when trying such.
>>>>>>>> [Note: -j1 and lack of -j are not strictly equivalent in
>>>>>>>> how make operates. As I remember, the distinction makes
>>>>>>>> a notable difference in the number of subprocesses created
>>>>>>>> directly by make (one per action "line" vs. one for the
>>>>>>>> whole block?). So even using -j1 might make a difference
>>>>>>>> vs. what you specified. I'd have to test to see.]
>>>>>>> 
>>>>>>> I am literally running "make buildworld" with no additional options.
>>>>>> 
>>>>>> So required for repeating your results, but likely making
>>>>>> such results not be interesting relative to how I normally
>>>>>> deal with buildworld buildkernel and the like, no matter
>>>>>> if there is other activity in an overlapping time frame or
>>>>>> not: my time preferences are too strong to wait for a single
>>>>>> hardware thread to do my normal builds, even with no
>>>>>> competing activity on the builder.
>>>>>> 
>>>>>>>>> Standard out conveniently reports how long it took (wall clock).
>>>>>>>> But nothing in your instructions indicates how
>>>>>>>> to get an idea of how much progress dnetc made during
>>>>>>>> the various tests? [...]
>>>>>>> 
>>>>>>> Honestly, I've never worried about this part.  But dnetc logs its
>>>>>>> progress in /usr/local/distributed.net/dnetc.txt, though not in terms
>>>>>>> that are easy to relate to real-world progress.  Oddly, when I run
>>>>>>> "make buildworld," I'm primarily interested in getting the world built.
>>>>>>> Perhaps others feel differently.
>>>>>> 
>>>>>> Off topic for the specifics of the actual benchmark
>>>>>> that you run:
>>>>>> 
>>>>>> Then why not use -jN ? In my context, any buildworld
>>>>>> using -j1 or no -j at all takes a huge amount of time
>>>>>> longer than letting it use all the hardware threads (or
>>>>>> so). (I've avoided having any I/O bound contexts for
>>>>>> such.) It does not take additional load on the system
>>>>>> for that to be true --including on the 4-core small arm
>>>>>> boards when I happen to buildworld on such (rare).
>>>>>> 
>>>>>> 
>>>>>>>> [...]
>>>>>>>> FYI: I've never built with and run the alternate
>>>>>>>> scheduler so if there is any appropriate background
>>>>>>>> for that that would not be obvious on finding basic
>>>>>>>> instructions, it would be appropriate to provide
>>>>>>>> such notes.
>>>>>>>> [...]
>>>>>>> 
>>>>>>> You have to build a new kernel, using a config file in which you have
>>>>>>> replaced "options SCHED_ULE" with "options SCHED_4BSD".     -- George
>>>>>> 
>>>>>> Thanks for the notes.
>>>>>> 
>>>>>> I've not decided if I'll do anything with the binary
>>>>>> blob or not.
>>>>> 
>>>> 
>>>> FYI:
>>>> 
>>>> It is not your specific experiment, but I started my
>>>> "extra load" experiments with . . .
>>>> 
>>>> I started a -j32 buildworld buildkernel with dnetc still
>>>> running. I'm generally seeing around 55% Active and 42%
>>> 
>>> Note "Active": user, sorry.
>>> 
>>>> nice, < 2% system (it was building libllvm at this point).
>>>> At that time:
>>>> 
>>>> load averages:  64.41,  60.52,  49.81 MaxObs:  64.47,  60.52,  49.81
>>>> 
>>> 
>>> Contrasting results for some obj-lib32 build activity:
>>> much more variety of User, nice, and system, including
>>> times with < 5% user, 90+% nice. But not typical overall.
>>> But lots of time roughly around 50%/50% or 35%/60%. There
>>> were times with 15+% system.
>>> 
>>> Somewhat after buildkernel started:
>>> 
>>> load averages:  69.15,  64.12,  58.72 MaxObs:  75.98,  64.12,  58.72
>>> 
>>> Harder to summarize, so overall timing reports from the
>>> buildworld and buildkernel stages.
>>> 
>>> 
>>> buildworld:
>>> 
>>> --------------------------------------------------------------
>>> ... World build completed on Wed Mar 22 16:37:57 PDT 2023
>>> ... World built in 2615 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>> 
>>> 
>>> buildkernel:
>>> 
>>> --------------------------------------------------------------
>>> ... Kernel build for GENERIC-NODBG completed on Wed Mar 22 16:43:10 PDT 2023
>>> --------------------------------------------------------------
>>> ... Kernel(s)  GENERIC-NODBG built in 311 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>> 
>>> Afterwards:
>>> 
>>> load averages:  36.08,  53.14,  55.79 MaxObs:  75.98,  65.77,  59.84
>>> 
>>> 
>>> I then did (not all in the same window):
>>> 
>>> $ sudo service dnetc onestop
>>> # rm -fr /usr/obj/BUILDs/main-amd64-nodbg-clang-alt/usr/
>>> 
>>> before another -j32 buildworld buildkernel (no dnetc). The
>>> results for this were:
>>> 
>>> 
>>> buildworld:
>>> 
>>> --------------------------------------------------------------
>>> ... World build completed on Wed Mar 22 17:39:19 PDT 2023
>>> ... World built in 1240 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>> 
>>> (compared to the 2615 for dnetc also in use)
>>> 
>>> 
>>> buildkernel:
>>> 
>>> --------------------------------------------------------------
>>> ... Kernel build for GENERIC-NODBG completed on Wed Mar 22 17:41:17 PDT 2023
>>> --------------------------------------------------------------
>>> ... Kernel(s)  GENERIC-NODBG built in 118 seconds, ncpu: 32, make -j32
>>> --------------------------------------------------------------
>>> 
>>> (compared to the 311 for dnetc also in use)
>> 
>> I forgot to show the MaxObs load averages for the no-dnetc
>> context:
>> 
>> MaxObs:  39.77,  32.15,  25.75
>> 
>>> Experiments without -j32 will take a lot longer, even
>>> without dnetc in use. I'm not sure there will be such
>>> results today.
>>> 
>> 
> 
> I decided to do some more of the less time consuming
> testing. SCHED_4BSD, no dnetc, -j32 buildworld buildkernel :
> 
> 
> buildworld:
> 
> --------------------------------------------------------------
> ... World build completed on Wed Mar 22 19:16:35 PDT 2023
> ... World built in 1235 seconds, ncpu: 32, make -j32
> --------------------------------------------------------------
> 
> (compared to 1240 for SCHED_ULE)
> 
> So: no significant difference.
> 
> 
> buildkernel (SCHED_4BSD building a SCHED_4BSD):
> 
> --------------------------------------------------------------
> ... Kernel build for GENERIC-NODBG-SCHED_4BSD completed on Wed Mar 22 19:18:34 PDT 2023
> --------------------------------------------------------------
> ... Kernel(s)  GENERIC-NODBG-SCHED_4BSD built in 119 seconds, ncpu: 32, make -j32
> --------------------------------------------------------------
> 
> (compared to 118 for SCHED_ULE building a SCHED_ULE)
> 
> So: no significant difference.

I again forgot to show MaxObs load averages (for the above):

MaxObs:  39.23,  31.58,  24.30

> I'll try it with dnetc also active.
> 

I still have no good indication of dnetc progress to allow
comparison of the combination. So the below focuses on
buildworld buildkernel . I expect that the comparative
results suggest a buildworld/buildkernel vs. dnetc progress
tradeoff, not that I can well quantify it.

The below are with dnetc also active.

load averages, MaxObs:  73.03,  65.48,  56.30
(I remembered this time!)
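
For readers without the patched top, the MaxObs figures amount to a
per-column running maximum over the three load averages. Below is a
minimal sketch of that idea only; it is not the code of the patched
top, and the function names are hypothetical. The sample tuples are
the three SCHED_ULE-with-dnetc load-average snapshots quoted earlier
in this thread.

```python
import os


def update_max_observed(max_obs, sample):
    """Return the per-column maximum of the running maxima and a new sample."""
    return tuple(max(m, s) for m, s in zip(max_obs, sample))


def max_observed(samples):
    """Fold a sequence of (1min, 5min, 15min) samples into their maxima."""
    max_obs = (0.0, 0.0, 0.0)
    for sample in samples:
        max_obs = update_max_observed(max_obs, sample)
    return max_obs


def sample_now():
    """Fetch the current load averages on a live Unix-like system."""
    return os.getloadavg()


# Snapshots copied from the SCHED_ULE + dnetc run reported in this thread.
snapshots = [
    (64.41, 60.52, 49.81),
    (69.15, 64.12, 58.72),
    (36.08, 53.14, 55.79),
]
print(max_observed(snapshots))  # prints (69.15, 64.12, 58.72)
```

The result covers only those three snapshots. The MaxObs values that
top reported for the same run (75.98, 65.77, 59.84) are higher because
top samples throughout the run rather than at three hand-picked
moments.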


buildworld:

--------------------------------------------------------------
... World build completed on Wed Mar 22 20:15:56 PDT 2023
... World built in 1667 seconds, ncpu: 32, make -j32
--------------------------------------------------------------

(compared to 2615 for SCHED_ULE with dnetc
 and to 1240 or so for no dnetc)


buildkernel:

--------------------------------------------------------------
... Kernel build for GENERIC-NODBG-SCHED_4BSD completed on Wed Mar 22 20:18:34 PDT 2023
--------------------------------------------------------------
... Kernel(s)  GENERIC-NODBG-SCHED_4BSD built in 158 seconds, ncpu: 32, make -j32
--------------------------------------------------------------

(compared to 311 for SCHED_ULE with dnetc
 and to 118 or so for no dnetc)
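

As a worked comparison, the slowdown that dnetc causes under each
scheduler can be computed from the build times reported in this
thread. This is only a sketch of the arithmetic: the dictionary layout
and the slowdown() helper are illustrative names, and only the seconds
figures come from the build reports.

```python
# Wall-clock seconds copied from the build reports in this thread.
# Keys are (scheduler, dnetc active?); the layout is illustrative only.
buildworld = {
    ("SCHED_ULE", False): 1240,
    ("SCHED_ULE", True): 2615,
    ("SCHED_4BSD", False): 1235,
    ("SCHED_4BSD", True): 1667,
}
buildkernel = {
    ("SCHED_ULE", False): 118,
    ("SCHED_ULE", True): 311,
    ("SCHED_4BSD", False): 119,
    ("SCHED_4BSD", True): 158,
}


def slowdown(times, sched):
    """Ratio of build time with dnetc active to build time without it."""
    return times[(sched, True)] / times[(sched, False)]


for name, times in (("buildworld", buildworld), ("buildkernel", buildkernel)):
    for sched in ("SCHED_ULE", "SCHED_4BSD"):
        print(f"{name:11s} {sched:10s} x{slowdown(times, sched):.2f}")
```

With these figures, dnetc slows buildworld by about 2.11x under
SCHED_ULE versus about 1.35x under SCHED_4BSD, and buildkernel by
about 2.64x versus about 1.33x. Each figure is from a single run, so
the ratios are indicative rather than precise.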


With dnetc active, it does not take being near -j1
(or no -j) for buildworld buildkernel to take noticeably
less time under SCHED_4BSD than under SCHED_ULE: -j32
(the number of hardware threads, 16 cores) also takes
noticeably less time. buildworld buildkernel in this
context seems to be a good match to SCHED_4BSD and its
round-robin.

(I make no general claim to SCHED_4BSD being better
across a large range of contexts.)

I've not decided if I'll try anything like a -j1
or no -j alternative.

Without dnetc active, SCHED_ULE and SCHED_4BSD showed no
significant difference in build times. For how I use the
builder machines, the results suggest that the scheduler
choice is not significant for my system-build activities.


I've not tested port building in poudriere-devel for
how I configure such. But nothing suggests to me to
expect a significant distinction between the 2
schedulers for my way of working for building packages
from ports.


===
Mark Millard
marklmi at yahoo.com