Re: Periodic rant about SCHED_ULE

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 24 Mar 2023 19:47:08 UTC

Steve Kargl <sgk_at_troutmask.apl.washington.edu> wrote on
Date: Wed, 22 Mar 2023 19:04:06 UTC :

> On Wed, Mar 22, 2023 at 07:31:57PM +0100, Matthias Andree wrote:
> > 
> > Yes, there are reports that FreeBSD is not responsive by default - but this
> > may make it get overall better throughput at the expense of responsiveness,
> > because it might be doing fewer context switches. So just complaining about
> > a longer buildworld without seeing how much dnetc did in the same wallclock
> > time period is useless. Periodic rant's don't fix this lack of information.
> > 
> 
> I reported the issue with ULE some 15 to 20 years ago.
> I gave up reporting the issue. The individuals with the
> requisite skills to hack on ULE did not; and yes, I lack
> those skills. The path of least resistance is to use
> 4BSD.
> 
> % cat a.f90
> !
> ! Silly numerically intensive computation.
> !
> program foo
> implicit none
> integer, parameter :: m = 200, n = 1000, dp = kind(1.d0)
> integer i
> real(dp) x
> real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
> call random_init(.true., .true.)
> allocate(a(n,n), b(n,n))
> do i = 1, m
> call random_number(a)
> call random_number(b)
> c = matmul(a,b)
> x = sum(c)
> if (x < 0) stop 'Whoops'
> end do
> end program foo
> % gfortran11 -o z -O3 -march=native a.f90 
> % time ./z
> 42.16 real 42.04 user 0.09 sys
> % cat foo
> #! /bin/csh
> #
> # Launch NCPU+1 images with a 1 second delay
> #
> foreach i (1 2 3 4 5 6 7 8 9)
> ./z &
> sleep 1
> end
> % ./foo
> 
> In another xterm, you can watch the 9 images.
> 
> % top
> st pid: 1709; load averages: 4.90, 1.61, 0.79 up 0+00:56:46 11:43:01
> 74 processes: 10 running, 64 sleeping
> CPU: 99.9% user, 0.0% nice, 0.1% system, 0.0% interrupt, 0.0% idle
> Mem: 369M Active, 187M Inact, 240K Laundry, 889M Wired, 546M Buf, 14G Free
> Swap: 16G Total, 16G Free
> 
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
> 1699 kargl 1 56 0 68M 35M RUN 3 0:41 92.60% z
> 1701 kargl 1 56 0 68M 35M RUN 0 0:41 92.33% z
> 1689 kargl 1 56 0 68M 35M CPU5 5 0:47 91.63% z
> 1691 kargl 1 56 0 68M 35M CPU0 0 0:45 89.91% z
> 1695 kargl 1 56 0 68M 35M CPU2 2 0:43 88.56% z
> 1697 kargl 1 56 0 68M 35M CPU6 6 0:42 88.48% z
> 1705 kargl 1 55 0 68M 35M CPU1 1 0:39 88.12% z
> 1703 kargl 1 56 0 68M 35M CPU4 4 0:39 87.86% z
> 1693 kargl 1 56 0 68M 35M CPU7 7 0:45 78.12% z
> 
> With 4BSD, you see the ./z's with 80% or greater CPU. All the ./z's exit
> after 55-ish seconds. If you try this experiment on ULE, you'll get NCPU-1
> ./z's with nearly 99% CPU and 2 ./z's with something like 45-ish% as the
> two images ping-pong on one cpu. Back when I was testing ULE vs 4BSD,
> this was/is due to ULE's cpu affinity where processes never migrate to
> another cpu. Admittedly, this was several years ago. Maybe ULE has
> gotten better, but George's rant seems to suggest otherwise.

Note: I'm only beginning to explore your report/case.

There is a significant difference between your report
and George's report: his is tied to nice use (and I've
replicated SCHED_4BSD vs. SCHED_ULE consequences in
the same direction that George reports, but with much
larger process counts involved). In those types of
experiments, without the nice use, I did not find
notable differences. But that is a rather different
context than your examples. Thus the below is a start
on separate experiments closer to what you report
using.

Not (yet) having a Fortran set up, I did some simple
experiments with stress --cpu N (N processes looping
sqrt calculations) and top. In top I sorted by pid
to make it obvious if a fixed process was getting a
fixed CPU or WCPU figure. (I tried looking at both CPU
and WCPU, varying the time between samples as well. I
also varied stress's --backoff N .) This was on a
ThreadRipper 1950X (32 hardware threads, so 16 cores)
that was running:

# uname -apKU
FreeBSD amd64_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #95 main-n261544-cee09bda03c8-dirty: Wed Mar 15 19:44:40 PDT 2023     root@amd64_ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400082 1400082

(That is a SCHED_ULE kernel. "GENERIC-NODBG-SCHED_4BSD"
would be listed for SCHED_4BSD if I end up doing
experiments with it.)

Trying things like --cpu 33 I never got a process that
sustained a low CPU or WCPU figure. The low figures
moved around across the processes as I watched.

When I tried the likes of --cpu 48 all the CPU or WCPU
figures were generally lower than 99% but also showed
variable figures that moved around across the processes.
It definitely did not produce a sustained, approximately
uniform figure across the processes.

For comparison/contrast: --cpu 32 does have all the 32
processes showing 99+% all the time.
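
For reference, here is a rough sketch of the arithmetic for
the two extremes at these process counts: an idealized even
split of the hardware threads, versus the no-migration
behavior described in the quoted report. Both models and the
function names (even_share, pinned_shares) are my own
assumptions for illustration. They treat every hardware
thread as an equal CPU (which SMT siblings are not) and say
nothing about what either scheduler actually does.

```python
def even_share(nproc, ncpu):
    """Per-process CPU fraction if the load were spread perfectly evenly."""
    return min(1.0, ncpu / nproc)


def pinned_shares(nproc, ncpu):
    """Per-process CPU fractions if processes never migrate.

    Assumed model: processes are dealt round-robin onto CPUs and each
    CPU's time is split evenly among the processes placed on it.
    """
    shares = []
    for cpu in range(ncpu):
        # Number of processes that land on this CPU.
        count = nproc // ncpu + (1 if cpu < nproc % ncpu else 0)
        if count:
            shares.extend([1.0 / count] * count)
    return sorted(shares, reverse=True)


if __name__ == "__main__":
    ncpu = 32  # hardware threads on the 1950X
    for nproc in (32, 33, 48):
        pinned = pinned_shares(nproc, ncpu)
        print(f"--cpu {nproc}: even {even_share(nproc, ncpu):.1%}, "
              f"pinned max {pinned[0]:.1%} min {pinned[-1]:.1%}")
```

With the quoted 9 processes on 8 CPUs, the even model gives
about 88.9% each and the pinned model gives 7 processes at
100% and 2 at 50%, which is roughly the contrast between the
4BSD and ULE figures in the quoted report. Figures that move
around across processes, as I saw, fit neither extreme exactly.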

This seems at least suggestive that, in my context, the
specific old behavior that you report does not show up,
at least on the timescales that I was observing. It
still might not be something you would find appropriate,
but it does appear to at least be different.

It is possible that stress --cpu N involves more than
I expect and that this contributes to the behavior
that I've observed.

===
Mark Millard
marklmi at yahoo.com