Re: Periodic rant about SCHED_ULE

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 25 Mar 2023 18:14:11 UTC
Peter <pmc_at_citylink.dinoex.sub.org> wrote on
Date: Sat, 25 Mar 2023 15:47:42 UTC :

> Quoting George Mitchell <george+freebsd@m5p.com>:
> 
> >> https://forums.freebsd.org/threads/what-is-sysctl-kern-sched-preempt_thresh.85
> >>
> >Thank you! -- George
> 
> You're welcome. Can I get a success/failure report?
> 
> 
> ---------------------------------------------------------------------
> >> On 3/22/23, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> >>>
> >>> I reported the issue with ULE some 15 to 20 years ago.
> 
> Can I get the PR number, please?
> 
> 
> ---------------------------------------------------------------------
> Test usecase:
> =============
> 
> Create two compute tasks competing for the same -otherwise unused- core, 
> one without, one with syscalls: 
> 
> # cpuset -l 13 sh -c "while true; do :; done" & 
> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null 
> 
> Within a few seconds the two tasks are balanced, running at nearly the
> same PRI and each using 50% of the core:
> 
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 
> 5166 root 1 88 0 13M 3264K RUN 13 9:23 51.65% sh 
> 10675 root 1 87 0 13M 3740K CPU13 13 1:30 48.57% gzip 
> 
> This changes when the tar reaches /usr/include with its many small
> files. Now smaller blocks are delivered to gzip, it does more
> syscalls, and things get ugly:
> 
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 
> 5166 root 1 94 0 13M 3264K RUN 13 18:07 95.10% sh 
> 19028 root 1 81 0 13M 3740K CPU13 13 1:23 4.87% gzip 

Why did PID 10675 change to 19028?

> This does not happen because tar would be slow in moving data to 
> gzip: tar reads from SSD, or more likely from ARC, and this is 
> always faster than gzip-9. The imbalance is made by the scheduler.


When I tried that tar line, I got lots of output to stderr:

# tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
tar: Removing leading '/' from member names
a .
a root
a wrkdirs
a bin
a usr
. . .

Was that an intentional part of the test?

To avoid this I used:

# tar cvf - / 2>/dev/null | cpuset -l 13 gzip -9 2>&1 > /dev/null

At which point I get the likes of:

17129 root          1  68    0  14192Ki    3628Ki RUN     13   0:20   3.95% gzip -9
17128 root          1  20    0  58300Ki   13880Ki pipdwt  18   0:00   0.27% tar cvf - / (bsdtar)
17097 root          1 133    0  13364Ki    3060Ki CPU13   13   8:05  95.93% sh -c while true; do :; done

up front.
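
A side note on the redirection order in that gzip command, as a
minimal sketch rather than anything specific to this test: with
"2>&1 > /dev/null" the shell first points stderr at whatever stdout
currently is and only then sends stdout to /dev/null, so stderr is
not discarded. The emit function below is a hypothetical stand-in
for any command that writes to both streams; it is not part of the
original test.

```sh
# Hypothetical helper: one line to stdout, one line to stderr.
emit() { echo "to-stdout"; echo "to-stderr" >&2; }

# Order A (as in the gzip command above): stderr is duplicated onto
# the current stdout first, then stdout alone goes to /dev/null.
a=$( emit 2>&1 >/dev/null )

# Order B: stdout goes to /dev/null first, then stderr follows it,
# so both streams are discarded.
b=$( emit >/dev/null 2>&1 )

echo "A captured: [$a]"
echo "B captured: [$b]"
```

Order A still captures the stderr line, while order B captures
nothing. For the test itself this should make no difference to the
scheduling behavior, since gzip's compressed output goes to
/dev/null either way.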

For reference, I also see the likes of the following from
"gstat -spod" (this is a root-on-ZFS context with PCIe Optane media):

dT: 1.063s  w: 1.000s
 L(q)  ops/s    r/s     kB   kBps   ms/r    w/s     kB   kBps   ms/w    d/s     kB   kBps   ms/d    o/s   ms/o   %busy Name
. . .
    0     68     68     14    937    0.0      0      0      0    0.0      0      0      0    0.0      0    0.0    0.1| nvd2
. . .



===
Mark Millard
marklmi at yahoo.com