Re: Periodic rant about SCHED_ULE
- In reply to: Mark Millard : "Re: Periodic rant about SCHED_ULE"
Date: Sat, 25 Mar 2023 18:23:04 UTC
On Mar 25, 2023, at 11:14, Mark Millard <marklmi@yahoo.com> wrote:
> Peter <pmc_at_citylink.dinoex.sub.org> wrote on
> Date: Sat, 25 Mar 2023 15:47:42 UTC :
>
>> Quoting George Mitchell <george+freebsd@m5p.com>:
>>
>>>> https://forums.freebsd.org/threads/what-is-sysctl-kern-sched-preempt_thresh.85
>>>>
>>> Thank you! -- George
>>
>> You're welcome. Can I get a success/failure report?
>>
>>
>> ---------------------------------------------------------------------
>>>> On 3/22/23, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>>>>>
>>>>> I reported the issue with ULE some 15 to 20 years ago.
>>
>> Can I get the PR number, please?
>>
>>
>> ---------------------------------------------------------------------
>> Test usecase:
>> =============
>>
>> Create two compute tasks competing for the same (otherwise unused) core,
>> one without syscalls and one with:
>>
>> # cpuset -l 13 sh -c "while true; do :; done" &
>> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
>>
>> Within a few seconds the two tasks are balanced, running at nearly the
>> same PRI and each using 50% of the core:
>>
>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
>> 5166 root 1 88 0 13M 3264K RUN 13 9:23 51.65% sh
>> 10675 root 1 87 0 13M 3740K CPU13 13 1:30 48.57% gzip
>>
>> This changes when tar reaches /usr/include with its many small
>> files. Now smaller blocks are delivered to gzip, so it does more
>> syscalls, and things get ugly:
>>
>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
>> 5166 root 1 94 0 13M 3264K RUN 13 18:07 95.10% sh
>> 19028 root 1 81 0 13M 3740K CPU13 13 1:23 4.87% gzip
>
> Why did PID 10675 change to 19028?
>
>> This does not happen because tar is slow in moving data to
>> gzip: tar reads from SSD, or more likely from ARC, and that is
>> always faster than gzip -9. The imbalance is created by the scheduler.
>
>
> When I tried that tar line, I got lots of output to stderr:
>
> # tar cvf - / | cpuset -l 13 gzip -9 > /dev/null
> tar: Removing leading '/' from member names
> a .
> a root
> a wrkdirs
> a bin
> a usr
> . . .
>
> Was that an intentional part of the test?
>
> To avoid this I used:
>
> # tar cvf - / 2>/dev/null | cpuset -l 13 gzip -9 2>&1 > /dev/null
>
> At which point I get the likes of:
>
> 17129 root 1 68 0 14192Ki 3628Ki RUN 13 0:20 3.95% gzip -9
> 17128 root 1 20 0 58300Ki 13880Ki pipdwt 18 0:00 0.27% tar cvf - / (bsdtar)
> 17097 root 1 133 0 13364Ki 3060Ki CPU13 13 8:05 95.93% sh -c while true; do :; done
>
> up front.
>
> For reference, I also see the likes of the following from
> "gstat -spod" (it is a root on ZFS context with PCIe Optane media):
>
> dT: 1.063s w: 1.000s
> L(q) ops/s r/s kB kBps ms/r w/s kB kBps ms/w d/s kB kBps ms/d o/s ms/o %busy Name
> . . .
> 0 68 68 14 937 0.0 0 0 0 0.0 0 0 0 0.0 0 0.0 0.1| nvd2
> . . .
>
>
I left it running and I'm now seeing:
17129 root 1 107 0 14192Ki 3628Ki CPU13 13 3:01 48.10% gzip -9
17128 root 1 21 0 58300Ki 15428Ki pipdwt 20 0:04 2.02% tar cvf - / (bsdtar)
17097 root 1 115 0 13364Ki 3060Ki RUN 13 16:30 51.77% sh -c while true; do :; done
Also, examples of the likes of the following from "gstat -spod":
dT: 1.063s w: 1.000s
L(q) ops/s r/s kB kBps ms/r w/s kB kBps ms/w d/s kB kBps ms/d o/s ms/o %busy Name
. . .
0 1213 1213 5 6456 0.0 0 0 0 0.0 0 0 0 0.0 0 0.0 1.2| nvd2
. . .
FYI: ThreadRipper 1950X context.
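
If the syscall-rate explanation quoted above is what drives this, the two
pinned processes should differ sharply in voluntary context switches. A
possible check (just a sketch, untested here; the PIDs are the ones from
the top output above):

# procstat -r 17097 17129

procstat's rusage report includes voluntary and involuntary context switch
counts, so the gzip side should show far more voluntary switches than the
sh spin loop if that is the mechanism.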
Looks like what I'll see is very dependent on when I
look at what it is doing: the details of what tar happens
to be processing at the time matter.
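
Given that, repeated sampling might show more than a single glance at top.
A rough sketch (the 10-second interval, the process count, and the grep
patterns are just placeholders):

# while :; do date; top -b -d 1 -o cpu 10 | grep -E 'gzip|sh -c'; sleep 10; done

That would log how the WCPU split between the spinning sh and gzip drifts
as tar moves through different parts of the tree.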
===
Mark Millard
marklmi at yahoo.com