cvs commit: src/sys/kern sched_ule.c

Bruce Evans brde at optusnet.com.au
Mon Oct 1 22:27:01 PDT 2007


On Mon, 1 Oct 2007, Kevin Oberman wrote:

>> Date: Mon, 1 Oct 2007 21:26:39 +1000 (EST)
>> From: Bruce Evans <brde at optusnet.com.au>
>>
>> On Mon, 1 Oct 2007, Jeff Roberson wrote:

>>> Given the overwhelming amount of feedback from qualified people, I think
>>> it's fair to say that ULE gives a more responsive system under load.
>>
>> This is not my experience.  Maybe I don't run enough interactive bloatware
>> to have a large enough interactive load for the scheduler to make a
>> difference.
>
> That, or you don't run interactive loads on older systems with slow CPUs and
> limited memory. (This does NOT imply that ULE is going to help when
> experiencing heavy swapfile activity. I don't think anything helps
> that except more RAM.)

Not recently.  I used a P5/133 (new in 1996) as an X client until Y2K,
since it was fast enough for that, but I stopped running builds on it
in 1998.

> The place it seems most evident to me is X responsiveness when the system
> (1GHz X 256MB PIII) is busy with large builds. Performance is terrible
> with 4BSD and only bad with ULE. Note that I am running Gnome (speaking
> of bloatware).
>
> The difference when running ULE is pretty dramatic.

Again, this is not my experience.  I don't run gnome, but occasionally
run X, and often run kernel builds and network benchmarks.  A quick
test now showed good interactivity for light browsing and editing at
a load average of 32 generated by a pessimized makeworld (-j16) on
both an A64 2.2GHz UP and a Celeron 366MHz UP.  The light interactive
use just doesn't need to run long enough for its priority to become
as low (numerically high) as the build.

Maybe heavy X use with streaming video is what you count as interactive.
I count that as not very hard realtime.

Further testing of my ~4BSD scheduler in ~5.2 indicates that when a
process wants less than about 1/loadavg of the CPU on average, it
usually just gets it, with no scheduling delays, since it usually has
higher priority than all other user processes.  Otherwise, the worst-case
scheduling delays increase from ~10 msec to ~2 seconds.  It is easy
to reduce the scheduling quantum from its default of 100 msec by a
factor of 100, but this doesn't seem to work right.  So the behaviour
is very dependent on the load and on the amount of CPU wanted by the
interactive process.
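
For concreteness, a sketch of that factor-of-100 reduction, assuming
the kern.sched.quantum sysctl mentioned further down, and that it
reports the quantum in microseconds:

%%%
# show the current quantum; under the assumptions above the default
# appears as 100000 (100 msec)
sysctl kern.sched.quantum
# cut it by a factor of 100, to about 1 msec (internally the quantum
# is presumably rounded to clock ticks, so it can't get below 1/HZ)
sysctl kern.sched.quantum=1000
%%%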

...

I now have more experience with ULE.  A version built today gave
dramatically worse interactivity, so much so that I think it must have
been broken recently.  A simple shell loop hangs the rest of the system
in some cases, and a background build has similar bad effects, probably
limited mainly by useful loops not being endless.

First I tried an old regression test for nice[1-2]:

%%%
for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
     nice -$i sh -c "while :; do echo -n;done" &
done
top -o time
%%%

This hung after starting only about one of the shell processes.  After
cutting the list down to just one process with nice -20, it still hung.
Shells on other syscons terminals running at rtprio 0 could not compete
with the nice -20 process:
- they could not start top to look at what was happening
- an already-running top could not display anything new
- they could not start killall.
With the list cut down to about 6 processes, ps in ddb showed evidence 
of all the processes starting, and I was able to kill them all using
kill in ddb.
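
A shell at rtprio 0 on a spare syscons terminal can be started with
rtprio(1), e.g. (needs root):

%%%
# run an interactive shell at realtime priority 0, the highest
# (numerically lowest) realtime priority
rtprio 0 /bin/sh
%%%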

The above was with HZ = 100.  After changing HZ to 1000, one nice -20
process could be started with no problems, but similar problems occur
with a few more processes.  With a nice list of
"for i in 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20", one of the
shells apparently runs for about a minute before its priority is
reduced at all.  During this time, the symptoms were the same as above.
The shell that uses extra time initially is not usually the first
one in the list.  After starting all the shells, the behaviour was
normal, including niceness having too little effect.

On a later run, all the shells started in a couple of seconds (still
slow) even with the full nice list restored.
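
For reference, HZ here is the usual knob: either "options HZ=1000" in
the kernel config (plus a rebuild), or, on kernels that honour the
loader tunable, no rebuild at all:

%%%
# set the clock interrupt rate for the next boot via the loader,
# assuming the kernel honours the kern.hz tunable
echo 'kern.hz=1000' >> /boot/loader.conf
%%%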

Running makeworld with just -j4 in the background gives similar symptoms.
When a new process is started, it sometimes gets too many cycles to
begin with, and apparently completely stops all processes in the
makeworld (but not the top displaying things) for several seconds.
After a while (I guess when the interactivity score decreases), this
behaviour changes to giving the new process very few cycles even if
it is semi-interactive (a foreground process started from a shell).
In at least this phase, ^C to kill processes doesn't work, but ^Z to
suspend them and then kill from the shell works normally, and interactivity
in not-very-bloated mail programs and editors is very bad.  A
non-interactive utility to measure the scheduling delay reports a max
delay of about 2 seconds for most runs, while with other schedulers
and kernels it only reports 2 seconds occasionally even at much higher
loads.
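
That utility isn't shown here; a crude sh approximation of the idea
(ask for a short sleep over and over and see how much longer than the
requested time each iteration actually takes) would be something like:

%%%
#!/bin/sh
# Crude scheduling-delay probe (a sketch only, not the utility used
# above).  Each iteration requests a 0.1 second sleep; time(1) reports
# how long the sleep really took, and the awk pass keeps the worst
# case.  Large values mean the process sat runnable for a long time
# after its sleep expired.
i=0
while [ $i -lt 100 ]; do
     /usr/bin/time sleep 0.1
     i=$((i + 1))
done 2>&1 | awk '$2 == "real" { if ($1 > max) max = $1 }
     END { printf "max elapsed: %ss (vs. 0.1s requested)\n", max }'
%%%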

Other behaviour with 4BSD schedulers and various kernels:
- the max scheduling delay is almost independent of the CPU speed.
- the max scheduling delay is slightly worse for -current with 4BSD
   than with my ~5.2.
- -current has anomalous behaviour relative to ~5.2 for background
   makeworld -j16: many fewer runnable processes, a much smaller max
   load average, and many more zombies visible when top looks.
- in ~5.2, removing the hack that puts threads back on the head of
   the queue instead of the tail significantly reduces the max
   scheduling delay.  (This is a non-hack with related changes in
   -current, but I just used s/TAIL/HEAD/.)  This hack reduced makeworld
   time significantly.  I think removing it improves interactivity
   only by accident.  Removing it restores the old bogus scheduler
   behaviour of rescheduling on every "slow" interrupt, which gives
   essentially roundrobin scheduling under loads that generate lots
   of interrupts.  Interactivity is still poor because makeworld
   sometimes generates a few hundred processes per second and cycling
   through that many takes a long time even with a tiny quantum.

- reducing kern.sched.quantum never had much effect.  Same for
   increasing HZ in -current with 4BSD.

Bruce

