cvs commit: src/sys/kern sched_ule.c

Mon Oct 1 23:41:54 PDT 2007

On Tue, 2 Oct 2007, Bruce Evans wrote:

> On Mon, 1 Oct 2007, Kevin Oberman wrote:
>
>>> Date: Mon, 1 Oct 2007 21:26:39 +1000 (EST)
>>> From: Bruce Evans <brde at optusnet.com.au>
>>> 
>>> On Mon, 1 Oct 2007, Jeff Roberson wrote:
>
>>>> Given the overwhelming amount of feedback from qualified people, I think
>>>> it's fair to say that ULE gives a more responsive system under load.
>>> 
>>> This is not my experience.  Maybe I don't run enough interactive bloatware
>>> to have a large enough interactive load for the scheduler to make a
>>> difference.
>> 
>> That, or you don't run interactive on older systems with slow CPUs and
>> limited memory. (This does NOT imply that ULE is going to help when
>> experiencing heavy swapfile activity. I don't think anything helps
>> that except more RAM.)
>
> Not recently.  I used a P5/133 which was new in 1996 as an X client until
> Y2K since it was fast enough to be an X client, but I stopped running
> builds in it in 1998.
>
>> The place it seems most evident to me is X responsiveness when the system
>> (1GHz X 256MB PIII) is busy with large builds. Performance is terrible
>> with 4BSD and only bad with ULE. Note that I am running Gnome (speaking
>> of bloatware).
>> 
>> The difference when running ULE is pretty dramatic.
>
> Again, this is not my experience.  I don't run gnome, but occasionally
> run X, and often run kernel builds and network benchmarks.  A quick
> test now showed good interactivity for light browsing and editing at
> a load average of 32 generated by a pessimized makeworld (-j16) on
> both an A64 2.2GHz UP and a Celeron 366MHz UP.  The light interactive
> use just doesn't need to run long enough for its priority to become
> as low (numerically high) as the build.
>
> Maybe heavy X use with streaming video is what you count as interactive.
> I count that as not very hard realtime.
>
> Further testing of my ~4BSD scheduler in ~5.2 indicates that when a
> process wants less than about 1/loadavg of the CPU on average, it
> usually just gets it, with no scheduling delays, since it usually has
> higher priority than all other user processes.  Otherwise, the worst-case
> scheduling delays increase from ~10 msec to ~2 seconds.  It is easy
> to reduce the scheduling quantum from its default of 100 msec by a
> factor of 100, but this doesn't seem to work right.  So the behaviour
> is very dependent on the load and on the amount of CPU wanted by the
> interactive process.
>
> ...
>
> I now have more experience with ULE.  A version built today gave
> dramatically worse interactivity, so much so that I think it must have
> been broken recently.  A simple shell loop hangs the rest of the system
> in some cases, and a background build has similar bad effects, probably
> limited mainly by useful loops not being endless.
>

I'm not able to reproduce this and no one else has reported it.  This may 
be the result of some incompatibility between bdebsd and ULE.  Is this an
SMP machine?  Do you have PREEMPTION enabled?  ULE recently started 
honoring preemption.  Try setting:

kern.sched.preempt_thresh: 64

if it is not already.  I know you deal with hardclock differently. 
Without PREEMPTION it may not work correctly.
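
For anyone following along, a minimal sketch of checking and applying that
setting.  It assumes a kernel built with SCHED_ULE (so that the
kern.sched.preempt_thresh sysctl exists) and root privileges for the write;
the variable names are only illustrative.

```sh
#!/bin/sh
# Assumption: the running kernel uses SCHED_ULE, which provides this sysctl.
oid=kern.sched.preempt_thresh

if cur=$(sysctl -n "$oid" 2>/dev/null); then
    echo "$oid is currently $cur"
    # 64 is the value suggested above; writing it requires root.
    [ "$cur" = 64 ] || sysctl "$oid=64"
else
    echo "$oid not available (kernel not using ULE?)"
fi
```

The guard keeps the script harmless on a 4BSD kernel, where the sysctl is
absent and nothing is changed.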

> First I tried an old regression test for nice[1-2]:
>
> %%%
> for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> do
>    nice -$i sh -c "while :; do echo -n;done" &
> done
> top -o time
> %%%

I use this:

for i in -20 -16 -12 -8 -4 0 4 8 12 16 20
do
    nice -$i sh -c "while :; do echo -n;done" &
done
top -o time

I like to verify that the distribution doesn't get out of whack.  It takes 
some time to settle before the higher nice threads get enough runtime to 
sort properly.  My results are as follows:

   868 root          1  81  -20  3492K  1404K RUN      0:28 23.58% sh
   869 root          1  83  -16  3492K  1404K RUN      0:20 15.09% sh
   870 root          1  86  -12  3492K  1404K RUN      0:16 12.16% sh
   871 root          1  90   -8  3492K  1404K RUN      0:12  8.89% sh
   872 root          1  93   -4  3492K  1404K RUN      0:11  7.96% sh
   873 root          1  97    0  3492K  1404K RUN      0:09  6.59% sh
   874 root          1 101    4  3492K  1404K RUN      0:08  4.88% sh
   875 root          1 105    8  3492K  1404K RUN      0:07  5.37% sh
   876 root          1 109   12  3492K  1404K RUN      0:06  3.37% sh
   877 root          1 113   16  3492K  1404K RUN      0:06  4.05% sh
   878 root          1 116   20  3492K  1404K RUN      0:05  3.96% sh

There really might not be enough difference between the positive nice
values.  I've never had a good feeling for how nice should behave, but
this mostly seems reasonable.  It would be possible to tweak the algorithm
to penalize nice further.
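
One way to check the ordering mechanically is sketched below.  It assumes the
top layout shown in the listing above (field 5 is the nice value, field 10 is
the CPU percentage); check_nice_order is a hypothetical helper name, not an
existing tool.  It reports every pair of adjacent nice levels where the nicer
process got more CPU, then prints the number of such inversions.

```sh
#!/bin/sh
# Hypothetical helper: reads top process lines on stdin.
# Assumed layout: $5 = nice value, $10 = CPU share such as "23.58%".
check_nice_order() {
    sort -k5,5n | awk '
        { cpu = $10; sub(/%/, "", cpu); cpu += 0 }
        NR > 1 && cpu > prev {
            printf "nice %s (%.2f%%) got more CPU than nice %s (%.2f%%)\n",
                $5, cpu, pnice, prev
            bad++
        }
        { prev = cpu; pnice = $5 }
        END { print bad + 0 }'
}

# The sample from the listing above.
check_nice_order <<'EOF'
868 root 1  81 -20 3492K 1404K RUN 0:28 23.58% sh
869 root 1  83 -16 3492K 1404K RUN 0:20 15.09% sh
870 root 1  86 -12 3492K 1404K RUN 0:16 12.16% sh
871 root 1  90  -8 3492K 1404K RUN 0:12  8.89% sh
872 root 1  93  -4 3492K 1404K RUN 0:11  7.96% sh
873 root 1  97   0 3492K 1404K RUN 0:09  6.59% sh
874 root 1 101   4 3492K 1404K RUN 0:08  4.88% sh
875 root 1 105   8 3492K 1404K RUN 0:07  5.37% sh
876 root 1 109  12 3492K 1404K RUN 0:06  3.37% sh
877 root 1 113  16 3492K 1404K RUN 0:06  4.05% sh
878 root 1 116  20 3492K 1404K RUN 0:05  3.96% sh
EOF
```

On that sample it flags nice 8 against nice 4 and nice 16 against nice 12,
both among the positive nice values.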

Jeff

>
> This hung after starting only about one of the shell processes.  After
> cutting the list down to just one process with nice -20, it still hung.
> Shells on other syscons terminals running at rtprio 0 could not compete
> with the nice -20 process:
> - they could not start top to look at what was happening
> - an already-running top could not display anything new
> - they could not start killall.
> With the list cut down to about 6 processes, ps in ddb showed evidence of all 
> the processes starting, and I was able to kill them all using
> kill in ddb.
>
> The above was with HZ = 100.  After changing HZ to 1000, one nice -20
> process could be started with no problems, but similar problems occur
> with a few more processes.  With a nice list of
> "for i in 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20", one of the
> shells apparently runs for about a minute before its priority is
> reduced at all.  During this time, the symptoms were the same as above.
> The shell that uses extra time initially is not usually the first
> one in the list.  After starting all the shells, the behaviour was
> normal, including niceness having too little effect.
>
> On a later run, all the shells started in a couple of seconds (still
> slow) even with the full nice list restored.
>
> Running makeworld with just -j4 in the background gives similar symptoms.
> When a new process is started, it sometimes gets too many cycles to
> begin with, and apparently completely stops all processes in the
> makeworld (but not the top displaying things) for several seconds.
> After a while (I guess when the interactivity score decreases), this
> behaviour changes to giving the new process very few cycles even if
> it is semi-interactive (a foreground process started from a shell).
> In at least this phase, ^C to kill processes doesn't work, but ^Z to
> suspend them and then kill from the shell works normally, and interactivity
> in not-very-bloated mail programs and editors is very bad.  A
> non-interactive utility to measure the scheduling delay reports a max
> delay of about 2 seconds for most runs, while with other schedulers
> and kernels it only reports 2 seconds occasionally even at much higher
> loads.
>
> Other behaviour with 4BSD schedulers and various kernels:
> - the max scheduling delay is almost independent of the CPU speed.
> - the max scheduling delay is slightly worse for -current with 4BSD
>  than with my ~5.2.
> - -current has anomalous behaviour relative to ~5.2 for background
>  makeworld -j16: many fewer runnable processes, a much smaller max
>  load average, and many more zombies visible when top looks.
> - in ~5.2, removing the hack that puts threads back on the head of
>  the queue instead of the tail significantly reduces the max
>  scheduling delay.  (This is a non-hack with related changes in
>  -current, but I just used s/TAIL/HEAD/.)  This hack reduced makeworld
>  time significantly.  I think removing it improves interactivity
>  only by accident.  Removing it restores the old bogus scheduler
>  behaviour of rescheduling on every "slow" interrupt, which gives
>  essentially roundrobin scheduling under loads that generate lots
>  of interrupts.  Interactivity is still poor because makeworld
>  sometimes generates a few hundred processes per second and cycling
>  through that many takes a long time even with a tiny quantum.
>
> - reducing kern.sched.quantum never had much effect.  Same for
>  increasing HZ in -current with 4BSD.
>
> Bruce
>