svn commit: r219003 - head/usr.bin/nice
brde at optusnet.com.au
Sat Feb 26 07:31:04 UTC 2011
On Fri, 25 Feb 2011, John Baldwin wrote:
> On Friday, February 25, 2011 5:23:04 am Remko Lodder wrote:
>> On Feb 24, 2011, at 10:47 PM, Bruce Evans wrote:
>>> On Thu, 24 Feb 2011, John Baldwin wrote:
>>>> On Thursday, February 24, 2011 2:03:33 pm Remko Lodder wrote:
>>> [context restored:
>>> +A priority of 19 or 20 will prevent a process from taking any cycles from
>>> +others at nice 0 or better.]
>> [strip information overload].
>> So, what is the consensus instead of overwhelming me with too much
>> information?
> Take that sentence out. It is not accurate for our schedulers.
And an accurate sentence would require condensing even more information :-).
I just remembered another difference for the multi-CPU case that is
probably very large. Long ago, I hacked SCHED_4BSD to scale td_estcpu
by the number of CPUs (INVERSE_ESTCPU_WEIGHT = 8 for !SMP, but 8 *
smp_cpus for SMP). Without this, td_estcpu built up too fast (really,
failed to decay fast enough due to the decay not being scaled by the
number of CPUs) in some common cases, so the nonlinear region of mapping
from td_estcpu to td_priority was often reached. With this, td_priority
is too insensitive to td_estcpu in some hopefully-less-common cases.
I haven't tested SCHED_4BSD on a multi-CPU system recently, but the
test method of pinning all the CPU hogs to 1 CPU gives an uncommon
case for which the hack works especially badly. The pinning gives only
1 active CPU so the !SMP scale factor should be applied, but it isn't.
So td_estcpu decays 8 times faster than it should ((?) -- I think the
scaling is linear), and may even decay to 0 before it is used. If there
is still some left when the thread is rescheduled, then the sensitivity
is just reduced by a factor of 8, but if there is none then the scheduling
becomes unfair. OTOH, the sensitivity to niceness isn't changed, so in
the best case where the decay is not to 0, niceness becomes smp_cpus times
more sensitive than ordinary scheduling relative to the !SMP case. And
this is a relatively simple case to understand and fix. The load average
and/or INVERSE_ESTCPU_WEIGHT probably need to be per-CPU to give the old
algorithm a chance of working, but only if there is some manual scheduling
by pinning threads to CPUs. Otherwise, assignment of threads to CPUs
should make things average out -- non-affinity is required to make things
sort of work.
In my version of SCHED_4BSD, the scale factor is calculated
dynamically according to the maximum td_estcpu. This too gives
insensitivity of scheduling to td_estcpu, but only when the maximum
td_estcpu is large, in which case there is nothing much that can be
done (when you have a large dynamic range of td_estcpu's, it is just
impossible to map them linearly or logarithmically into the small
priority space and it may be impossible to map them uniquely). I
don't try to support SMP, and my maximum td_estcpu only does the
right thing if the maximum td_estcpu for threads that will run on
each CPU is almost the same for all CPUs.