svn commit: r219003 - head/usr.bin/nice

Bruce Evans brde at optusnet.com.au
Thu Feb 24 22:23:01 UTC 2011


On Fri, 25 Feb 2011, Bruce Evans wrote:

> On Thu, 24 Feb 2011, John Baldwin wrote:
>
>> On Thursday, February 24, 2011 2:03:33 pm Remko Lodder wrote:
>>> 
> [context restored:
> +A priority of 19 or 20 will prevent a process from taking any cycles from
> +others at nice 0 or better.]
>
>>> On Feb 24, 2011, at 7:47 PM, John Baldwin wrote:
>>> 
>>>> Are you sure that this statement applies to both ULE and 4BSD?  The two
>>>> schedulers treat nice values a bit differently.
>>> 
>>> No, I am not sure that the statement applies; given your response I
>>> understand that the two schedulers work differently.  Can you or David
>>> tell me what the difference is so that I can document it properly?  I
>>> thought that the tool does the same thing for all schedulers, but that
>>> the backend might treat it differently.
>
> I'm sure that testing would show that it doesn't apply in FreeBSD.  It is
> supposed to apply only approximately in FreeBSD, but niceness handling in
> FreeBSD is quite broken, so it doesn't apply at all.  Also, the magic numbers
> of 19 and 20 probably don't apply in FreeBSD.  They came about because
> nicenesses that are the same mod 2 (maybe after adding 1) have the same
> effect, since priorities that are the same mod RQ_PPQ = 4 have the same
> effect and the niceness space was scaled to the priority space by
> multiplying by NICE_WEIGHT = 2.  But NICE_WEIGHT has been broken to be 1
> in FreeBSD with SCHED_4BSD and doesn't apply with SCHED_ULE.  With
> SCHED_4BSD, there are 4 (not 2) nice values near 20 that give the same
> behaviour.
>
> Strictly, it applies only to broken schedulers.  Preventing a process
> from taking *any* cycles gives priority-inversion livelock.  FreeBSD
> has priority propagation to prevent this.
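
To make the scaling above concrete, here is a small stand-alone sketch (not
FreeBSD source; the base priority is just an illustrative PUSER-like value,
and RQ_PPQ and the two NICE_WEIGHT values are the ones discussed above) of
how nicenesses collapse onto run queues:

#include <stdio.h>

#define RQ_PPQ          4       /* priorities per run queue */
#define PRI_BASE        160     /* illustrative user-priority base */

/* Map a niceness to a run-queue index for a given NICE_WEIGHT. */
static int
runq_index(int nice, int nice_weight)
{
        int pri = PRI_BASE + nice_weight * nice;

        return (pri / RQ_PPQ);
}

int
main(void)
{
        int nice;

        printf("nice  queue (NICE_WEIGHT=2)  queue (NICE_WEIGHT=1)\n");
        for (nice = 14; nice <= 20; nice++)
                printf("%4d  %21d  %21d\n", nice,
                    runq_index(nice, 2), runq_index(nice, 1));
        return (0);
}

With NICE_WEIGHT = 2 the queue index changes every 2 niceness steps (which
pair collapses depends on the base, hence the "maybe after adding 1"), and
with NICE_WEIGHT = 1 it changes only every 4 steps, which is where the
"4 (not 2) nice values near 20" comes from.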

I just tried it with SCHED_4BSD, on a multi-CPU system (ref9-i386), but
I think I used cpuset correctly to emulate a single CPU.
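
For reference, "emulating 1 CPU" just means pinning everything to a single
CPU.  A minimal sketch of roughly what cpuset -l 1 arranges for a process,
using cpuset_setaffinity(2):

#include <sys/param.h>
#include <sys/cpuset.h>

#include <err.h>

int
main(void)
{
        cpuset_t mask;

        /* Restrict the current process (id -1) to CPU 1 only. */
        CPU_ZERO(&mask);
        CPU_SET(1, &mask);
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
            sizeof(mask), &mask) != 0)
                err(1, "cpuset_setaffinity");
        /* ... fork/exec the CPU hogs here ... */
        return (0);
}

With one nice-0 and one nice-20 CPU hog pinned to CPU 1, top(1) showed: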

% last pid: 85392;  load averages:  1.71,  0.86,  0.38   up 94+01:00:36  21:55:59
% 66 processes:  3 running, 63 sleeping
% CPU:  6.9% user,  3.7% nice,  2.0% system,  0.0% interrupt, 87.3% idle
% Mem: 268M Active, 4969M Inact, 310M Wired, 50M Cache, 112M Buf, 2413M Free
% Swap: 8192M Total, 580K Used, 8191M Free
% 
%   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
% [... system is not nearly idle, but plenty of CPUs to spare]
% 85368 bde           1 111    0  9892K  1312K RUN     1   1:07 65.67% sh
% 85369 bde           1 123   20  9892K  1312K CPU1    1   0:35 37.89% sh

This shows the bogus 1:2 ratio even for a niceness difference of 20.  I've
seen too much of this ratio.  IIRC, before FreeBSD-4 was fixed, the
various nonlinearities caused by the lack of clamping, combined with the
broken scaling, gave a ratio of about this size.  Then FreeBSD-5 restored
a similarly bogus ratio.  Apparently, the algorithm for decaying p_estcpu
in SCHED_4BSD tends to generate this ratio.  SCHED_ULE uses a completely
different algorithm and I think it has more control over the scaling, so
it is surprising that it duplicates this brokenness so perfectly.
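
For reference, the decay in question is the classic per-second schedcpu()
step from 4.4BSD.  Here is a rough sketch with the textbook constants
(FreeBSD's sched_4bsd.c scales estcpu into priorities differently, which is
part of the breakage, so the numbers are only meant to show the shape of
the algorithm):

#include <stdio.h>

/*
 * Classic 4.4BSD decay: once per second,
 *      estcpu = (2*load / (2*load + 1)) * estcpu + nice
 * where estcpu was bumped (roughly) once for each tick the process ran.
 */
static double
decay_cpu(double loadav, double estcpu, int nice)
{
        double loadfac = 2.0 * loadav;

        return (loadfac / (loadfac + 1.0)) * estcpu + nice;
}

int
main(void)
{
        double loadav = 2.0;    /* two always-runnable hogs on one CPU */
        double estcpu = 0.0;
        int sec;

        /* A nice-0 hog that gets the whole CPU at hz = 100 ticks/second. */
        for (sec = 1; sec <= 30; sec++) {
                estcpu = decay_cpu(loadav, estcpu + 100.0, 0);
                if (sec % 10 == 0)
                        printf("t=%2ds  estcpu=%.1f\n", sec, estcpu);
        }
        /* estcpu converges to 2 * load * hz = 400 here. */
        return (0);
}

This only shows what decays and how fast; it does not attempt to reproduce
the 1:2 split observed above.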

And here is what it does with more nice values; this was generated by:

% for i in 0 2 4 6 8 10 12 14 16 18 20
% do
% 	cpuset -l 1 nice -$i sh -c "while :; do echo -n;done" &
% done
% top -o time

% last pid: 85649;  load averages: 10.99,  9.06,  5.35  up 94+01:19:33    22:14:56
% 74 processes:  12 running, 62 sleeping
% 
% Mem: 270M Active, 4969M Inact, 310M Wired, 50M Cache, 112M Buf, 2411M Free
% Swap: 8192M Total, 580K Used, 8191M Free
% 
% 
%   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND 
% 85581 bde           1  98    0  9892K  1312K RUN     1   0:48 11.47% sh
% 85582 bde           1 100    2  9892K  1312K RUN     1   0:45 10.69% sh
% 85583 bde           1 102    4  9892K  1312K RUN     1   0:42 10.35% sh
% 85584 bde           1 104    6  9892K  1312K CPU1    1   0:40  9.47% sh
% 85585 bde           1 106    8  9892K  1312K RUN     1   0:38  8.79% sh
% 85586 bde           1 108   10  9892K  1312K RUN     1   0:36  8.06% sh
% 85587 bde           1 110   12  9892K  1312K RUN     1   0:34  8.40% sh
% 85588 bde           1 111   14  9892K  1312K RUN     1   0:33  8.50% sh
% 85589 bde           1 113   16  9892K  1312K RUN     1   0:31  7.67% sh
% 85590 bde           1 115   18  9892K  1312K RUN     1   0:30  7.28% sh
% 85591 bde           1 117   20  9892K  1312K RUN     1   0:29  6.69% sh

This is OK except for the far-too-small dynamic range of 29:48 between the
nice-20 and nice-0 hogs (even worse than 1:2).

My version spaces things out nicely according to its table:

% last pid:  1374;  load averages: 11.02,  8.74,  4.93    up 0+02:26:12  09:16:47
% 43 processes:  12 running, 31 sleeping
% CPU: 14.0% user, 85.7% nice,  0.0% system,  0.4% interrupt,  0.0% idle
% Mem: 35M Active, 23M Inact, 67M Wired, 24K Cache, 61M Buf, 876M Free
% Swap:
% 
%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%  1325 root        1 120    0   856K   572K RUN      2:18 28.52% sh
%  1326 root        1 120    2   856K   572K RUN      1:39 19.97% sh
%  1327 root        1 120    4   856K   572K RUN      1:10 13.96% sh
%  1328 root        1 120    6   856K   572K RUN      0:50  9.72% sh
%  1329 root        1 123    8   856K   572K RUN      0:36  7.18% sh
%  1330 root        1 123   10   856K   572K RUN      0:25  5.03% sh
%  1331 root        1 124   12   856K   572K RUN      0:18  2.93% sh
%  1332 root        1 124   14   856K   572K RUN      0:13  1.86% sh
%  1333 root        1 124   16   856K   572K RUN      0:09  0.98% sh
%  1334 root        1 124   18   856K   572K RUN      0:06  1.07% sh
%  1335 root        1 123   20   856K   572K RUN      0:05  0.15% sh

The dynamic range here is 5:138 (0:05 for the nice-20 hog vs. 2:18 for the
nice-0 one).  Not as close to the table's 1:32 as I would like.
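
For comparison, the end-to-end ratios implied by the TIME columns of the
two runs (48 s vs. 29 s for stock SCHED_4BSD, 138 s vs. 5 s for my
version), against the table's nominal 1:32:

#include <stdio.h>

int
main(void)
{
        /* CPU seconds for the nice-0 and nice-20 hogs in the runs above. */
        printf("stock SCHED_4BSD: 48/29 = %5.2f : 1\n", 48.0 / 29.0);
        printf("my version:       138/5 = %5.2f : 1 (table aims at 32 : 1)\n",
            138.0 / 5.0);
        return (0);
}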

Bruce

