4BSD process starvation during I/O

Sam Lawrance boris at brooknet.com.au
Thu Nov 24 01:08:43 GMT 2005


On 24/11/2005, at 12:02 PM, Kris Kennaway wrote:

> On Thu, Nov 24, 2005 at 11:54:05AM +1100, Sam Lawrance wrote:
>>
>> On 24/11/2005, at 7:18 AM, Kris Kennaway wrote:
>>
>>> I have noticed that when multiple identical processes (e.g. gtar or
>>> dd) are run on 4BSD on a machine with N CPUs, N of the processes run
>>> with a higher CPU share than all the others.  As a result, these N
>>> processes finish first, then another N, and so on.
>>>
>>> This is true under both 4.11 and 6.0 (so in that sense it's not so
>>> surprising), but the effect is much more pronounced on 6.0 (which may
>>> be possible to fix).
>>>
>>> Here are the exit times for 6 identical gtar processes (using the
>>> same 4.11 gtar binary on both systems) started together on a 2-CPU
>>> machine:
>>>
>>> 6.0:
>>>
>>> 1132776233
>>> 1132776235
>>> 1132776264
>>> 1132776265
>>> 1132776279
>>> 1132776279
>>>      238.86 real        10.87 user       166.00 sys
>>>
>>> You can see they finish in pairs, and there's a spread of 46 seconds
>>> from first to last.
>>>
>>> On 4.11:
>>>
>>> 1132775426
>>> 1132775429
>>> 1132775431
>>> 1132775432
>>> 1132775448
>>> 1132775449
>>>      275.56 real         0.43 user       336.26 sys
>>>
>>> They also finish in pairs, but the spread is half, at 23 seconds.
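>>>
>>> (A rough sketch of that kind of harness, with the tarball and work
>>> directories as purely illustrative placeholders: start the gtar runs
>>> together in the background, print a timestamp as each one exits, and
>>> run the script itself under /usr/bin/time to get the real/user/sys
>>> summary.)
>>>
>>> #!/bin/sh
>>> # start 6 identical extractions at once and log each exit time
>>> for i in `jot 6 1`; do
>>>         (mkdir -p /tmp/work$i && cd /tmp/work$i && \
>>>                 gtar xf /tmp/src.tar && date +%s) &
>>> done
>>> wait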
>>>
>>> This seems to be correlated with the rate at which the processes
>>> perform I/O.  On a quad amd64 machine running 6.0, when I run
>>> multiple dd processes at different offsets in an md device:
>>>
>>> 268435456 bytes transferred in 1.734285 secs (154781618 bytes/sec)
>>> 268435456 bytes transferred in 1.737857 secs (154463501 bytes/sec)
>>> 268435456 bytes transferred in 1.751760 secs (153237575 bytes/sec)
>>> 268435456 bytes transferred in 3.263460 secs (82254865 bytes/sec)
>>> 268435456 bytes transferred in 3.295294 secs (81460244 bytes/sec)
>>> 268435456 bytes transferred in 3.349770 secs (80135487 bytes/sec)
>>> 268435456 bytes transferred in 4.716637 secs (56912467 bytes/sec)
>>> 268435456 bytes transferred in 4.850927 secs (55336941 bytes/sec)
>>> 268435456 bytes transferred in 4.953528 secs (54190760 bytes/sec)
>>>
>>> They finish in groups of 3 here since the 4th CPU is being used to
>>> drive the md worker thread (which takes up most of the CPU).  In this
>>> case the first 3 dd processes get essentially 100% of the CPU, and
>>> the rest get close to 0% until those 3 processes finish.
>>>
>>> Perhaps this can be tweaked.
>>>
>>
>> I tried this on a dual Xeon, with 12 processes like
>>
>> 	mdconfig -a -t swap -s 320m
>> 	dd if=/dev/md0 of=1 bs=1m skip=0 count=40 &
>> 	dd if=/dev/md0 of=2 bs=1m skip=40 count=40 &
>
> You're reading from the md, not writing to it.  Sorry if that wasn't
> clear.
>
> My test is:
>
> #!/bin/sh
>
> mdconfig -d -u 0
> mdconfig -a -t swap -s 16g
> for i in `jot $1 1`; do
>         dd if=/dev/zero of=/dev/md0 seek=$(($i*16384)) count=16384 bs=16k > /dev/null &
> done
> wait
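>
> Saved as, say, md-dd.sh, that can be run as e.g.
>
> 	sh md-dd.sh 8
>
> which launches 8 concurrent dd writers, each writing 256 MB (16384
> blocks of 16k) at its own 256 MB offset within the 16 GB md.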

Ah :-)  Now I see it, very pronounced.
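
While the script runs, the imbalance is easy to watch with something
like (ps keywords as in ps(1)):

	ps -axo pid,%cpu,time,command | grep '[d]d if='

A handful of the dd processes show up near 100% CPU while the rest sit
close to zero until they finish, as described above.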


