VFS/VM over-runs. Was: Apparent strange disk behaviour in 6.0

Thu Jul 28 08:34:48 GMT 2005

Pawel Jakub Dawidek wrote:

>On Thu, Jul 28, 2005 at 12:54:19AM -0700, Julian Elischer wrote:
>+> it APPEARS that the system is swapping out running programs in order to 
>+> store more write data!
>+> 
>+> experiment:
>+> boot to single user mode.
>+> type:
>+> mount {big partition}
>+> dd if=/dev/zero bs=128K of=/{bigpartition}/bigfile count=1000000
>+> 
>+> notice that after a short while your dd is killed because the system is 
>+> out of swapspace.
>+> (it doesn't have any)
>+> Why the F*ck does it need swapspace? There are exactly 2 processes 
>+> running in userspace
>+> and one of them is in wait4(). dd shows a resident size of about 170KB
>+> leaving about a GIGABYTE of unused RAM.
>+> 
>+> The system should make dd wait rather than trying to swap its pages out..
>+> 
>+> 
>+> if you then do
>+> swapon (your swap device)
>+> and repeat the command in the background,
>+> vmstat 1 will show you pages being faulted in and out...
>+> no WONDER IO goes to hell in a handbasket..
>+> 
>+> Outgoing IO should never be able to force running programs out!
>+> It should start re-using old pages from the same file!
>+> 
>+> 4.11 gives a consistent 65MB/sec with this array, for as long as I run it..
>+> 6.0 gives me 65MB/sec for 15 seconds and then it drops to 20MB/sec and then 
>+> 10MB/sec
>+> and the swap disk bursts into life.
>+> 
>+> the array goes from all the lights solidly on, to bursts of activity 
>+> with large gaps in between them.
>
>It looks like I observed the same behaviour!
>
>I was testing the GELI GEOM class and I was getting ENOMEM errors from malloc(9).
>At first I was sure I had a memory leak, but that was only a 'vmstat -m' issue, so
>there was no memory leak and I shouldn't get ENOMEM in the first place while
>copying the /usr/src/sys directory.
>
>I'm also able to reproduce your dd(1) test easily.
>
>Not sure when it was introduced...
>  
>

I wonder if there is some tunable that can be changed, or whether it's
just a bug.

I think that a write that finds no buffer space should first free old
unused buffers, and if there aren't any it should just wait.  Where's
Alan when you need him :-)
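
To make that policy concrete, here is a toy sketch in Python. It is only
an illustration of the idea, not how the FreeBSD buffer cache actually
works; the names (ToyBufferCache, write, flush_one) and the fixed
"capacity" are made up for the example.

```python
from collections import OrderedDict


class ToyBufferCache:
    """Toy model of the suggested policy (hypothetical, not kernel code)."""

    def __init__(self, capacity):
        self.capacity = capacity     # assumed fixed number of buffers
        self.clean = OrderedDict()   # already on disk, oldest first
        self.dirty = OrderedDict()   # still waiting to reach the disk

    def write(self, block):
        """Try to buffer one block; return 'ok', 'recycled' or 'wait'."""
        if block in self.dirty:
            return "ok"              # rewrite of a buffer we already hold
        if block in self.clean:
            del self.clean[block]    # buffer becomes dirty again
            self.dirty[block] = True
            return "ok"
        if len(self.clean) + len(self.dirty) < self.capacity:
            self.dirty[block] = True
            return "ok"
        if self.clean:
            # No free buffer: reuse the oldest clean (unused) one first.
            self.clean.popitem(last=False)
            self.dirty[block] = True
            return "recycled"
        # Nothing reclaimable: the writer waits; nothing else is evicted.
        return "wait"

    def flush_one(self):
        """Model the disk completing the oldest pending write."""
        if self.dirty:
            block, _ = self.dirty.popitem(last=False)
            self.clean[block] = True
```

In this model a writer that gets "wait" would simply retry after
flush_one() has run, which is the "make dd wait" behaviour: the only
things ever reclaimed are old buffers belonging to the write stream
itself, never the pages of running programs.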