pageout question

Mon Jul 26 19:00:33 UTC 2010

on 25/07/2010 23:43 Andriy Gapon said the following:
> on 25/07/2010 23:28 RW said the following:
>> I didn't say it was guaranteed. I just think the scenario where
>> a first pass ends up between the watermarks is rare. And when it
>> happens I don't see a compelling reason to do extra paging to reach an
>> arbitrary target.
> 
> Well, it seems neither I nor you have data to show whether it's rare or not (and
> it would greatly depend on workload too).
> As to "arbitrary target" - well, that's the whole point of hysteresis-like
> behavior.  We start paging also at an "arbitrary" point.

Well, it seems that you are right (at least to a certain degree): with
"moderately high" memory load (starting lots of memory-hungry "real"
applications and not letting them sit idle), a single pass was always sufficient,
even with my suggested change! :-)  That is, that single pass was always able to
bring free memory up to or above the high watermark.
So, in fact, there is little (if any) difference between the current code and
the patched code in this case.

But not quite so with the stress2 swap test.
There, more than one pass was needed in almost all cases.  Again, this
is with the patched vm_pageout().

This brings up another interesting point that was initially overlooked.
The vm_pageout() loop can make at most two passes back-to-back; after that it
slows down to one additional pass every half second (hz / 2 ticks):
if (vm_pages_needed) {
        /*
         * Still not done, take a second pass without waiting
         * (unlimited dirty cleaning), otherwise sleep a bit
         * and try again.
         */
        ++pass;
        if (pass > 1)
                msleep(&vm_pages_needed,
                    &vm_page_queue_free_mtx, PVM, "psleep",
                    hz / 2);
} else {
        /* ... */
}

With the patched code and stress2, I did indeed observe the pagedaemon spending
time in this sleep.

On the other hand, the current unpatched code is more optimistic about calling it
done.  So even if only a handful of pages is freed and available memory rises
just above the low watermark, the pagedaemon would decide that it had a
successful pass and would reset the pass count to zero.  Those freed pages
would, of course, be consumed immediately and a new pass would be requested.
Since the history is lost at this point, the new pass would not be rate-limited.

So my _theory_ is that in very harsh conditions, true hysteresis would
result in many _accounted_ passes and thus a throttled-down pagedaemon.  The
current code, on the other hand, would still do many passes because of the
constant memory pressure, but they would be (mostly) unaccounted, and thus the
pagedaemon would be scanning pages 'like crazy'.

In other words, with the current code the available page count would rapidly
oscillate around the low watermark, while with the patched code it would mostly
stay low.

I am not sure which one is better.  But to me, in such extreme conditions,
slowing things down sounds better than a spinning pagedaemon.

P.S.
Just in case, I would like to point out that the patch doesn't change the
condition under which waiters are notified about available memory: it is still
!vm_page_count_min().  The patch only changes when vm_pages_needed is reset.
This is kind of obvious, but I decided to make it explicit.

-- 
Andriy Gapon