small patch for pageout. Comments?

Thu Nov 30 21:09:20 UTC 2017

On Thu, Nov 30, 2017 at 04:01:31PM -0500, Mark Johnston wrote:
> On Thu, Nov 30, 2017 at 12:50:41PM -0800, Larry McVoy wrote:
> > On Thu, Nov 30, 2017 at 03:47:50PM -0500, Mark Johnston wrote:
> > > > I dunno if there is a "right amount".  I could make it a little smarter by
> > > > keeping track of how many pages we freed and sleep if we freed none in a 
> > > > scan (which seems really unlikely).
> > > 
> > > This situation can happen if the inactive queue is full of dirty pages.
> > > A problem with your patch is that we might not give enough time to the
> > > laundry thread (the thread responsible for writing the contents of dirty
> > > pages to disk and returning them to the inactive queue for the page daemon
> > > to free) to write out dirty pages. In this case we might trigger the OOM
> > > killer prematurely, and in fact this scenario is what motivated r300865.
> > > So I would argue that we do in fact need to sleep if the page daemon is
> > > failing to make progress, in order to give time for I/O to complete.
> > 
> > OK, that sounds reasonable.  So what defines progress?  v_dfree not 
> > increasing?  Is one page freed progress?
> 
> One page freed is progress. We currently invoke the OOM killer only when
> the page daemon is making no progress. This turns out to be too
> conservative, which is what kib's patch attempts to address. wrt
> your patch, I'm saying that I think we should still sleep after a scan
> that failed to free any pages.
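
To pin down what "progress" means here, a minimal userland sketch in C of
the rule under discussion: pause when a scan freed nothing, or when there
is only one CPU. The names pages_freed() and should_pause() are made up for
illustration and do not exist in vm_pageout.c. The two samples stand in for
reads of the cumulative v_dfree counter taken before and after one scan, and
treating the counter as 32 bits wide is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pages freed between two samples of a cumulative counter.
 * Unsigned subtraction is modulo 2^32, so the result is still
 * correct if the counter wrapped between the two samples.
 */
static uint32_t
pages_freed(uint32_t before, uint32_t after)
{
	return (after - before);
}

/*
 * Hypothetical policy: pause if the scan made no progress (not even
 * one page freed), or if this is a uniprocessor and user processes
 * need a chance to run.
 */
static bool
should_pause(uint32_t before, uint32_t after, int ncpus)
{
	return (pages_freed(before, after) == 0 || ncpus < 2);
}
```

For the delta to mean anything, the "before" sample has to be taken ahead of
the scan it is supposed to measure and the "after" sample once that scan has
finished.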

Something like this?

--- a/sys/vm/vm_pageout.c
+++ b/sys/vm/vm_pageout.c
@@ -1752,6 +1752,7 @@ vm_pageout_worker(void *arg)
        struct vm_domain *domain;
        int domidx, pass;
        bool target_met;
+       u_int dfree;
 
        domidx = (uintptr_t)arg;
        domain = &vm_dom[domidx];
@@ -1776,6 +1777,7 @@ vm_pageout_worker(void *arg)
         */
        while (TRUE) {
                mtx_lock(&vm_page_queue_free_mtx);
+               dfree = VM_CNT_FETCH(v_dfree);
 
                /*
                 * Generally, after a level >= 1 scan, if there are enough
@@ -1815,10 +1817,22 @@ vm_pageout_worker(void *arg)
                         * (page reclamation) scan, then increase the level
                         * and scan again now.  Otherwise, sleep a bit and
                         * try again later.
+                        *
+                        * If we have more than one CPU this pause is not
+                        * helpful; it just decreases the rate at which we
+                        * clean pages.  On a uniprocessor we want to pause
+                        * to let the user level processes get some time to
+                        * run.  We also pause, rather than keep banging on the
+                        * page tables, if the last pass freed no pages.
                         */
                        mtx_unlock(&vm_page_queue_free_mtx);
-                       if (pass >= 1)
-                               pause("psleep", hz / VM_INACT_SCAN_RATE);
+                       if (pass >= 1) {
+                               dfree = VM_CNT_FETCH(v_dfree) - dfree;
+                               if ((dfree == 0) || (mp_ncpus < 2)) {
+if (!dfree) printf("Sleeping because pass %d didn't find anything\n", pass);
+                                       pause("psleep", hz / VM_INACT_SCAN_RATE);
+                               }
+                       }
                        pass++;
                } else {
                        /*