svn commit: r305362 - in head: share/man/man9 sys/amd64/amd64 sys/arm/arm sys/arm64/arm64 sys/conf sys/i386/i386 sys/i386/include sys/mips/mips sys/powerpc/aim sys/powerpc/booke sys/powerpc/powerpc...

Bruce Evans brde at optusnet.com.au
Sun Sep 4 04:12:06 UTC 2016


On Sat, 3 Sep 2016, Mark Johnston wrote:

> Log:
>  Remove support for idle page zeroing.
>
>  Idle page zeroing has been disabled by default on all architectures since
>  r170816 and has some bugs that make it seemingly unusable. Specifically,
>  the idle-priority pagezero thread exacerbates contention for the free page
>  lock, and yields the CPU without releasing it in non-preemptive kernels. The
>  pagezero thread also does not behave correctly when superpage reservations
>  are enabled: its target is a function of v_free_count, which includes
>  reserved-but-free pages, but it is only able to zero pages belonging to the
>  physical memory allocator.
>
>  Reviewed by:	alc, imp, kib

It worked well in 2007.  I tried to fix it, and asked alc to fix it,
in 2008, but didn't get anywhere.

Now another problem is obvious.  Memories and CPUs are a bit faster,
but context switches are still very slow.  My newest system can "rep
stosb" at 16-128GB/sec or 1/32-1/4usec per page, but it takes 1 usec
for a (user) context switch.  So to amortize the cost of a context
switch, idlezero needs to zero many pages per switch, perhaps hundreds,
and to ensure this it must run at high (numerically low) priority and
schedule itself to not become too active, but it does exactly the
opposite (idle priority, and then if PREEMPTION is configured,
scheduled generally with the opposite policy).  But if it schedules
itself to do many pages at a time, this gives bad latency.
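
A rough sketch of the arithmetic above, in C.  The helper names, the
4096-byte page size, and the "overhead fraction" knob are assumptions
made for illustration; none of them come from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Time to zero one page, in microseconds, at the given fill bandwidth.
 * Assumes 1 GB = 1e9 bytes.
 */
static double
zero_usec_per_page(double gb_per_sec, size_t page_bytes)
{
	assert(gb_per_sec > 0.0);
	return ((double)page_bytes / (gb_per_sec * 1e9) * 1e6);
}

/*
 * Smallest number of pages to zero per context switch so that the
 * switch costs at most max_overhead (a fraction, e.g. 0.1) of the
 * time spent zeroing.  The overhead bound is a hypothetical knob.
 */
static long
pages_per_switch(double switch_usec, double zero_usec, double max_overhead)
{
	double n;
	long k;

	assert(zero_usec > 0.0 && max_overhead > 0.0);
	n = switch_usec / (max_overhead * zero_usec);
	k = (long)n;
	if ((double)k < n)
		k++;		/* round up without needing libm */
	return (k);
}
```

With a 1 usec switch and an assumed 10% overhead bound, this gives 40
pages per switch at 16GB/sec and 313 at 128GB/sec, which is in line
with "perhaps hundreds".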

Zeroing in idle last worked correctly in FreeBSD-4 with UP.  Back then
there were no kernel context switches, and vm_page_zero_idle() was
just a function call away from the idle loop.  However, the version
with context switching was better than nothing with slow memory in
2007.

This is not fixed in the following patch for a ~2008 version:

X Index: vm_phys.c
X ===================================================================
X --- vm_phys.c	(revision 181737)
X +++ vm_phys.c	(working copy)
X @@ -41,6 +41,8 @@
X  #include <sys/malloc.h>
X  #include <sys/mutex.h>
X +#include <sys/proc.h>
X  #include <sys/queue.h>
X  #include <sys/sbuf.h>
X +#include <sys/sched.h>
X  #include <sys/sysctl.h>
X  #include <sys/vmmeter.h>
X @@ -552,7 +554,18 @@
X  					cnt.v_free_count--;
X  					mtx_unlock(&vm_page_queue_free_mtx);
X +#ifndef PREEMPTION_AND_PREEMPTION_WORKS
X +					if (sched_runnable()) {
X +						thread_lock(curthread);
X +						critical_exit();
X +						mi_switch(SW_VOL | SWT_IDLE,
X +						    NULL);
X +						thread_unlock(curthread);
X +					} else
X +#endif
X +						critical_exit();
X  					pmap_zero_page_idle(m_tmp);
X  					m_tmp->flags |= PG_ZERO;
X  					mtx_lock(&vm_page_queue_free_mtx);
X +					critical_enter();
X  					cnt.v_free_count++;
X  					vm_phys_free_pages(m_tmp, 0);
X Index: vm_zeroidle.c
X ===================================================================
X --- vm_zeroidle.c	(revision 181737)
X +++ vm_zeroidle.c	(working copy)
X @@ -122,18 +122,14 @@
X 
X  	mtx_lock(&vm_page_queue_free_mtx);
X +	critical_enter();
X  	for (;;) {
X  		if (vm_page_zero_check()) {
X  			vm_page_zero_idle();
X -#ifndef PREEMPTION
X -			if (sched_runnable()) {
X -				thread_lock(curthread);
X -				mi_switch(SW_VOL | SWT_IDLE, NULL);
X -				thread_unlock(curthread);
X -			}
X -#endif
X  		} else {
X  			wakeup_needed = TRUE;
X +			critical_exit();
X  			msleep(&zero_state, &vm_page_queue_free_mtx, 0,
X  			    "pgzero", hz * 300);
X +			critical_enter();
X  		}
X  	}

PREEMPTION had to be turned off for best results.  This is only practical
for SMP systems.  It was more broken (gave too many context switches) in
2007 than now.  I still turn it off for SMP.  Perhaps the extra context
switches had a lot to do with this idlezero problem.  Any time the system
goes idle for a short time, idlezero runs for a short time.  Then it
gets preempted, but still holds the lock, so there may be another context
switch back to it.  This might be repeated several times per page.

The version in FreeBSD-5.2 is threaded and uses preemption if available,
but has vestiges of the FreeBSD-4 scheduling.  It sleeps voluntarily
after zeroing maxrun pages.  But maxrun defaults to 16.  A system too
new to run FreeBSD-4 is just warming up after doing 16 pages.

So my only idea for making this work is:
- do it from the idle loop (first remove idle threads, a larger task)
- do only 1 page at a time, at high priority
- use trylock (while at high priority) and not lock (at idle priority)
   to see if the lock can be acquired.

I now remember that it already raised its priority while holding the
lock, using a critical section.  Doesn't that work?  My fix has
something to do with this -- I enlarged the critical section.  It
should be around acquiring the lock too, so that we never get preempted
while holding the lock, but that needs trylock.
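
A minimal userland sketch of the "trylock, then do 1 page" idea.  It is
not kernel code: a C11 atomic_flag stands in for the free page lock
(in the kernel this would presumably be mtx_trylock(9) inside a
critical section), and struct free_pool, PAGE_BYTES and
zero_one_page_try() are hypothetical names.  For simplicity it holds
the lock while zeroing, unlike the patch above, which drops it around
pmap_zero_page_idle().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	PAGE_BYTES	4096	/* assumed page size */
#define	NPAGES		4	/* tiny pool, for illustration only */

struct page {
	unsigned char	data[PAGE_BYTES];
	bool		zeroed;
};

struct free_pool {
	atomic_flag	busy;	/* stand-in for the free page lock */
	struct page	pages[NPAGES];
};

/*
 * Try once to take the lock.  If it is contended, give up immediately
 * instead of blocking; otherwise zero at most one page and release.
 * Returns true only if a page was zeroed.
 */
static bool
zero_one_page_try(struct free_pool *pool)
{
	bool did_work = false;
	size_t i;

	assert(pool != NULL);
	if (atomic_flag_test_and_set(&pool->busy))
		return (false);		/* lock held elsewhere: back off */
	for (i = 0; i < NPAGES; i++) {
		if (!pool->pages[i].zeroed) {
			memset(pool->pages[i].data, 0, PAGE_BYTES);
			pool->pages[i].zeroed = true;
			did_work = true;
			break;		/* only 1 page per attempt */
		}
	}
	atomic_flag_clear(&pool->busy);
	return (did_work);
}
```

The caller (an idle loop, in the scheme above) would just call this
again the next time it has nothing to do, so it never waits on the
lock and never holds it for longer than one page.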

Bruce

