kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix

Wed Mar 19 14:20:01 UTC 2014

The following reply was made to PR kern/187594; it has been noted by GNATS.

From: Karl Denninger <karl at denninger.net>
To: avg at FreeBSD.org
Cc: freebsd-fs at freebsd.org, bug-followup at FreeBSD.org
Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
Date: Wed, 19 Mar 2014 09:18:40 -0500

 On 3/18/2014 12:19 PM, Karl Denninger wrote:
 >
 > On 3/18/2014 10:20 AM, Andriy Gapon wrote:
 >> The following reply was made to PR kern/187594; it has been noted by
 >> GNATS.
 >>
 >> From: Andriy Gapon <avg at FreeBSD.org>
 >> To: bug-followup at FreeBSD.org, karl at fs.denninger.net
 >> Cc:
 >> Subject: Re: kern/187594: [zfs] [patch] ZFS ARC behavior problem and fix
 >> Date: Tue, 18 Mar 2014 17:15:05 +0200
 >>
 >>   Karl Denninger <karl at fs.denninger.net> wrote:
 >>   > ZFS can be convinced to engage in pathological behavior due to a bad
 >>   > low-memory test in arc.c
 >>   >
 >>   > The offending file is at
 >>   > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c; it
 >>   > allegedly checks for 25% free memory, and if it is less asks for the
 >>   > cache to shrink.
 >>   >
 >>   > (snippet from arc.c around line 2494 of arc.c in 10-STABLE; path
 >>   > /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs)
 >>   >
 >>   > #else /* !sun */
 >>   > if (kmem_used() > (kmem_size() * 3) / 4)
 >>   > return (1);
 >>   > #endif /* sun */
 >>   >
 >>   > Unfortunately these two functions do not return what the authors
 >>   > thought they did. It's clear what they're trying to do from the
 >>   > Solaris-specific code up above this test.
 >>     No, these functions do return what the authors think they do.
 >>   The check is for KVA usage (kernel virtual address space), not for
 >>   physical memory.
 > I understand, but that's nonsensical in the context of the Solaris
 > code.  "lotsfree" is *not* a declaration of free kvm space, it's a
 > declaration of when the system has "lots" of free *physical* memory.
 >
 > Further it makes no sense at all to allow the ARC cache to force
 > things into virtual (e.g. swap-space backed) memory.  But that's the
 > behavior that has been observed, and it fits with the code as
 > originally written.
 >
 >>   > The result is that the cache only shrinks when vm_paging_needed()
 >>   > tests true, but by that time the system is in serious memory
 >>   > trouble and by
 >>     No, it is not.
 >>   The description and numbers here are a little bit outdated but they
 >>   should give an idea of how paging works in general:
 >>   https://wiki.freebsd.org/AvgPageoutAlgorithm
 >>   > triggering only there it actually drives the system further into
 >>   > paging,
 >>     How does ARC eviction drive the system further into paging?
 > 1. System gets low on physical memory but the ARC cache is looking at
 > available kvm (of which there is plenty.)  The ARC cache continues to
 > expand.
 >
 > 2. vm_paging_needed() returns true and the system begins to page off
 > to the swap.  At the same time the ARC cache is pared down because
 > arc_reclaim_needed() has returned "1".
 >
 > 3. As the ARC cache shrinks and paging occurs vm_paging_needed()
 > returns false.  Paging out ceases but inactive pages remain on the
 > swap.  They are not recalled until and unless they are scheduled to
 > execute.  arc_reclaim_needed() again returns "0".
 >
 > 4. The hold-down timer expires in the ARC cache code
 > ("arc_grow_retry", declared as 60 seconds) and the ARC cache begins to
 > expand again.
 >
 > Go back to #2 until the system's performance starts to deteriorate
 > badly enough due to the paging that you notice it, which occurs when
 > something that is actually consuming CPU time has to be called in from
 > swap.
 >
 > This is consistent with what I and others have observed on both 9.2
 > and 10.0; the ARC will expand until it hits the maximum configured
 > even at the expense of forcing pages onto the swap.  In this specific
 > machine's case left to defaults it will grab nearly all physical
 > memory (over 20GB of 24) and wire it down.
 >
 > Limiting arc_max to 16GB sorta fixes it.  I say "sorta" because it
 > turns out that 16GB is still too much for the workload; it prevents
 > the pathological behavior where system "stalls" happen but only in the
 > extreme.  It turns out with the patch in my ARC cache stabilizes at
 > about 13.5GB during the busiest part of the day, growing to about 16
 > off-hours.
 >
 > One of the problems with just limiting it in /boot/loader.conf is that
 > you have to guess and the system doesn't reasonably adapt to changing
 > memory loads.  The code is clearly intended to do that but it doesn't
 > end up working that way in practice.
 >>   > because the pager will not recall pages from the swap until they
 >>   > are next executed. This leads the ARC to try to fill in all the
 >>   > available RAM even though pages have been pushed off onto swap.
 >>   > Not good.
 >>     Unused physical memory is a waste.  It is true that ARC tries to use
 >>   as much of memory as it is allowed.  The same applies to the page
 >>   cache (Active, Inactive).
 >>   Memory management is a dynamic system and there are a few competing
 >>   agents.
 > That's true.  However, what the stock code does is force working set
 > out of memory and into the swap.  The ideal situation is one in which
 > there is no free memory because cache has sized itself to consume
 > everything *not* necessary for the working set of the processes that
 > are running.  Unfortunately we cannot determine this presciently
 > because a new process may come along and we do not necessarily know
 > for how long a process that is blocked on an event will remain blocked
 > (e.g. something waiting on network I/O, etc.)
 >
 > However, it is my contention that you do not want to evict a process
 > that is scheduled to run (or is going to be) in favor of disk cache
 > because you're defeating yourself by doing so.  The point of the disk
 > cache is to avoid going to the physical disk for I/O, but if you page
 > something you have ditched a physical I/O for data in favor of having
 > to go to physical disk *twice* -- first to write the paged-out data to
 > swap, and then to retrieve it when it is to be executed.  This also
 > appears to be consistent with what is present for Solaris machines.
 >
 > From the Sun code:
 >
 > #ifdef sun
 >         /*
 >          * take 'desfree' extra pages, so we reclaim sooner, rather than later
 >          */
 >         extra = desfree;
 >
 >         /*
 >          * check that we're out of range of the pageout scanner. It starts to
 >          * schedule paging if freemem is less than lotsfree and needfree.
 >          * lotsfree is the high-water mark for pageout, and needfree is the
 >          * number of needed free pages.  We add extra pages here to make sure
 >          * the scanner doesn't start up while we're freeing memory.
 >          */
 >         if (freemem < lotsfree + needfree + extra)
 >                 return (1);
 >
 >         /*
 >          * check to make sure that swapfs has enough space so that anon
 >          * reservations can still succeed. anon_resvmem() checks that the
 >          * availrmem is greater than swapfs_minfree, and the number of reserved
 >          * swap pages.  We also add a bit of extra here just to prevent
 >          * circumstances from getting really dire.
 >          */
 >         if (availrmem < swapfs_minfree + swapfs_reserve + extra)
 >                 return (1);
 >
 > "freemem" is not virtual memory, it's actual memory.  "Lotsfree" is
 > the point where the system considers free RAM to be "ample";
 > "needfree" is the "desperation" point and "extra" is the margin
 > (presumably for image activation.)
 >
 > The base code on FreeBSD doesn't look at physical memory at all; it
 > looks at kvm space instead.
 >
 >>   It is hard to correctly tune that system using a large hammer such as
 >>   your patch.  I believe that with your patch ARC will get shrunk to its
 >>   minimum size in due time.  Active + Inactive will grow to use the
 >>   memory that you are denying to ARC driving Free below a threshold,
 >>   which will reduce ARC.  Repeated enough times this will drive ARC to
 >>   its minimum.
 > I disagree both in design theory and based on the empirical evidence
 > of actual operation.
 >
 > First, I don't (ever) want to give memory to the ARC cache that
 > otherwise would go to "active", because any time I do that I'm going
 > to force two page events, which is double the amount of I/O I would
 > take on a cache *miss*, and even with the ARC at minimum I get a
 > reasonable hit percentage.  If I therefore prefer ARC over "active"
 > pages I am going to take *at least* a 200% penalty on physical I/O and
 > if I get an 80% hit ratio with the ARC at a minimum the penalty is
 > closer to 800%!
 >
 > For inactive pages it's a bit more complicated as those may not be
 > reactivated.  However, I am trusting FreeBSD's VM subsystem to demote
 > those that are unlikely to be reactivated to the cache bucket and then
 > to "free", where they are able to be re-used. This is consistent with
 > what I actually see on a running system -- the "inact" bucket is
 > typically fairly large (often on a busy machine close to that of
 > "active") but pages demoted to "cache" don't stay there long - they
 > either get re-promoted back up or they are freed and go on the free list.
 >
 > The only time I see "inact" get out of control is when there's a
 > kernel memory leak somewhere (such as what I ran into the other day
 > with the in-kernel NAT subsystem on 10-STABLE.)  But that's a bug and
 > if it happens you're going to get bit anyway.
 >
 > For example right now on one of my very busy systems with 24GB of
 > installed RAM and many terabytes of storage across three ZFS pools I'm
 > seeing 17GB wired of which 13.5 is ARC cache.  That's the adaptive
 > figure it currently is running at, with a maximum of 22.3 and a
 > minimum of 2.79 (8:1 ratio.)  The remainder is wired down for other
 > reasons (there's a fairly large Postgres server running on that box,
 > among other things, and it has a big shared buffer declaration --
 > that's most of the difference.)  Cache hit efficiency is currently 97.8%.
 >
 > Active is 2.26G right now, and inactive is 2.09G.  Both are stable.
 > Overnight inactive will drop to about 1.1GB while active will not
 > change all that much since most of it is postgres and the middleware that
 > talks to it along with apache, which leaves most of its processes
 > present even when they go idle.  Peak load times are about right now
 > (mid-day), and again when the system is running backups nightly.
 >
 > Cache is 7448, in other words, insignificant.  Free memory is 2.6G.
 >
 > The tunable is set to 10%, which is almost exactly what free memory
 > is.  I find that when the system gets under 1G free transient image
 > activation can drive it into paging and performance starts to suffer
 > for my particular workload.
 >
 >>     Also, there are a few technical problems with the patch:
 >>   - you don't need to use sysctl interface in kernel, the values you
 >>   need are available directly, just take a look at e.g. implementation
 >>   of vm_paging_needed()
 > That's easily fixed.  I will look at it.
 >>   - similarly, querying vfs.zfs.arc_freepage_percent_target value via
 >>   kernel_sysctlbyname is just bogus; you can use percent_target directly
 > I did not know if during setup of the OID the value was copied (and
 > thus you had to reference it later on) or the entry simply took the
 > pointer and stashed that.  Easily corrected.
 >>   - you don't need to sum various page counters to get a total count,
 >>   there is v_page_count
 > Fair enough as well.
 >>   Lastly, can you try to test reverting your patch and instead setting
 >>   vm.lowmem_period=0 ?
 > Yes.  By default it's 10; I have not tampered with that default.
 >
 > Let me do a bit of work and I'll post back with a revised patch.
 > Perhaps a tunable for percentage free + a free reserve that is a
 > "floor"?  The problem with that is where to put the defaults.  One
 > option would be to grab total size at init time and compute something
 > similar to what "lotsfree" is for Solaris, allowing that to be tuned
 > with the percentage if desired.  I selected 25% because that's what
 > the original test was expressing and it should be reasonable for
 > modest RAM configurations.  It's clearly too high for moderately large
 > (or huge) memory machines unless they have a lot of RAM-hungry
 > processes running on them.
 >
 > The percentage test, however, is an easy knob to twist that is
 > unlikely to severely harm you if you dial it too far in either
 > direction; anyone setting it to zero obviously knows what they're
 > getting into, and if you crank it too high all you end up doing is
 > limiting the ARC to the minimum value.
 >

 Responsive to the criticisms, and in an attempt to better track what the
 VM system does, I offer this update to the patch.  The following changes
 have been made:

 1. There are now two tunables:
 vfs.zfs.arc_freepages -- the number of free pages below which we declare
 low memory and ask for ARC paring.
 vfs.zfs.arc_freepage_percent -- the additional free RAM to reserve in
 percent of total, if any (added to freepages)

 2. vfs.zfs.arc_freepages, if zero (as is the default at boot), defaults
 to "vm.stats.vm.v_free_target" less 20%.  This allows the system to get
 into the page-stealing paradigm before the ARC cache is invaded.  While
 I do not run into a situation of unbridled inact page growth here, the
 criticism that the original patch could allow this appears to be
 well-founded.  Setting the low memory alert here should prevent this, as
 the system will now allow the ARC to grow to the point that
 page-stealing takes place.

 3. The previous option to reserve either a hard amount of RAM or a
 percentage of RAM remains.

 4. The defaults should auto-tune for any particular RAM configuration to
 reasonable values that prevent stalls, yet if you have circumstances
 that argue for reserving more memory you may do so.
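
 For readers who want the arithmetic behind items 1 and 2 in isolation,
 here is a minimal userland sketch of the test.  It is not the kernel
 code: the function names default_freepages() and reclaim_needed() are
 hypothetical, the page counts are passed in as plain arguments rather
 than read from the VM system, and the sum is widened to 64 bits (an
 assumption of this sketch, not something the patch does).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: the default reserve is v_free_target less 20%,
 * mirroring "freepages = vmtotal - (vmtotal / 5)".
 */
static uint32_t
default_freepages(uint32_t v_free_target)
{
	return (v_free_target - (v_free_target / 5));
}

/*
 * Hypothetical helper: returns 1 when free pages have fallen below the
 * base reserve plus the optional percentage of total RAM, 0 otherwise.
 * All quantities are page counts; percent_target is 0..100.
 */
static int
reclaim_needed(uint32_t vmfree, uint32_t vmtotal, uint32_t freepages,
    uint32_t percent_target)
{
	uint64_t required;

	assert(percent_target <= 100);
	/* Widen before adding so very large page counts cannot wrap. */
	required = (uint64_t)freepages +
	    ((uint64_t)vmtotal / 100) * percent_target;
	return (vmfree < required);
}
```

 With a free target of 1000 pages the default reserve works out to 800
 pages; adding a 10% reservation on a 100000-page machine raises the
 requirement to 10800 free pages.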

 Updated patch follows:

 *** arc.c.original	Thu Mar 13 09:18:48 2014
 --- arc.c	Wed Mar 19 07:44:01 2014
 ***************
 *** 18,23 ****
 --- 18,99 ----
     *
     * CDDL HEADER END
     */
 +
 + /* Karl Denninger (karl at denninger.net), 3/18/2014, FreeBSD-specific
 +  *
 +  * If "NEWRECLAIM" is defined, change the "low memory" warning that causes
 +  * the ARC cache to be pared down.  The reason for the change is that the
 +  * apparent attempted algorithm is to start evicting ARC cache when free
 +  * pages fall below 25% of installed RAM.  This maps reasonably well to how
 +  * Solaris is documented to behave; when "lotsfree" is invaded ZFS is told
 +  * to pare down.
 +  *
 +  * The problem is that on FreeBSD machines the system doesn't appear to be
 +  * getting what the authors of the original code thought they were looking at
 +  * with its test -- or at least not what Solaris did -- and as a result that
 +  * test never triggers.  That leaves the only reclaim trigger as the "paging
 +  * needed" status flag, and by the time that trips the system is already
 +  * in low-memory trouble.  This can lead to severe pathological behavior
 +  * under the following scenario:
 +  * - The system starts to page and ARC is evicted.
 +  * - The system stops paging as ARC's eviction drops wired RAM a bit.
 +  * - ARC starts increasing its allocation again, and wired memory grows.
 +  * - A new image is activated, and the system once again attempts to page.
 +  * - ARC starts to be evicted again.
 +  * - Back to #2
 +  *
 +  * Note that ZFS's ARC default (unless you override it in /boot/loader.conf)
 +  * is to allow the ARC cache to grab nearly all of free RAM, provided nobody
 +  * else needs it.  That would be ok if we evicted cache when required.
 +  *
 +  * Unfortunately the system can get into a state where it never
 +  * manages to page anything of materiality back in, as if there is active
 +  * I/O the ARC will start grabbing space once again as soon as the memory
 +  * contention state drops.  For this reason the "paging is occurring" flag
 +  * should be the **last resort** condition for ARC eviction; you want to
 +  * (as Solaris does) start when there is material free RAM left BUT the
 +  * vm system thinks it needs to be active to steal pages back in the attempt
 +  * to never get into the condition where you're potentially paging off
 +  * executables in favor of leaving disk cache allocated.
 +  *
 +  * To fix this we change how we look at low memory, declaring two new
 +  * runtime tunables.
 +  *
 +  * The new sysctls are:
 +  * vfs.zfs.arc_freepages (free pages required to call RAM "sufficient")
 +  * vfs.zfs.arc_freepage_percent (additional reservation percentage, default 0)
 +  *
 +  * vfs.zfs.arc_freepages is initialized from vm.stats.vm.v_free_target,
 +  * less 20% if we find that it is zero.  Note that vm.stats.vm.v_free_target
 +  * is not initialized at boot -- the system has to be running first, so we
 +  * cannot initialize this in arc_init.  So we check during runtime; this
 +  * also allows the user to return to defaults by setting it to zero.
 +  *
 +  * This should ensure that we allow the VM system to steal pages first,
 +  * but pare the cache before we suspend processes attempting to get more
 +  * memory, thereby avoiding "stalls."  You can set this higher if you wish,
 +  * or force a specific percentage reservation as well, but doing so may
 +  * cause the cache to pare back while the VM system remains willing to
 +  * allow "inactive" pages to accumulate.  The challenge is that image
 +  * activation can force things into the page space on a repeated basis
 +  * if you allow this level to be too small (the above pathological
 +  * behavior); the defaults should avoid that behavior but the sysctls
 +  * are exposed should your workload require adjustment.
 +  *
 +  * If we're using this check for low memory we are replacing the previous
 +  * ones, including the oddball "random" reclaim that appears to fire far
 +  * more often than it should.  We still trigger if the system pages.
 +  *
 +  * If you turn on NEWRECLAIM_DEBUG then the kernel will print on the console
 +  * status messages when the reclaim status trips on and off, along with the
 +  * page count aggregate that triggered it (and the free space) for each
 +  * event.
 +  */
 +
 + #define	NEWRECLAIM
 + #undef	NEWRECLAIM_DEBUG
 +
 +
    /*
    * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
     * Copyright (c) 2013 by Delphix. All rights reserved.
 ***************
 *** 139,144 ****
 --- 215,226 ----
   
   #include <vm/vm_pageout.h>
   
 + #ifdef	NEWRECLAIM
 + #ifdef	__FreeBSD__
 + #include <sys/sysctl.h>
 + #endif
 + #endif	/* NEWRECLAIM */
 +
   #ifdef illumos
   #ifndef _KERNEL
   /* set with ZFS_DEBUG=watch, to enable watchpoints on frozen buffers */
 ***************
 *** 203,218 ****
 --- 285,320 ----
   int zfs_arc_shrink_shift = 0;
   int zfs_arc_p_min_shift = 0;
   int zfs_disable_dup_eviction = 0;
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + static	int freepages = 0;	/* This much memory is considered critical */
 + static	int percent_target = 0;	/* Additionally reserve "X" percent free RAM */
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
   
   TUNABLE_QUAD("vfs.zfs.arc_max", &zfs_arc_max);
   TUNABLE_QUAD("vfs.zfs.arc_min", &zfs_arc_min);
   TUNABLE_QUAD("vfs.zfs.arc_meta_limit", &zfs_arc_meta_limit);
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + TUNABLE_INT("vfs.zfs.arc_freepages", &freepages);
 + TUNABLE_INT("vfs.zfs.arc_freepage_percent", &percent_target);
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
   SYSCTL_DECL(_vfs_zfs);
   SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_max, CTLFLAG_RDTUN, &zfs_arc_max, 0,
       "Maximum ARC size");
   SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min, CTLFLAG_RDTUN, &zfs_arc_min, 0,
       "Minimum ARC size");
   
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepages, CTLFLAG_RWTUN, &freepages, 0, "ARC Free RAM Pages Required");
 + SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_freepage_percent, CTLFLAG_RWTUN, &percent_target, 0, "ARC Free RAM Target percentage");
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
    /*
     * Note that buffers can be in one of 6 states:
     *	ARC_anon	- anonymous (discussed below)
 ***************
 *** 2438,2443 ****
 --- 2540,2557 ----
    {
   
   #ifdef _KERNEL
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + 	u_int	vmfree = 0;
 + 	u_int	vmtotal = 0;
 + 	size_t	vmsize;
 + #ifdef	NEWRECLAIM_DEBUG
 + 	static	int	xval = -1;
 + 	static	int	oldpercent = 0;
 + 	static	int	oldfreepages = 0;
 + #endif	/* NEWRECLAIM_DEBUG */
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
   
    	if (needfree)
    		return (1);
 ***************
 *** 2476,2481 ****
 --- 2590,2596 ----
    		return (1);
   
   #if defined(__i386)
 +
   	/*
   	 * If we're on an i386 platform, it's possible that we'll exhaust the
    	 * kernel heap space before we ever run out of available physical
 ***************
 *** 2492,2502 ****
    		return (1);
    #endif
    #else	/* !sun */
    	if (kmem_used() > (kmem_size() * 3) / 4)
    		return (1);
    #endif	/* sun */
   
 - #else
   	if (spa_get_random(100) == 0)
    		return (1);
    #endif
 --- 2607,2680 ----
    		return (1);
    #endif
    #else	/* !sun */
 +
 + #ifdef	NEWRECLAIM
 + #ifdef  __FreeBSD__
 + /*
 +  * Implement the new tunable free RAM algorithm.  We check the free pages
 +  * against the minimum specified target and the percentage that should be
 +  * free.  If we're low we ask for ARC cache shrinkage.  If this is defined
 +  * on a FreeBSD system the older checks are not performed.
 +  *
 +  * Check first to see if we need to init freepages, then test.
 +  */
 + 	if (!freepages) {		/* If zero then (re)init */
 + 		vmsize = sizeof(vmtotal);
 + 		kernel_sysctlbyname(curthread, "vm.stats.vm.v_free_target", &vmtotal, &vmsize, NULL, 0, NULL, 0);
 + 		freepages = vmtotal - (vmtotal / 5);
 + #ifdef	NEWRECLAIM_DEBUG
 + 		printf("ZFS ARC: Default vfs.zfs.arc_freepages to [%u] [%u less 20%%]\n", freepages, vmtotal);
 + #endif	/* NEWRECLAIM_DEBUG */
 + 	}
 +
 + 	vmsize = sizeof(vmtotal);
 +         kernel_sysctlbyname(curthread, "vm.stats.vm.v_page_count", &vmtotal, &vmsize, NULL, 0, NULL, 0);
 + 	vmsize = sizeof(vmfree);
 +         kernel_sysctlbyname(curthread, "vm.stats.vm.v_free_count", &vmfree, &vmsize, NULL, 0, NULL, 0);
 + #ifdef	NEWRECLAIM_DEBUG
 + 	if (percent_target != oldpercent) {
 + 		printf("ZFS ARC: Reservation percent change to [%d], [%d] pages, [%d] free\n", percent_target, vmtotal, vmfree);
 + 		oldpercent = percent_target;
 + 	}
 + 	if (freepages != oldfreepages) {
 + 		printf("ZFS ARC: Low RAM page change to [%d], [%d] pages, [%d] free\n", freepages, vmtotal, vmfree);
 + 		oldfreepages = freepages;
 + 	}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 	if (!vmtotal) {
 + 		vmtotal = 1;	/* Protect against divide by zero */
 + 				/* (should be impossible, but...) */
 + 	}
 + /*
 +  * Now figure out how much free RAM we require to call the ARC cache status
 +  * "ok".  Add the percentage specified of the total to the base requirement.
 +  */
 +
 + 	if (vmfree < freepages + ((vmtotal / 100) * percent_target)) {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 1) {
 + 			printf("ZFS ARC: RECLAIM total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), freepages, percent_target);
 + 			xval = 1;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		return(1);
 + 	} else {
 + #ifdef	NEWRECLAIM_DEBUG
 + 		if (xval != 0) {
 + 			printf("ZFS ARC: NORMAL total %u, free %u, free pct (%u), reserved (%u), target pct (%u)\n", vmtotal, vmfree, ((vmfree * 100) / vmtotal), freepages, percent_target);
 + 			xval = 0;
 + 		}
 + #endif	/* NEWRECLAIM_DEBUG */
 + 		return(0);
 + 	}
 +
 + #endif	/* __FreeBSD__ */
 + #endif	/* NEWRECLAIM */
 +
   	if (kmem_used() > (kmem_size() * 3) / 4)
   		return (1);
   #endif	/* sun */
   
   	if (spa_get_random(100) == 0)
    		return (1);
    #endif

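 The grow/page/evict cycle described in the patch's header comment can
 be illustrated with a toy model.  This is only a sketch under heavy
 simplifying assumptions: a fixed working set, an ARC that grows or
 shrinks by a fixed step each tick, and no hold-down timer.  The name
 count_paging_ticks() and all of its parameters are hypothetical and do
 not correspond to anything in arc.c.

```c
#include <assert.h>

/*
 * Toy model, not kernel code.  Each tick the ARC either grows by "step"
 * pages or, if free pages are below "reclaim_below", shrinks by "step".
 * A tick counts as "paging" when free pages are below "paging_min".
 * Returns the number of ticks during which the model was paging.
 */
static unsigned
count_paging_ticks(unsigned total, unsigned working_set, unsigned paging_min,
    unsigned reclaim_below, unsigned step, unsigned ticks)
{
	unsigned arc = 0, paging_ticks = 0, t;

	assert(working_set <= total && step > 0);
	for (t = 0; t < ticks; t++) {
		unsigned used = working_set + arc;
		unsigned free_pages = (used < total) ? total - used : 0;

		if (free_pages < paging_min)
			paging_ticks++;		/* model is paging this tick */
		if (free_pages < reclaim_below)
			arc = (arc > step) ? arc - step : 0;	/* pare cache */
		else
			arc += step;		/* cache keeps growing */
	}
	return (paging_ticks);
}
```

 Setting reclaim_below equal to paging_min models a reclaim that fires
 only once paging is already needed, and the model pages repeatedly;
 setting reclaim_below above paging_min models the reserve this patch
 introduces, and the model's ARC oscillates without ever paging.
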
 -- 
 -- Karl
 karl at denninger.net
