ZFS arc_reclaim_needed: better cooperation with pagedaemon

Artem Belevich fbsdlist at src.cx
Mon Aug 23 00:14:26 UTC 2010


Do you by any chance have a graph showing kstat.zfs.misc.arcstats.size
in addition to what is already on your graphs?  All I can tell from the
current graphs is that v_free_count+v_cache_count shifted a bit lower
relative to v_free_target+v_cache_min.  It would be interesting to see
what effect your patch has on the ARC itself, especially when the ARC
starts giving up memory and when it stops shrinking.
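
If that value is not already being collected, a tiny sampler around
sysctlbyname() would be enough to feed it into the same drraw setup -
just a sketch, with an arbitrary one-minute interval:

/*
 * Print kstat.zfs.misc.arcstats.size once a minute, one value per line,
 * for collection by rrdtool or similar.  A sketch only; the interval and
 * output format are arbitrary.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        for (;;) {
                uint64_t size;
                size_t len = sizeof(size);

                if (sysctlbyname("kstat.zfs.misc.arcstats.size",
                    &size, &len, NULL, 0) == -1) {
                        perror("sysctlbyname");
                        return (1);
                }
                printf("%ju\n", (uintmax_t)size);
                fflush(stdout);
                sleep(60);
        }
}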

--Artem



On Sun, Aug 22, 2010 at 2:46 PM, Andriy Gapon <avg at freebsd.org> wrote:
>
> I propose that the following code in arc_reclaim_needed
> (sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c)
> /*
>  * If pages are needed or we're within 2048 pages
>  * of needing to page need to reclaim
>  */
> if (vm_pages_needed || (vm_paging_target() > -2048))
>
> be changed to
>
> if (vm_paging_needed())
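>
> In context the change is minimal (a sketch only; everything else in
> arc_reclaim_needed stays as it is):
>
>         /*
>          * Cooperate with pagedaemon: reclaim only when the pagedaemon
>          * itself would be woken up to free pages.
>          */
>         if (vm_paging_needed())
>                 return (1);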
>
> Rationale.
>
> 1. Why not current checks.
>
> ARC sizing should cooperate with the pagedaemon in freeing pages.
> If the ARC starts shrinking "prematurely", before the pagedaemon is woken up,
> then no potentially eligible inactive pages will be recycled and no
> potentially eligible active pages will be deactivated (subject to
> v_inactive_target).
> This would drive the ARC size down to its minimum value (which could hurt ZFS
> performance).  Only after that is there a chance that the pagedaemon would be
> woken up to do its cleaning.
> Conversely, if the ARC doesn't shrink in time, then the pagedaemon would have
> to recycle pages holding data that could be needed again soon, and that would
> lead to excessive swapping and disk I/O.
>
> vm_paging_target() is used only by the pagedaemon internally; it effectively
> sets an _upper_ limit on how many pages the pagedaemon will free once it is
> activated.
> It is not an indication of whether the pagedaemon should be scanning/freeing
> pages at all.
> Thus the vm_paging_target() check leads to premature ARC shrinking.
> I believe that many people observe this behavior on sufficiently active
> systems (not dedicated file servers) with a few GB of RAM (1-8).
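>
> For reference, the two VM checks boil down to roughly the following
> (paraphrased from memory; see sys/vm/vm_page.h for the exact definitions -
> the counters are the same ones shown in the graph legend at the end):
>
>         static __inline int
>         vm_paging_target(void)
>         {
>                 /* how much pagedaemon would try to free once it does run */
>                 return ((cnt.v_free_target + cnt.v_cache_min) -
>                     (cnt.v_free_count + cnt.v_cache_count));
>         }
>
>         static __inline int
>         vm_paging_needed(void)
>         {
>                 /* true only below the v_free_reserved+v_cache_min threshold */
>                 return (cnt.v_free_reserved + cnt.v_cache_min >
>                     cnt.v_free_count + cnt.v_cache_count);
>         }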
>
> The vm_pages_needed check is redundant, because it is the flag that is used
> to wake up the pagedaemon.  So when it is set, vm_paging_needed() is true and
> vm_paging_target() is "way" above zero.  And the flag is reset to zero only
> when vm_page_count_min() becomes false, which corresponds to even fewer free
> pages than when vm_paging_needed() is true.
>
>
> 2. Why the new check.
>
> vm_paging_needed() is the (earliest) condition that wakes up the pagedaemon
> (see vm_page_alloc).  The pagedaemon would first of all fire the vm_lowmem
> event, for which ARC already has a handler, and that handler would cause the
> ARC size to shrink.
> It would seem, then, that even the vm_paging_needed() check is redundant.
> Almost - if memory pressure is significant, then vm_paging_needed() may stay
> true for a while, and that would cause additional ARC reduction by
> arc_reclaim_thread.
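>
> For completeness, this is roughly how ARC hooks into that event on FreeBSD
> (simplified from our arc.c, details from memory):
>
>         /* registered when the ARC is initialized */
>         arc_event_lowmem = EVENTHANDLER_REGISTER(vm_lowmem, arc_lowmem,
>             NULL, EVENTHANDLER_PRI_FIRST);
>
>         static void
>         arc_lowmem(void *arg __unused, int howto __unused)
>         {
>                 /* tell arc_reclaim_thread that memory must be given back */
>                 mutex_enter(&arc_reclaim_thr_lock);
>                 needfree = 1;
>                 cv_signal(&arc_reclaim_thr_cv);
>                 mutex_exit(&arc_reclaim_thr_lock);
>         }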
>
>
> Final notes.
>
> I think that the
> vm_paging_target() > -2048
> check was modeled after the check in the original OpenSolaris code:
> freemem < lotsfree + needfree + extra
>
> The issue is that, in my understanding, the OpenSolaris pagedaemon works
> differently from the FreeBSD pagedaemon.
>
> The OpenSolaris pagedaemon is activated when freemem (the equivalent of our
> free + cache) falls to a certain high-water mark (lotsfree).  Initially it
> scans pages at a slow rate.  As freemem falls further, the scan rate increases
> linearly until it reaches its maximum when freemem drops to or below a certain
> lower mark.
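>
> Schematically the ramp looks like this (an illustration only, not the actual
> OpenSolaris code; minfree, slowscan and fastscan are just indicative names
> for the lower mark and the two scan-rate bounds):
>
>         if (freemem >= lotsfree)
>                 scanrate = 0;           /* pagedaemon stays idle */
>         else if (freemem <= minfree)
>                 scanrate = fastscan;    /* maximum scan rate */
>         else
>                 scanrate = slowscan + (fastscan - slowscan) *
>                     (lotsfree - freemem) / (lotsfree - minfree);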
>
> Our pagedaemon is activated when free + cache falls to the point where
> vm_paging_needed() becomes true (see the definition of that function).  Once
> activated, it makes a scan pass through the inactive and active pages with a
> certain target for free+cache, but that target is "soft": it really is an
> upper limit on how many pages may be freed during the pass.  The pagedaemon
> would make a second (or subsequent) pass only if free+cache falls to a value
> even lower than the vm_paging_needed() threshold, which means significant
> (even severe) memory pressure/shortage.
> So on a sufficiently active system free+cache would typically oscillate
> between v_free_reserved+v_cache_min at the bottom and some semi-random values
> "near" v_free_target+v_cache_min at the tops.  That is with ARC excluded from
> the picture.
>
> And about pictures :-)
> Behavior of free+cache with current arc_reclaim_needed code:
> http://people.freebsd.org/~avg/avail-mem-before.png
> and its behavior after the patch:
> http://people.freebsd.org/~avg/avail-mem-after.png
>
> The legends on the pictures are incorrect - sorry, my mastery of drraw is not
> good yet.
> The correct legends are:
> "aqua" color - v_free_target+v_cache_min (vm_paging_target() == 0)
> "fuchsia" color - v_free_reserved+v_cache_min (vm_paging_needed() threshold)
> "lime" color - v_free_count+v_cache_count indeed :)
> Y axis - % of total page count.
>
> I think the graphs speak for themselves.
>
> --
> Andriy Gapon
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
>

