svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

Steven Hartland killing at multiplay.co.uk
Fri Aug 29 20:28:05 UTC 2014


----- Original Message ----- 
From: "Steven Hartland" <smh at freebsd.org>


> ----- Original Message ----- 
> From: "Alan Cox" <alc at rice.edu>
>
>> You didn't really address Peter's initial technical issue.  Peter
>> correctly observed that cache pages are just another flavor of free
>> pages.  Whenever the VM system is checking the number of free pages
>> against any of the thresholds, it always uses the sum of 
>> v_cache_count
>> and v_free_count.  So, to anyone familiar with the VM system, like
>> Peter, what you've done, which is to derive a threshold from
>> v_free_target but only compare v_free_count to that threshold, looks
>> highly suspect.
>>
>> That said, I can easily believe that your patch works better than the
>> existing code, because it is closer in spirit to my interpretation of
>> what the Solaris code does.  Specifically, I believe that the Solaris
>> code starts trimming the ARC before the Solaris page daemon starts
>> writing dirty pages to secondary storage.  Now, you've made FreeBSD 
>> do
>> the same.  However, you've expressed it in a way that looks broken.
>>
>> To wrap up, I think that you can easily write this in a way that
>> simultaneously behaves like Solaris and doesn't look wrong to a VM 
>> expert.
>
> More details in my last reply on this, but in short it seems this
> has already been tried and it didn't work.
>
> I'd be interested in what domain experts think about why that is.
>
> In the meantime I've asked Karl to see if he could retest with this
> change, to confirm whether counting cache pages along with free
> pages does indeed still cause a problem.
>
> For those who want to catch up on what has already been tested, see
> the original PR:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
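
To illustrate the distinction Alan is describing, here is a minimal
sketch in simplified C (not the committed code).  The counter names
mirror the FreeBSD vm_cnt fields, and zfs_arc_free_target stands in
for the threshold derived from v_free_target:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-ins for the FreeBSD VM counters. */
    static uint64_t v_free_count;        /* pages on the free lists */
    static uint64_t v_cache_count;       /* cache pages: clean, reusable */
    static uint64_t zfs_arc_free_target; /* derived from v_free_target */

    /*
     * How the VM system checks its thresholds: cache pages are
     * "another flavor of free pages", so the sum is compared.
     */
    static bool
    vm_style_needfree(void)
    {
            return (v_free_count + v_cache_count < zfs_arc_free_target);
    }

    /*
     * What the patch does: only v_free_count is compared, which is
     * what looks suspect to a VM expert.
     */
    static bool
    patch_style_needfree(void)
    {
            return (v_free_count < zfs_arc_free_target);
    }

    int
    main(void)
    {
            zfs_arc_free_target = 1000;
            v_free_count = 900;
            v_cache_count = 200;

            /*
             * Sum check: 900 + 200 >= 1000, so the VM does not yet
             * consider itself short of pages.  Free-only check:
             * 900 < 1000, so the ARC starts trimming first.  That
             * gap is exactly the behavior being debated.
             */
            printf("vm-style: %d, patch-style: %d\n",
                vm_style_needfree(), patch_style_needfree());
            return (0);
    }

The free-only check fires earlier, which is closer in spirit to the
Solaris behavior Alan mentions: trimming the ARC before the page
daemon starts writing dirty pages to secondary storage.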

Copying in Karl's response from the PR for easy access:

I can give it a shot on my test system, but I will note that for my
production machine right now that would be a no-op, as there are no
cache pages in use.  Getting that on the production system with a
high level of load over the holiday is rather unlikely due to it
being Labor Day, so all I will have are my synthetics.  I do have
some time over the long weekend to run that test and should be able
to do so if you think it's useful, but I am wary of that approach
being correct in the general case due to my previous experience.

My original commentary on that discussion point, and the reasoning
behind it, is here
(http://lists.freebsd.org/pipermail/freebsd-fs/2014-March/019084.html);
along that thread you'll see vmstat output under the current paradigm
showing that cache doesn't grow without bound, as many suggested it
would if I didn't count those pages as "free" and thus available for
the ARC to attempt to invade (effectively trying to force the VM
system to evict them from the cache list by having it wake up first).
Indeed, up at comment #31 is the patch on that production system,
which has been up for about four months; you can see only a few
pages are in the cached state.  That is fairly typical.

What is the expected reason for concern about including cache pages
in the free count for the ARC, when they are not counted in what
wakes up the VM system as a whole?  If the expected area of concern
is that cache pages will grow without bound (or close to it), the
evidence from both my production machine and others strongly
suggests that's incorrect.

My work strongly suggests that the behavior most likely to be
correct for the majority of workloads is achieved when both the ARC
paring routine and the VM page-cleaning system wake up at the same
time.  That occurs when vm_cnt.v_free_count is invaded.  Attempting
to bias the outcome, forcing either the VM system or the ARC to act
first, appears to increase the circumstances under which bad
behavior occurs.
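
To make that concrete, here is a minimal sketch of the behavior Karl
describes (simplified stand-ins, not the actual patch): the ARC
paring routine and the page-cleaning code key off the same counter
and target, so both wake when v_free_count is invaded.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    static uint64_t v_free_count;   /* pages on the free lists */
    static uint64_t v_free_target;  /* the pagedaemon's free target */

    /* Both consumers test the same counter against the same target. */
    static bool
    memory_pressure(void)
    {
            return (v_free_count < v_free_target);
    }

    static void
    arc_check(void)
    {
            if (memory_pressure())
                    printf("pare the ARC\n");   /* arc_shrink() stand-in */
    }

    static void
    pagedaemon_check(void)
    {
            if (memory_pressure())
                    printf("clean pages\n");    /* pageout stand-in */
    }

    int
    main(void)
    {
            v_free_target = 1024;
            v_free_count = 512;     /* the free target has been invaded */

            /* Both fire together; neither is biased to act first. */
            arc_check();
            pagedaemon_check();
            return (0);
    }

arc_check() and pagedaemon_check() are named here for illustration
only; the point is the shared wake-up condition, with neither side
given an earlier threshold than the other.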



