svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm
smh at freebsd.org
Fri Aug 29 19:57:33 UTC 2014
> On Friday 29 August 2014 11:54:42 Alan Cox wrote:
> > > Others have also confirmed that even with r265945 they can still
> > > trigger the performance issue.
> > >
> > > In addition, without it we still have loads of RAM sat there unused;
> > > in my particular experience we have 40GB of 192GB sitting there
> > > unused, and that was with a stable build from last weekend.
> > The Solaris code only imposed this limit on 32-bit machines, where the
> > available kernel virtual address space may be much less than the
> > available physical memory. Previously, FreeBSD imposed this limit on
> > both 32-bit and 64-bit machines. Now, it imposes it on neither. Why
> > continue to do this differently from Solaris?
My understanding is these limits were totally different on Solaris; see the
#ifdef sun block in arc_reclaim_needed() for details. I actually started out
matching the Solaris flow, but the change as committed had already been
tested and proven to work, hence the current design.
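(For readers without the tree handy, the #ifdef sun block in question looks
roughly like the sketch below. This is paraphrased from memory of the
OpenSolaris arc.c, so treat the exact details as approximate; the point is
that it keys off the pageout scanner's thresholds rather than a simple
free-page count:)

    /* Approximate shape of the Solaris-only checks in arc_reclaim_needed();
     * illustrative only, not the FreeBSD code path. */
    extra = desfree;

    /* Reclaim if we are within range of the pageout scanner, which starts
     * paging once freemem drops below lotsfree + needfree. */
    if (freemem < lotsfree + needfree + extra)
            return (1);

    /* Reclaim if anon reservations against availrmem are getting tight. */
    if (availrmem < swapfs_minfree + swapfs_reserve + extra)
            return (1);

    /* On 32-bit kernels there is an additional check that reclaims once the
     * kernel heap arena is roughly three quarters allocated. */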
> Since the question was asked below, we don't have zfs machines in the
> cluster running i386. We can barely get them to boot as it is due to
> kva pressure; we have to reduce/cap physical memory and change the
> user/kernel virtual split from 3:1 to 2.5:1.5.
> We do run zfs on small amd64 machines with 2G of ram, but I can't
> imagine it working on the 10G i386 PAE machines that we have.
> > > With the patch we confirmed that both RAM usage and performance for
> > > those seeing that issue are resolved, with no reported regressions.
> > >
> > >> (I should know better than to fire a reply off before full fact
> > >> checking, but this commit worries me..)
> > >
> > > Not a problem, it's great to know people pay attention to changes and
> > > raise their concerns. Always better to have a discussion about
> > > potential issues than to wait for a problem to occur.
> > >
> > > Hopefully the above gives you some peace of mind, but if you still
> > > have any concerns, I'm all ears.
> > You didn't really address Peter's initial technical issue. Peter
> > correctly observed that cache pages are just another flavor of free
> > pages. Whenever the VM system is checking the number of free pages
> > against any of the thresholds, it always uses the sum of v_cache_count
> > and v_free_count. So, to anyone familiar with the VM system, like
> > Peter, what you've done, which is to derive a threshold from
> > v_free_target but only compare v_free_count to that threshold, looks
> > highly suspect.
> I think I'd like to see something like this:
> Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
> --- cddl/compat/opensolaris/kern/opensolaris_kmem.c (revision 270824)
> +++ cddl/compat/opensolaris/kern/opensolaris_kmem.c (working copy)
> @@ -152,7 +152,8 @@
> - return (vm_cnt.v_free_count);
> + /* "cache" is just a flavor of free pages in FreeBSD */
> + return (vm_cnt.v_free_count + vm_cnt.v_cache_count);
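(For reference, the VM-side convention Peter and Alan are describing is
visible in the inline page-count helpers in the VM headers of that era; the
snippet below is paraphrased from memory, so the exact name and location are
approximate:)

    /* Every "are we low on memory?" style test in the VM sums the free and
     * cache page counts, e.g. (approximate): */
    static __inline int
    vm_paging_needed(void)
    {
            return (vm_cnt.v_free_count + vm_cnt.v_cache_count <
                vm_pageout_wakeup_thresh);
    }

So a consumer that looks only at vm_cnt.v_free_count is measuring something
stricter than what the pagedaemon itself reacts to, which is the
inconsistency Peter's diff above removes.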
This has apparently already been tried, and the response from Karl was:
- No, because memory in "cache" is subject to being either reallocated
- or freed. When I was developing this patch that was my first impression
- as well, and is how I originally coded it, and it turned out to be wrong.
- The issue here is that you have two parts of the system contending for
- memory: the VM system generally, and the ARC cache. If the ARC cache frees
- space before the VM system activates and starts pruning, then you wind up
- with the ARC pinned at the minimum after some period of time, because it
- releases "early."
I've asked him if he would retest just to be sure.
> The rest of the system looks at the "big picture"; it would be happy to
> let the "free" pool run quite a way down so long as there are "cache"
> pages available to satisfy the free space requirements. This would lead
> ZFS to sacrifice ARC for no reason. I'm not sure how big a deal this is,
> but I can't imagine many scenarios where I want ARC to be discarded in
> order to save some effectively free pages.
From Karl's response to the original PR (above), it seems like this
unexpected behaviour is due to the two systems being separate.
> > That said, I can easily believe that your patch works better than the
> > existing code, because it is closer in spirit to my interpretation of
> > what the Solaris code does. Specifically, I believe that the Solaris
> > code starts trimming the ARC before the Solaris page daemon starts
> > writing dirty pages to secondary storage. Now, you've made FreeBSD do
> > the same. However, you've expressed it in a way that looks broken.
> > To wrap up, I think that you can easily write this in a way that
> > simultaneously behaves like Solaris and doesn't look wrong to a VM
> > expert.
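(One way to read that suggestion, purely as a sketch and not the committed
code: keep the new check in arc_reclaim_needed() but feed it the combined
count, so the ARC starts shrinking at roughly the point the pagedaemon would
start scanning. zfs_arc_free_target below is the sysctl introduced by
r270759, derived from the VM's v_free_target as discussed above:)

    /* Hypothetical combined form -- not the committed code. Trim the ARC
     * once free + cache pages fall below the target derived from
     * v_free_target, i.e. roughly when the pagedaemon would start scanning. */
    if (vm_cnt.v_free_count + vm_cnt.v_cache_count < zfs_arc_free_target)
            return (1);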
> > > Out of interest, would it be possible to update machines in the
> > > cluster to see how their workload reacts to the change?
> > >
> I'd like to see the free vs cache thing resolved first, but it's going
> to be tricky to get a comparison.
Does Karl's explanation above as to why this doesn't work change your mind?
> For the first few months of the year, things were really troublesome.
> It was quite easy to overtax the machines and run them into the ground.
> This is not the case now - things are working pretty well under load
> (prior to the commit). It's got to the point that we feel comfortable
> thrashing the machines really hard again. Getting a comparison when it
> already works well is going to be tricky.
> We don't have large memory machines that aren't already tuned with
> vfs.zfs.arc_max caps for tmpfs use.
> For context to the wider audience, we do not run -release or -pN in the
> freebsd cluster. We mostly run -current, and some -stable. I am well
> aware that there is significant discomfort in 10.0-R with zfs, but we
> already have the fixes for that.
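(For anyone reading along who wants to reproduce the vfs.zfs.arc_max capping
Peter mentions above, the usual knob is a loader tunable; the value here is
purely illustrative:)

    # /boot/loader.conf -- cap the ARC at 64 GiB (example value only)
    vfs.zfs.arc_max="68719476736"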