svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm
Peter Wemm
peter at wemm.org
Fri Aug 29 21:20:49 UTC 2014
On Friday 29 August 2014 21:42:15 Steven Hartland wrote:
> ----- Original Message -----
> From: "Peter Wemm" <peter at wemm.org>
>
> > On Friday 29 August 2014 20:51:03 Steven Hartland wrote:
> snip..
>
> > > Does Karl's explanation as to why this doesn't work above change your
> > > mind?
> >
> > Actually no, I would expect the code as committed would *cause* the
> > undesirable behavior that Karl described.
> >
> > ie: access a few large files and cause them to reside in cache. Say 50GB
> > or so on a 200G ram machine. We now have the state where:
> >
> > v_cache = 50GB
> > v_free = 1MB
> >
> > The rest of the vm system looks at vm_paging_needed(), which asks: are we
> > short of "v_cache + v_free"? Since there's 50.001GB free, the answer is
> > no. It'll let v_free run right down to v_free_min because of the giant
> > pool of v_cache just sitting there, waiting to be used.
> >
> > The zfs change, as committed, will ignore all the free memory in the form
> > of v_cache. It will be freaking out about how low v_free is getting and
> > will be sacrificing ARC in order to put more memory into the v_free pool.
> >
> > As long as ARC keeps sacrificing itself this way, the free pages in the
> > v_cache pool won't get used. When ARC finally runs out of pages to give
> > up to v_free, the kernel will start using the free pages from v_cache.
> > Eventually it'll run down that v_cache free pool and arc will be in a
> > bare minimum state while this is happening.
> >
> > Meanwhile, ZFS ARC will be crippled. This has consequences - it does
> > RCU-like things from ARC to keep fragmentation under control. With ARC
> > crippled, fragmentation will increase because there's less opportunistic
> > gathering of data from ARC.
> >
> > Granted, you have to get things freed from active/inactive to the cache
> > state, but once it's there, depending on the workload, it'll mess with
> > ARC.
>
> There's already a vm_paging_needed() check in there below so this will
> already be dealt with, will it not?
No.
If you read the code that you changed, you won't get that far. The v_free test
comes before vm_paging_needed(), and if the v_free test triggers then ARC will
return pages and not look at the rest of the function.
If this function returns non-zero, ARC pages are given back:
static int
arc_reclaim_needed(void)
{
	if (kmem_free_count() < zfs_arc_free_target) {
		return (1);
	}
	/*
	 * Cooperate with pagedaemon when it's time for it to scan
	 * and reclaim some pages.
	 */
	if (vm_paging_needed()) {
		return (1);
	}
ie: if v_free (ignoring v_cache free pages) gets below the threshold, stop
everything and discard ARC pages.
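To make the effect of that first test concrete, here is a minimal sketch
of the two kinds of comparison, run against the 50GB-cache scenario from
earlier in the thread. The struct, the function names and the 100000-page
target used in the usage note are all hypothetical stand-ins for
illustration; this is not the kernel code, only a model of "compare v_free
alone" versus "compare v_free + v_cache".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the VM counters, all values in pages. */
struct vm_model {
	uint64_t v_free;	/* pages on the free queue */
	uint64_t v_cache;	/* clean cached pages that can be reused */
};

/* Models the first test as committed: only v_free is considered. */
static bool
free_only_test(const struct vm_model *vm, uint64_t arc_free_target)
{
	return (vm->v_free < arc_free_target);
}

/* Models a test against the total free pool, v_free + v_cache. */
static bool
total_pool_test(const struct vm_model *vm, uint64_t threshold)
{
	return (vm->v_free + vm->v_cache < threshold);
}
```

With 4K pages, v_free = 1MB is 256 pages and v_cache = 50GB is 13107200
pages. Against an assumed target of 100000 pages, free_only_test() reports
that reclaim is needed while total_pool_test() does not.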
The vm_paging_needed() code is a NO-OP at this point. It can never return
true. Consider:
	vm_cnt.v_free_target = 4 * vm_cnt.v_free_min + vm_cnt.v_free_reserved;
vs
	vm_pageout_wakeup_thresh = (vm_cnt.v_free_min / 10) * 11;
zfs_arc_free_target defaults to vm_cnt.v_free_target, which is at least 400%
of v_free_min, and the first test compares it against the smaller v_free pool.
vm_paging_needed() compares the total free pool (v_free + v_cache) against the
smaller wakeup threshold - 110% of v_free_min.
The second test compares a larger value against a smaller target than the
first test did, so it can never trigger unless you manually change the
arc_free_target sysctl.
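The ordering argument can be checked with a small userland model. This is
only a sketch: the struct and function names are made up for illustration,
the two formulas are copied from the lines quoted above, and it assumes
zfs_arc_free_target is left at its default of v_free_target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two tunables the formulas depend on. */
struct vm_thresholds {
	uint64_t v_free_min;
	uint64_t v_free_reserved;
};

/* v_free_target = 4 * v_free_min + v_free_reserved */
static uint64_t
free_target(const struct vm_thresholds *t)
{
	return (4 * t->v_free_min + t->v_free_reserved);
}

/* vm_pageout_wakeup_thresh = (v_free_min / 10) * 11 */
static uint64_t
wakeup_thresh(const struct vm_thresholds *t)
{
	return ((t->v_free_min / 10) * 11);
}

/*
 * Mirrors the order of the two tests in arc_reclaim_needed():
 * returns 1 if the v_free test fires, 2 if only the paging test
 * fires, 0 if neither does.
 */
static int
which_test_fires(const struct vm_thresholds *t, uint64_t v_free,
    uint64_t v_cache)
{
	if (v_free < free_target(t))
		return (1);
	if (v_free + v_cache < wakeup_thresh(t))
		return (2);
	return (0);
}

/* Sweep v_free and v_cache over [0, max_pages]; did test 2 ever fire? */
static bool
second_test_reachable(const struct vm_thresholds *t, uint64_t max_pages,
    uint64_t step)
{
	if (step == 0)
		step = 1;
	for (uint64_t f = 0; f <= max_pages; f += step)
		for (uint64_t c = 0; c <= max_pages; c += step)
			if (which_test_fires(t, f, c) == 2)
				return (true);
	return (false);
}
```

For example, with an assumed v_free_min of 1000 pages and v_free_reserved
of 200 pages, the first target is 4200 pages and the wakeup threshold is
1100 pages. Any v_free that gets past the first test is already at least
4200, so v_free + v_cache cannot be below 1100 and the sweep never reaches
the second return.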
Also, what about the magic numbers here:
	u_int zfs_arc_free_target = (1 << 19); /* default before pagedaemon init only */
That's half a million pages, or 2GB of physical RAM on a 4K page size system.
How is this going to work during early boot on the machines in the cluster
with less than 2GB of RAM?
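For reference, the arithmetic behind that 2GB figure, as a small sketch.
The helper names are hypothetical, and the 4096-byte page size is the
assumption stated above. Whether the boot-time default is still in effect
when ARC reclaim first runs is a separate question that this does not
answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a page count to bytes for a given page size. */
static uint64_t
pages_to_bytes(uint64_t pages, uint64_t page_size)
{
	return (pages * page_size);
}

/*
 * True if a free-page target, expressed in pages, is larger than all
 * of physical memory. In that case "v_free < target" holds for every
 * possible value of v_free.
 */
static bool
target_exceeds_ram(uint64_t target_pages, uint64_t phys_bytes,
    uint64_t page_size)
{
	return (pages_to_bytes(target_pages, page_size) > phys_bytes);
}
```

1 << 19 is 524288 pages, and 524288 * 4096 is 2147483648 bytes, i.e. 2GB.
On a machine with 1GB of RAM, target_exceeds_ram(1 << 19, 1ULL << 30, 4096)
is true; with 4GB it is false.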
--
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won’t do…