ZFS ARC and mmap/page cache coherency question
Karl Denninger
karl at denninger.net
Tue Jul 5 17:50:37 UTC 2016
On 7/5/2016 12:19, Matthew Macy wrote:
>
>
> ---- On Mon, 04 Jul 2016 19:26:06 -0700 Karl Denninger <karl at denninger.net> wrote ----
> >
> >
> > On 7/4/2016 18:45, Matthew Macy wrote:
> > >
> > >
> > > ---- On Sun, 03 Jul 2016 08:43:19 -0700 Karl Denninger <karl at denninger.net> wrote ----
> > > >
> > > > On 7/3/2016 02:45, Matthew Macy wrote:
> > > > >
> > > > > Cedric greatly overstates the intractability of resolving it. Nonetheless, since the initial import very little has been done to improve integration, and I don't know of anyone who is up to the task taking an interest in it. Consequently, mmap() performance is likely "doomed" for the foreseeable future.
> > > > >
> > > > > -M
> > > >
> > > > Wellllll....
> > > >
> > > > I've done a fair bit of work here (see
> > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594) and the
> > > > political issues are at least as bad as the coding ones.
> > > >
> > >
> > >
> > > Strictly speaking, the root of the problem is the ARC. Not ZFS per se. Have you ever tried disabling MFU caching to see how much worse LRU only is? I'm not really convinced the ARC's benefits justify its cost.
> > >
> > > -M
> > >
> >
> > The ARC is very useful when it gets a hit, as it avoids an I/O that
> > would otherwise take place.
> >
> > Where it sucks is when the system evicts working set to preserve the
> > ARC. That's always wrong, because you're trading a speculative I/O
> > (if the cache is hit later) for a *guaranteed* one (to page out) and
> > maybe *two* (to page back in).
>
> The question wasn't ARC vs. no caching. It was LRU-only vs. LRU + MFU. There are a lot of issues stemming from the fact that ZFS is a transactional object store with a POSIX FS on top. One is that it caches disk blocks as opposed to file blocks. If one could resolve that and have the page cache manage those blocks, life would be much, much better. However, you'd lose MFU. Hence my question.
>
> -M
>
I suspect there's an argument to be made there, but the present problems
make the impact of that trade-off difficult or impossible to measure,
as its effects are swamped by the other issues.
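
To make the LRU-only vs. LRU + MFU question concrete, here's a toy
sketch. It is entirely illustrative, nothing like the real ARC
internals, and every name in it is hypothetical: a block touched twice
survives a sequential scan under a split MRU/MFU scheme, while plain
LRU flushes it along with everything else.

    /*
     * Toy model only -- not the ZFS ARC.  A block seen once sits on
     * "mru"; a second hit promotes it to "mfu"; eviction churns the
     * once-seen list, so a scan cannot flush hot data.  Plain LRU has
     * a single recency list, so a scan flushes everything.
     */
    #include <stdio.h>

    #define NSLOT 4

    static int lru[NSLOT];          /* plain LRU: one recency list   */
    static int mru[NSLOT / 2];      /* ARC-style: blocks seen once   */
    static int mfu[NSLOT / 2];      /* ARC-style: seen more than once */

    static int
    find(int *list, int n, int blk)
    {
            for (int i = 0; i < n; i++)
                    if (list[i] == blk)
                            return (i);
            return (-1);
    }

    static void
    push(int *list, int n, int blk) /* insert at head, evict tail    */
    {
            for (int i = n - 1; i > 0; i--)
                    list[i] = list[i - 1];
            list[0] = blk;
    }

    static void
    lru_access(int blk)
    {
            if (find(lru, NSLOT, blk) < 0)
                    push(lru, NSLOT, blk);
            /* (a real LRU would also move an existing hit to the head) */
    }

    static void
    arc_access(int blk)
    {
            int i;

            if (find(mfu, NSLOT / 2, blk) >= 0)
                    return;                     /* already "frequent"  */
            if ((i = find(mru, NSLOT / 2, blk)) >= 0) {
                    mru[i] = -1;                /* second hit: promote */
                    push(mfu, NSLOT / 2, blk);
            } else
                    push(mru, NSLOT / 2, blk);  /* scans churn here    */
    }

    int
    main(void)
    {
            int scan[] = { 10, 11, 12, 13, 14, 15 };

            lru_access(1); lru_access(1);       /* block 1 is hot      */
            arc_access(1); arc_access(1);
            for (int i = 0; i < 6; i++) {       /* big sequential scan */
                    lru_access(scan[i]);
                    arc_access(scan[i]);
            }
            printf("plain LRU kept block 1: %s\n",
                find(lru, NSLOT, 1) >= 0 ? "yes" : "no");
            printf("MRU/MFU kept block 1:  %s\n",
                find(mfu, NSLOT / 2, 1) >= 0 ? "yes" : "no");
            return (0);
    }

That survival under scan pressure is what MFU tracking buys; the open
question is whether it justifies the integration cost.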
I can fairly easily create workloads on the base code where simply
typing "vi <some file>", making a change, and hitting ":w" results in a
stall of tens of seconds or more while the requested cache flush is run
down. I've resolved a good part (but not all instances) of this through
my work.
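
That class of stall is easy to time by hand. Here's a minimal sketch
(my own construction, not something from the PR; vi itself writes with
write(2), but a mapped writer's synchronous flush lands in the same
place on an affected system):

    /*
     * Dirty one page of a mapped file and time the synchronous flush.
     * On an affected system the msync()/fsync() pair can stall for a
     * long time while the requested cache flush is run down.
     */
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
            struct timespec t0, t1;
            struct stat sb;
            char *p;
            int fd;

            if (argc != 2)
                    errx(1, "usage: %s <file on ZFS>", argv[0]);
            if ((fd = open(argv[1], O_RDWR)) < 0)
                    err(1, "open");
            if (fstat(fd, &sb) < 0 || sb.st_size == 0)
                    errx(1, "need a non-empty file");
            p = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                    err(1, "mmap");

            p[0] ^= 1;              /* dirty one mapped page */

            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (msync(p, sb.st_size, MS_SYNC) < 0)
                    err(1, "msync");
            if (fsync(fd) < 0)
                    err(1, "fsync");
            clock_gettime(CLOCK_MONOTONIC, &t1);

            printf("flush took %.3f s\n",
                (t1.tv_sec - t0.tv_sec) +
                (t1.tv_nsec - t0.tv_nsec) / 1e9);
            return (0);
    }

Run it against a file on a ZFS dataset while the machine is under
memory pressure; on an affected configuration the reported flush time
balloons.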
My understanding is that 11- has had additional work done to the base
code, but from what I can see in the commit logs and discussions, three
underlying issues remain unaddressed:

(1) The VM system will page out working set while leaving the ARC alone
(a way to watch this is sketched below).

(2) UMA reserved-but-not-in-use space is not policed adequately under
memory pressure *before* the pager starts considering evicting working
set.

(3) The write-back cache is grossly inappropriate for many machine
configurations and cannot be tuned adequately by hand (this is
particularly true on a system with vdevs that have materially varying
performance levels).
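
For (1), sampling the ARC size against the VM counters makes the
behavior visible. A small sketch using sysctlbyname() follows; the node
names are as they appear on my 10.x and 11- systems, so adjust them if
yours differ:

    /*
     * Sample ARC size against free pages and swap-out activity, to
     * see whether the pager is pushing out working set while the ARC
     * stays put.
     */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static uint64_t
    get64(const char *name)
    {
            uint64_t v = 0;
            size_t len = sizeof(v);

            if (sysctlbyname(name, &v, &len, NULL, 0) < 0)
                    err(1, "%s", name);
            return (v);
    }

    static u_int
    get32(const char *name)
    {
            u_int v = 0;
            size_t len = sizeof(v);

            if (sysctlbyname(name, &v, &len, NULL, 0) < 0)
                    err(1, "%s", name);
            return (v);
    }

    int
    main(void)
    {
            for (;;) {
                    /* v_swappgsout is cumulative since boot */
                    printf("arc %ju MB  free %u pages  "
                        "swapped-out %u pages\n",
                        (uintmax_t)get64(
                            "kstat.zfs.misc.arcstats.size") >> 20,
                        get32("vm.stats.vm.v_free_count"),
                        get32("vm.stats.vm.v_swappgsout"));
                    sleep(5);
            }
    }

If the swapped-out counter climbs while the ARC size holds steady, the
pager is evicting working set in preference to the ARC.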
I have more or less stopped forward work on the tree, since (1) I got
to a place with 10.2 that works for my production requirements,
resolving the problems, and (2) I ran into what I deemed intractable
political issues within core regarding progress toward eradicating the
root of the problem.
I will probably revisit the situation with 11- at some point, as I'll
want to roll my production systems forward. However, I don't know when
that will be; right now 11- is stable enough for some of my embedded
work (e.g. on the Raspberry Pi 2) but not on my server- and
client-class machines. Indeed, just yesterday I got a lock-order
reversal panic during shutdown, after a kernel update, on one of my lab
boxes running a just-updated 11- codebase.
--
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/