Another ZFS ARC memory question

Ian Downes ian at ndwns.net
Fri Feb 24 18:04:00 UTC 2012


On Fri, Feb 24, 2012 at 01:42:14PM +0000, Luke Marsden wrote:
> On Fri, 2012-02-24 at 12:59 +0000, Tom Evans wrote:
> > On Fri, Feb 24, 2012 at 12:44 PM, Luke Marsden
> > <luke-lists at hybrid-logic.co.uk> wrote:
> > > On Fri, 2012-02-24 at 12:21 +0000, Tom Evans wrote:
> > >> On Fri, Feb 24, 2012 at 11:06 AM, Luke Marsden
> > >> <luke-lists at hybrid-logic.co.uk> wrote:
> > >> > Hi all,
> > >> >
> > >> > Just wanted to get your opinion on best practices for ZFS.
> > >> >
> > >> > We're running 8.2-RELEASE v15 in production on 24GB RAM amd64 machines
> > >> > but have been having trouble with short spikes in application memory
> > >> > usage resulting in huge amounts of swapping, bringing the whole machine
> > >> > to its knees and crashing it hard.  I suspect this is because when there
> > >> > is a sudden spike in memory usage the zfs arc reclaim thread is unable
> > >> > to free system memory fast enough.
> > >> >
> > >> > This most recently happened yesterday as you can see from the following
> > >> > munin graphs:
> > >> >
> > >> > E.g. http://hybrid-logic.co.uk/memory-day.png
> > >> >     http://hybrid-logic.co.uk/swap-day.png
> > >> >
> > >> > Our response has been to start limiting the ZFS ARC cache to 4GB on our
> > >> > production machines - trading performance for stability is fine with me
> > >> > (and we have L2ARC on SSD so we still get good levels of caching).
> > >> >
> > >> > My questions are:
> > >> >
> > >> >      * is this a known problem?
> > >> >      * what is the community's advice for production machines running
> > >> >        ZFS on FreeBSD, is manually limiting the ARC cache (to ensure
> > >> >        that there's enough actually free memory to handle a spike in
> > >> >        application memory usage) the best solution to this
> > >> >        spike-in-memory-means-crash problem?
> > >> >      * has FreeBSD 9.0 / ZFS v28 solved this problem?
> > >> >      * rather than setting a hard limit on the ARC cache size, is it
> > >> >        possible to adjust the auto-tuning variables to leave more free
> > >> >        memory for spiky memory situations?  e.g. set the auto-tuning to
> > >> >        make arc eat 80% of memory instead of ~95% like it is at
> > >> >        present?
> > >> >      * could the arc reclaim thread be made to drop ARC pages with
> > >> >        higher priority before the system starts swapping out
> > >> >        application pages?
> > >> >
> > >> > Thank you for any/all answers, and thank you for making FreeBSD
> > >> > awesome :-)
> > >>
> > >> It's not a problem, it's a feature!
> > >>
> > >> By default the ARC will attempt to cache as much as it can - it
> > >> assumes the box is a ZFS filer, and doesn't need RAM for applications.
> > >> The solution, as you've found out, is to limit how much ARC can take
> > >> up.
> > >>
> > >> In practice, you should be doing this anyway. You should know, or have
> > >> an idea, of how much RAM is required for the applications on that box,
> > >> and you need to limit ZFS to not eat into that required RAM.
> > >
> > > Thanks for your reply, Tom!  I agree that the ARC cache is a great
> > > feature, but for a general purpose filesystem it does seem like a
> > > reasonable expectation that filesystem cache will be evicted before
> > > application data is swapped, even if the spike in memory usage is rather
> > > aggressive.  A complete server crash in this scenario is rather
> > > unfortunate.
> > >
> > > My question stands - is this an area which has been improved on in the
> > > ZFS v28 / FreeBSD 9.0 / upcoming FreeBSD 8.3 code, or should it be
> > > standard practice to guess how much memory the applications running on
> > > the server might need and set the arc_max loader.conf tunable
> > > appropriately?  This is reasonably tricky when providing general purpose
> > > web application hosting and so we'll often end up erring on the side of
> > > caution and leaving lots of RAM free "just in case".
> > >
> > > If the latter is indeed the case in the latest stable releases then I
> > > would like to update http://wiki.freebsd.org/ZFSTuningGuide which
> > > currently states:
> > >
> > >        FreeBSD 7.2+ has improved kernel memory allocation strategy and
> > >        no tuning may be necessary on systems with more than 2 GB of
> > >        RAM.
> > >
> > > Thank you!
> > >
> > > Best Regards,
> > > Luke Marsden
> > >
> > 
> > Hmm. That comment is really saying that you no longer need to
> > tune vm.kmem_size.
> 
> http://wiki.freebsd.org/ZFSTuningGuide
> 
> "No tuning may be necessary" seems to indicate that no changes need to
> be made to loader.conf.  I'm happy to provide a patch for the wiki which
> makes it clearer that for servers which may experience sudden spikes in
> application memory usage (i.e. all servers running user-supplied
> applications), the speed of ARC eviction is insufficient to ensure
> stability and arc_max should be tuned downwards.
> 
> > I get what you are saying about applications suddenly using a lot of
> > RAM should not cause the server to fall over. Do you know why it fell
> > over? IE, was it a panic, a deadlock, etc.
> 
> If you look at the http://hybrid-logic.co.uk/swap-day.png graph you can
> see a huge spike in swap at the point at which the last line of pixels
> at http://hybrid-logic.co.uk/memory-day.png indicates the sudden
> increase in memory usage (by 3GB in active memory usage if you look
> closely).  Since the graph stops at that point it indicates that the
> server became completely unresponsive (e.g. including munin probe
> requests).  I did manage to log in just before it became completely
> unresponsive, but at that point the incoming requests weren't being
> serviced fast enough due to the excessive swapping and the server
> eventually became completely unresponsive (e.g. 'top' output froze and
> never came back).  It continued to respond to pings though and may have
> eventually recovered if I had disabled inbound network traffic.  I don't
> have any evidence of a panic or deadlock, we just hard rebooted the
> machine about 15 minutes later after it failed to recover from the
> swap-storm.
> 
> > FreeBSD does not cope well when you have used up all RAM and swap
> > (well, what does?), and from your graphs it does look like the ARC is
> > not super massive when you had the problem - around 30-40% of RAM?
> 
> The last munin sample indicates roughly 8.5GB ARC out of 24GB, so yes,
> 35%.  I guess what I'd like is for FreeBSD to detect an emergency
> out-of-memory condition and aggressively drop much or all of the ARC
> cache *before* swapping out application memory which causes the system
> to grind to a halt.
> 
> Is this a reasonable request, and is there anything I can do to help
> implement it?
> 
> If not can we update the wiki to make it clearer that ARC limiting is
> necessary, even with high RAM boxes, to ensure stability under spiky
> memory conditions?
> 

Are you sure that it is the ARC data that is causing the issue? I've got
boxes where the ARC *metadata* skyrockets and consumes all RAM, greatly
exceeding arc_meta_limit. E.g. on a very unresponsive local box:

vfs.zfs.arc_meta_limit: 1610612736
vfs.zfs.arc_meta_used: 12183379056

Setting arc_max helps (and seems to be respected), but I don't know why
arc_meta_used exceeds arc_meta_limit.
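
To put a number on the overshoot, here is a minimal Python sketch that
parses "name: value" lines in the format sysctl printed above and
reports how far arc_meta_used is over arc_meta_limit. The function
names parse_sysctl and meta_overshoot are made up for illustration, and
the sample text is just the two lines quoted above, not a live reading.

```python
import re


def parse_sysctl(text):
    """Parse 'name: value' lines (as printed by sysctl) into a dict of ints.

    Lines that do not look like 'dotted.name: <integer>' are skipped.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([\w.]+):\s*(\d+)\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def meta_overshoot(values):
    """Return arc_meta_used divided by arc_meta_limit (hypothetical helper)."""
    return values["vfs.zfs.arc_meta_used"] / values["vfs.zfs.arc_meta_limit"]


# The two values quoted in this message, copied verbatim.
SAMPLE = """\
vfs.zfs.arc_meta_limit: 1610612736
vfs.zfs.arc_meta_used: 12183379056
"""

if __name__ == "__main__":
    values = parse_sysctl(SAMPLE)
    gib = 1024 ** 3
    print(
        f"limit {values['vfs.zfs.arc_meta_limit'] / gib:.2f} GiB, "
        f"used {values['vfs.zfs.arc_meta_used'] / gib:.2f} GiB, "
        f"ratio {meta_overshoot(values):.2f}x"
    )
```

On that box the limit works out to 1.5 GiB while the metadata in use is
over 11 GiB, i.e. more than seven times the limit. On a live machine
the same text could presumably be obtained by running
"sysctl vfs.zfs.arc_meta_limit vfs.zfs.arc_meta_used" and passing its
output to parse_sysctl.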

Ian

