OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
avg at FreeBSD.org
Thu Oct 16 06:10:44 UTC 2014
On 16/10/2014 08:56, Justin T. Gibbs wrote:
> avg pointed out the rate limiting code in vm_pageout_scan() during discussion
> about PR 187594. While it certainly can contribute to the problems discussed
> in that PR, a bigger problem is that it can allow the OOM killer to be
> triggered even though there is plenty of reclaimable memory available in the
> system. Any load that can consume enough pages within the polling interval
> to hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero
> of=/file/on/zfs') can make this happen.
> The product I’m working on does not have swap configured and treats any OOM
> trigger as fatal, so it is very obvious when this happens. :-)
> I’ve tried several things to mitigate the problem. The first was to ignore
> rate limiting for pass 2. However, even though ZFS is guaranteed to receive
> some feedback prior to OOM being declared, my testing showed that a trivial
> load (a couple dd operations) could still consume enough of the reclaimed
> space to leave the system below its target at the end of pass 2. After
> removing the rate limiting entirely, I’ve so far been unable to kill the
> system via a ZFS induced load.
> I understand the motivation behind the rate limiting, but the current
> implementation seems too simplistic to be safe. The documentation for the
> Solaris slab allocator provides good motivation for their approach of using a
> “sliding average” to rein in temporary bursts of usage without unduly
> harming efficient service for the recorded steady-state memory demand.
> Regardless of the approach taken, I believe that the OOM killer must be a
> last resort and shouldn’t be called when there are caches that can be
FWIW, I have this toy branch:
Not all commits are relevant to the problem and some things are unfinished.
Not sure if the changes would help your case either...
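
For illustration only, the "sliding average" idea from the quoted text could look roughly like the C sketch below. It is not FreeBSD or Solaris code: struct demand_avg, demand_avg_update() and cache_trim_excess() are hypothetical names, and the 1/4 smoothing weight is an arbitrary assumption. A cache tracks a smoothed estimate of its steady-state demand and, under pressure, gives back only what it holds above that estimate, so a short burst moves the estimate slowly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cache state; not an existing kernel structure. */
struct demand_avg {
	uint64_t avg_pages;	/* smoothed pages in use per interval */
};

/*
 * Fold one interval's observed demand into the average.
 * Assumed weight: the average moves 1/4 of the way toward the sample.
 */
static void
demand_avg_update(struct demand_avg *da, uint64_t sample)
{
	assert(da != NULL);
	if (sample >= da->avg_pages)
		da->avg_pages += (sample - da->avg_pages) / 4;
	else
		da->avg_pages -= (da->avg_pages - sample) / 4;
}

/*
 * Pages the cache could release without cutting into its
 * recorded steady-state demand.
 */
static uint64_t
cache_trim_excess(const struct demand_avg *da, uint64_t cached_pages)
{
	assert(da != NULL);
	if (cached_pages > da->avg_pages)
		return (cached_pages - da->avg_pages);
	return (0);
}
```

This only bounds routine trimming. It says nothing about the last-resort case, where a cache would presumably have to give up more than its excess before the OOM killer is considered.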
> One other thing I’ve noticed in my testing with ZFS is that it needs feedback
> and a little time to react to memory pressure. Calling its lowmem handler
> just once isn’t enough for it to limit in-flight writes so it can avoid reuse
> of pages that it just freed up. But, it doesn’t take too long to react (>
> 1sec in the profiling I’ve done). Is there a way in vm_pageout_scan() that
> we can better record that progress is being made (pages were freed in the
> pass, even if some/all of them were consumed again) and allow more passes
> before the OOM killer is invoked in this case?
I've been thinking about this and maybe we need to make arc_memory_throttle()
more aggressive on FreeBSD. I can't say that I really follow the logic of that
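
A rough C sketch of the "record that progress is being made" question quoted above, under stated assumptions: this is not the vm_pageout_scan() implementation, and struct scan_progress, scan_should_oom() and the OOM_STALLED_PASSES limit are hypothetical. The idea is to count consecutive passes that freed nothing while the system is still below target, and to consider OOM only once that count reaches a limit. Any pass that freed pages resets the count, even if those pages were consumed again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OOM_STALLED_PASSES	3	/* hypothetical tunable */

/* Hypothetical bookkeeping carried across pageout passes. */
struct scan_progress {
	unsigned stalled;	/* consecutive passes that freed nothing */
};

/*
 * Called at the end of each pass. Returns true only after
 * OOM_STALLED_PASSES consecutive passes that freed no pages
 * while the free page count stayed below target.
 */
static bool
scan_should_oom(struct scan_progress *sp, uint64_t pages_freed,
    bool below_target)
{
	assert(sp != NULL);
	if (!below_target || pages_freed > 0) {
		/* Target met or progress was made: allow more passes. */
		sp->stalled = 0;
		return (false);
	}
	if (sp->stalled < OOM_STALLED_PASSES)
		sp->stalled++;
	return (sp->stalled >= OOM_STALLED_PASSES);
}
```

One obvious weakness: a load that lets every pass free a trickle of pages would postpone OOM indefinitely, so a real version would likely need an upper bound on total passes or elapsed time as well.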