Behavior of madvise(MADV_FREE)
Alan Cox
alc at rice.edu
Sat Oct 20 18:44:06 UTC 2012
On 10/15/2012 11:01, Marcel Moolenaar wrote:
> On Oct 12, 2012, at 3:05 PM, Jason Evans <jasone at FreeBSD.org> wrote:
>
>> On Oct 12, 2012, at 1:54 PM, Marcel Moolenaar wrote:
>>> BTW: MADV_DONTNEED in Linux seems to behave like MADV_FREE
>>> in FreeBSD -- at least according to the manpage. Which makes
>>> me wonder how standard madvise(2) is anyway.
>> MADV_DONTNEED on Linux immediately dissociates the physical page from the VM mapping, such that subsequent access results in a zero-filled page being soft-faulted into place.
>>
>> MADV_FREE is *way* nicer than MADV_DONTNEED in the context of malloc. jemalloc has a really discouraging amount of complexity that is directly a result of working around the performance overhead of MADV_DONTNEED.
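A minimal sketch of the Linux behaviour Jason describes, assuming a Linux system with glibc: a private anonymous page is dirtied, dropped with MADV_DONTNEED, and read back. The helper name byte_after_dontneed() is hypothetical. On FreeBSD, MADV_DONTNEED only lowers the in-memory priority of the pages and does not discard their data, so the zero result below is specific to Linux.

```c
/* Sketch assuming Linux semantics: after MADV_DONTNEED on a private
 * anonymous mapping, the next access faults in a zero-filled page. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the first byte read back after the page
 * has been dropped, or -1 if mmap/madvise fails. */
int byte_after_dontneed(unsigned char fill)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    size_t len = (size_t)pg;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, fill, len);                 /* dirty the page */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int v = p[0];                         /* soft fault: fresh zero page on Linux */
    munmap(p, len);
    return v;
}
```

With MADV_FREE, by contrast, the kernel may keep the old contents until it actually needs the page, so malloc cannot rely on either outcome and pays no refault cost when the page is reused quickly.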
> I've been letting this thread sink in -- responding to last.
>
> Vendors like Juniper want reliable VM statistics to prevent
> over-provisioning. While the stats don't need to be exact at
> all times (i.e. instantaneous), having the stats catch up to
> a new steady state is very desirable. In other words: it's
> not that helpful to have lots of memory on the inactive queue
> indefinitely.
I'm sympathetic. Once upon a time, I was often called upon to explain
to network administrators why their idle web cache didn't have oodles of
"free" memory and how this wasn't a problem.
> Also, pushing onto applications the complexity of choosing exactly
> which hint to give the kernel under different scenarios isn't that appealing at all.
> It just doesn't scale.
I think that you're being a bit too pessimistic here. If your use case
really corresponds to "this memory is free and will not be reused (or
reallocated for a very long time)", then that is qualitatively very
different from the way malloc(3) uses MADV_FREE. malloc(3)'s use of
MADV_FREE is highly speculative. It doesn't really know what the
application is going to do in the future. I don't think that having two
distinct hints that distinguish between "speculative" and
"non-speculative" uses would be problematic. The distinction is real
and also easy to explain. The only danger is that application writers
really don't understand their application and use the wrong hint.
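The two-hint idea can be sketched as a small wrapper. The enum release_kind and the functions hint_for() and release_range() are hypothetical. On Linux, MADV_DONTNEED releases the pages at once, which matches the non-speculative case; FreeBSD has no existing hint with that meaning, which is the gap a second hint would fill. The speculative case uses MADV_FREE where the headers define it (Linux has it since 4.5) and otherwise falls back to MADV_DONTNEED.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical classification of why memory is being released. */
enum release_kind {
    RELEASE_SPECULATIVE, /* malloc-style: may well be reused soon */
    RELEASE_DEFINITE     /* will not be reused for a long time */
};

/* Map the kind onto an existing madvise(2) hint (assumed mapping). */
int hint_for(enum release_kind kind)
{
    if (kind == RELEASE_DEFINITE)
        return MADV_DONTNEED;   /* Linux: discard now, zero-fill on refault */
#ifdef MADV_FREE
    return MADV_FREE;           /* lazy: reclaim only if memory is needed */
#else
    return MADV_DONTNEED;
#endif
}

/* Release a page-aligned range with the hint chosen for its kind. */
int release_range(void *addr, size_t len, enum release_kind kind)
{
    assert(addr != NULL && len > 0);
    return madvise(addr, len, hint_for(kind));
}
```

Keeping the choice in one place means a daemon states its intent (speculative or not) and a later change of hint touches only hint_for().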
> ... If some VM changes warrant a new hint
> to madvise(), you may end up changing multiple daemons. It
> seems better to have just 1 hint (i.e. MADV_FREE) and have the
> kernel change its behaviour depending on the situation. When
> there's plenty of memory, you may even ignore the hint. Under
> severe memory pressure you may want to free up the page right
> away so that you can give it to some thread that's waiting
> for a page.
How is this really different from the existing behavior? If a thread is
waiting for a page, then the page daemon is running. In particular, it
is moving pages from the head of the inactive queue, where they were
placed by MADV_FREE, to the cache/free queue and waking up the waiting
thread when the aggregate cache/free target is met.
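A toy model of the path described above, not FreeBSD's actual vm_page or vm_pageout code: MADV_FREE puts a page at the head of the inactive queue, and a page-daemon pass takes pages from that head until the free target is met, at which point a waiting thread could be woken. The queue structure, MAXPAGES, and all function names are invented for illustration, and the real daemon also checks reference and dirty state before reclaiming a page, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXPAGES 64   /* hypothetical fixed capacity for the toy queues */

struct queue {
    int pages[MAXPAGES];  /* page ids; index 0 is the head */
    size_t n;
};

void push_head(struct queue *q, int pg)
{
    assert(q->n < MAXPAGES);
    for (size_t i = q->n; i > 0; i--)
        q->pages[i] = q->pages[i - 1];
    q->pages[0] = pg;
    q->n++;
}

void push_tail(struct queue *q, int pg)
{
    assert(q->n < MAXPAGES);
    q->pages[q->n++] = pg;
}

int pop_head(struct queue *q)
{
    assert(q->n > 0);
    int pg = q->pages[0];
    for (size_t i = 1; i < q->n; i++)
        q->pages[i - 1] = q->pages[i];
    q->n--;
    return pg;
}

/* MADV_FREE in this model: make the page the first reclaim candidate. */
void model_madv_free(struct queue *inactive, int pg)
{
    push_head(inactive, pg);
}

/* One page-daemon pass: move pages from the inactive head to the free
 * queue until the target is met. Returns true if a waiter can be woken. */
bool model_pageout(struct queue *inactive, struct queue *freeq,
                   size_t free_target)
{
    while (freeq->n < free_target && inactive->n > 0)
        push_tail(freeq, pop_head(inactive));
    return freeq->n >= free_target;
}
```

Because MADV_FREE'd pages sit at the head, they are reclaimed before pages that merely aged into the inactive queue.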
> At the edge of needing to swap, complex algorithms
> may be worthwhile -- or maybe not. I don't know.
>
> This leads to:
> 1. Keep MADV_FREE as it behaves in FreeBSD right now or make
> it even more sloppy.
I'm not sure that I understand what you mean by "sloppy" here. Can you
elaborate?
> 2. Have an idle thread that moves inactive pages to the cache
> or free queue if they've been inactive for X minutes, for
> some tunable X. Have it back off when the pageout daemon
> kicks in.
The existing page daemon already wakes up periodically and looks around
for something to do. In particular, have a look at
vm_pageout_page_stats(). That function tries to do something analogous
to what you propose. In part, it tries to prevent munmap(2)ed
file-backed pages from getting stuck in the active queue.
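Option 2 amounts to an age-based scan that stands down whenever the pageout daemon is active. The sketch below shows one pass, assuming a hypothetical page_rec record with a timestamp and a tunable max_idle standing in for the "X minutes"; it is not a model of vm_pageout_page_stats().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-page record for this sketch. */
struct page_rec {
    int id;
    time_t inactive_since;  /* when the page entered the inactive queue */
    bool cached;            /* already moved to the cache/free queue */
};

/* One pass of the idle scan: move pages that have been inactive for at
 * least max_idle seconds, but back off entirely while the pageout
 * daemon is running. Returns the number of pages moved. */
size_t idle_scan(struct page_rec *pages, size_t n, time_t now,
                 time_t max_idle, bool pageout_running)
{
    assert(pages != NULL || n == 0);
    if (pageout_running)
        return 0;

    size_t moved = 0;
    for (size_t i = 0; i < n; i++) {
        if (!pages[i].cached && now - pages[i].inactive_since >= max_idle) {
            pages[i].cached = true;
            moved++;
        }
    }
    return moved;
}
```

Backing off while the pageout daemon runs keeps the two scanners from competing for the same pages under pressure.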
> 3. Have MADV_FREE behave like Linux's MADV_DONTNEED when the
> machine is under (significant/severe/some) memory pressure.
>
> Thoughts?
>
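Option 3, together with the earlier remark that the hint could be ignored when memory is plentiful, can be expressed as a three-way policy on the free page count. The thresholds low_water and high_water, the function madv_free_policy(), and the enum names are hypothetical parameters of this sketch, not FreeBSD tunables; FREE_DEACTIVATE stands for the existing behaviour of placing the page on the inactive queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of an MADV_FREE request. */
enum free_action {
    FREE_IGNORE,      /* plenty of memory: leave the page alone */
    FREE_DEACTIVATE,  /* current behaviour: move to the inactive queue */
    FREE_RELEASE_NOW  /* under pressure: discard at once, Linux DONTNEED-like */
};

/* Choose an action from the current free page count. */
enum free_action madv_free_policy(size_t free_pages, size_t low_water,
                                  size_t high_water)
{
    assert(low_water <= high_water);
    if (free_pages >= high_water)
        return FREE_IGNORE;
    if (free_pages >= low_water)
        return FREE_DEACTIVATE;
    return FREE_RELEASE_NOW;
}
```

Applications keep issuing the single MADV_FREE hint; only the kernel-side thresholds decide how aggressively it is honoured.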