Behavior of madvise(MADV_FREE)

Sat Oct 20 18:44:06 UTC 2012

On 10/15/2012 11:01, Marcel Moolenaar wrote:
> On Oct 12, 2012, at 3:05 PM, Jason Evans <jasone at FreeBSD.org> wrote:
>
>> On Oct 12, 2012, at 1:54 PM, Marcel Moolenaar wrote:
>>> BTW: MADV_DONTNEED in Linux seems to behave like MADV_FREE
>>> in FreeBSD -- at least according to the manpage. Which makes
>>> me wonder how standard madvise(2) is anyway.
>> MADV_DONTNEED on Linux immediately dissociates the physical page from the VM mapping, such that subsequent access results in a zero-filled page being soft-faulted into place.
>>
>> MADV_FREE is *way* nicer than MADV_DONTNEED in the context of malloc.  jemalloc has a really discouraging amount of complexity that is directly a result of working around the performance overhead of MADV_DONTNEED.
> I've been letting this thread sink in -- responding to last.
>
> Vendors like Juniper want reliable VM statistics to prevent
> over-provisioning. While the stats don't need to be exact at
> all times (i.e. instantaneous), having the stats catch up to
> a new steady state is very desirable. In other words: it's
> not that helpful to have lots of memory on the inactive queue
> indefinitely.

I'm sympathetic.  Once upon a time, I was often called upon to explain 
to network administrators why their idle web cache didn't have oodles of 
"free" memory and how this wasn't a problem.

> Also, moving the complexity of exactly which hint to give the
> kernel under different scenarios isn't that appealing at all.
> It just doesn't scale.

I think that you're being a bit too pessimistic here.  If your use case 
really corresponds to "this memory is free and will not be reused (or
reallocated) for a very long time", then that is qualitatively very
different from the way malloc(3) uses MADV_FREE.  malloc(3)'s use of 
MADV_FREE is highly speculative.  It doesn't really know what the 
application is going to do in the future.  I don't think that having two 
distinct hints that distinguish between "speculative" and 
"non-speculative" uses would be problematic.  The distinction is real 
and also easy to explain.  The only danger is that application writers
who don't really understand their application will use the wrong hint.
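
To make the distinction concrete, here is a rough sketch of what a caller
might look like if it carried its intent along with the range.  The names
release_intent and release_range are hypothetical and exist only for this
illustration.  No distinct "non-speculative" hint exists in FreeBSD's
madvise(2) today, so the definite case below simply uses MADV_DONTNEED,
which has the immediate-release behavior on Linux described earlier in the
thread; on FreeBSD that branch is only a placeholder for the second hint.

```c
#define _DEFAULT_SOURCE         /* exposes madvise(2) on glibc; harmless elsewhere */
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* BSD spelling */
#endif

/* Hypothetical: the caller's knowledge about future reuse of the range. */
enum release_intent {
	RELEASE_SPECULATIVE,    /* may be reused soon, as with malloc(3) */
	RELEASE_DEFINITE        /* will not be reused for a very long time */
};

/*
 * Hypothetical helper: pass a hint matching the caller's intent.
 * Returns 0 on success and -1 on failure, like madvise(2).
 */
static int
release_range(void *addr, size_t len, enum release_intent intent)
{
#ifdef MADV_FREE
	/* Speculative: let the kernel reclaim lazily, if it ever needs to. */
	if (intent == RELEASE_SPECULATIVE &&
	    madvise(addr, len, MADV_FREE) == 0)
		return (0);
#endif
	/*
	 * Definite release, or MADV_FREE was unavailable or rejected.
	 * Assumption: MADV_DONTNEED stands in for the missing second hint.
	 */
	(void)intent;
	return (madvise(addr, len, MADV_DONTNEED));
}
```

A malloc(3) implementation would only ever pass RELEASE_SPECULATIVE; a
daemon tearing down a large cache that it knows it will not rebuild would
pass RELEASE_DEFINITE.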

> ... If some VM changes warrant a new hint
> to madvise(), you may end up changing multiple daemons. It
> seems better to have just 1 hint (i.e. MADV_FREE) and have the
> kernel change its behaviour depending on the situation. When
> there's plenty of memory, you may even ignore the hint. Under
> severe memory pressure you may want to free up the page right
> away so that you can give it to some thread that's waiting
> for a page.

How is this really different from the existing behavior?  If a thread is 
waiting for a page, then the page daemon is running.  In particular, it 
is moving pages from the head of the inactive queue, where they were 
placed by MADV_FREE, to the cache/free queue and waking up the waiting 
thread when the aggregate cache/free target is met.
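
As a toy model of that sequence (this is not the kernel's code, every
name below is made up for illustration, and queue ordering is reduced to
simple page counts):

```c
#include <stddef.h>

/* Toy model only; these names are hypothetical, not the FreeBSD VM's. */
struct toy_vm {
	size_t inactive;        /* pages on the inactive queue */
	size_t cache_free;      /* pages on the cache/free queues */
	size_t free_target;     /* aggregate cache/free target */
	int    waiter_asleep;   /* nonzero while a thread waits for a page */
};

/* MADV_FREE places the advised pages on the inactive queue. */
static void
toy_madv_free(struct toy_vm *vm, size_t npages)
{
	vm->inactive += npages;
}

/*
 * One page daemon pass: move pages from the inactive queue to the
 * cache/free queues until the target is met, then wake the waiter.
 * Returns the number of pages moved.
 */
static size_t
toy_pageout_scan(struct toy_vm *vm)
{
	size_t moved = 0;

	while (vm->cache_free < vm->free_target && vm->inactive > 0) {
		vm->inactive--;
		vm->cache_free++;
		moved++;
	}
	if (vm->cache_free >= vm->free_target)
		vm->waiter_asleep = 0;
	return (moved);
}
```

In this model the pages given up with MADV_FREE are exactly the ones the
scan consumes once a thread is short of memory, and the scan stops as soon
as the target is reached.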

>   At the edge of needing to swap, complex algorithms
> may be worthwhile -- or maybe not. I don't know.
>
> This leads to:
> 1.  Keep MADV_FREE as it behaves in FreeBSD right now or make
>      it even more sloppy.

I'm not sure that I understand what you mean by "sloppy" here.  Can you 
elaborate?

> 2.  Have an idle thread that moves inactive pages to the cache
>      or free queue if they've been inactive for X minutes, for
>      some tunable X. Have it back off when the pageout daemon
>      kicks in.

The existing page daemon already wakes up periodically and looks around 
for something to do.  In particular, have a look at 
vm_pageout_page_stats().  That function tries to do something analogous 
to what you propose.  In part, it tries to prevent munmap(2)ed 
file-backed pages from getting stuck in the active queue.

> 3.  Have MADV_FREE behave like Linux's MADV_DONTNEED when the
>      machine is under (significant/severe/some) memory pressure.
>
> Thoughts?
>