madvise() vs posix_fadvise()

Thu Apr 3 15:44:06 UTC 2014

On Thu, 2014-04-03 at 11:02 -0400, John Baldwin wrote:
> On Thursday, April 03, 2014 7:29:03 am Dmitry Sivachenko wrote:
> > 
> > On 27 March 2014, at 19:41, John Baldwin <jhb at FreeBSD.org> wrote:
> > >> 
> > >> I know about mlock(2), it is a bit overkill.
> > >> Can someone please explain the difference between madvise(MADV_WILLNEED) and posix_fadvise(POSIX_FADV_WILLNEED)?
> > > 
> > > Right now FADV_WILLNEED is a nop.  (I have some patches to implement it for
> > > UFS.)  I can't recall off the top of my head if MADV_WILLNEED is also a nop.
> > > However, if both are fully implemented they should be similar in terms of
> > > requesting async read-ahead.  MADV_WILLNEED might also conceivably
> > > pre-create PTEs while FADV_WILLNEED can be used on a file that isn't
> > > mapped but is accessed via read(2).
> > > 
> > 
> > 
> > Hello and thanks for your reply.
> > 
> > Right now I am facing the following problem (stable/10):
> > There is a (home-grown) web server which mmaps a large number of data files (their total size is a bit below the amount of RAM, say ~90GB of files with 128GB of RAM).
> > The server writes access.log (several gigabytes per day).
> > 
> > Some of the mmapped data files are used frequently, some rarely. On startup, the server walks through all of these data files so their contents are read from disk.
> > 
> > After the server has been running for some time, I see that rarely used data files are purged from RAM (accessing them leads to long-running disk reads) in favour of the disk cache
> > (at 0:00, when I rotate and gzip the log file, I see Inactive memory go down by the size of the log file).
> > 
> > Is there any way to tell the VM system not to push mmapped regions out of RAM in favour of the disk cache?
> 
> Use POSIX_FADV_NOREUSE with posix_fadvise() for the log files.  They are a perfect
> use case for this flag.  This will tell the VM system to throw away the log data
> (move it to cache) after it writes the file.
> 
> -- 
> John Baldwin

Does that work well in the case of something like /var/log/messages that
is repeatedly appended-to at random intervals?  It would be bad if every
new line written to the log triggered a physical read-modify-write.  On
the other hand, if it somehow results in the last, partial block being
the only one likely to stay in memory, that would be perfect.

-- Ian