madvise() vs posix_fadvise()

Dmitry Sivachenko trtrmitya at gmail.com
Wed Apr 9 11:04:28 UTC 2014


On 03 Apr 2014, at 23:27, John Baldwin <jhb at FreeBSD.org> wrote:

> On Thursday, April 03, 2014 3:10:49 pm Dmitry Sivachenko wrote:
>> 
>> On 03 Apr 2014, at 20:30, John Baldwin <jhb at FreeBSD.org> wrote:
>> 
>>> 
>>> The latter.  It's sort of like a lazy O_DIRECT.  Each time you call write(2),
>>> it tries to move any clean pages from your current sequentially written
>>> stream from inactive to cache, so the pages won't move until a subsequent
>>> write(2) after bufdaemon or the syncer actually forces them to be written.
>>> Unfortunately, it is currently implemented by doing an internal
>>> FADV_DONTNEED after each read() or write().  It would be better if it was
>>> implemented as a callback when buffers are completed.
>> 
>> 
>> 
>> Sounds like FADV_NOREUSE should be beneficial for any log-writing program?
>> (syslogd, apache, nginx, .....)
> 
> Well, it depends.  If you plan on reading the log files, then using NOREUSE
> can potentially make that more expensive as the logs are more likely to be
> out of RAM when you go to read them (even if you have free memory, mostly
> because "cache" isn't perfect, at least in my experience).  OTOH, pagedaemon
> (a part of the VM system) should generally pick the log pages to evict when
> needed (and I believe it might do a better job of that in 10 than it did
> previously).  I think if you know that the log files are kicking more useful
> things out of RAM and you don't generally plan on reading them (note that
> things like compressing them with gzip count as reading), then FADV_NOREUSE
> can work fine.
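A minimal userland sketch of the mechanism John describes above: issue an advisory FADV_DONTNEED on the byte range that was just written, after each write(2). This is not the kernel's NOREUSE code. It assumes a POSIX system that provides posix_fadvise(), and the helper name write_dontneed() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* posix_fadvise(), POSIX_FADV_DONTNEED */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* write(), lseek() */

/*
 * Hypothetical helper: write(2) followed by an advisory DONTNEED on the
 * range that was just written, roughly what John says the kernel does
 * internally after each read() or write() on a NOREUSE descriptor.
 */
ssize_t write_dontneed(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n <= 0)
        return n;

    /* After write(2) the file offset points just past the new data;
     * this also holds for descriptors opened with O_APPEND. */
    off_t end = lseek(fd, 0, SEEK_CUR);
    if (end != (off_t)-1) {
        /* Advisory only. posix_fadvise() returns an error number rather
         * than setting errno, and the result is ignored here. Pages that
         * are still dirty may stay resident until they are written back,
         * which is why the kernel revisits earlier pages on later writes. */
        (void)posix_fadvise(fd, end - n, (off_t)n, POSIX_FADV_DONTNEED);
    }
    return n;
}
```

Because DONTNEED cannot drop dirty pages, this per-write approach is lazy in the same way John describes: most pages are only released on a later call, after bufdaemon or the syncer has flushed them.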



Well, just for reference:
I call posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) after every log open() and see no difference:
the only disk load during the day is log-file writing, yet Inactive memory grows steadily all day long.
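For reference, the open-then-advise pattern described above might look like the sketch below. The helper name open_log_noreuse() and the open(2) flags are assumptions; only the posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE) call comes from the message. An offset of 0 with a length of 0 covers the whole file, from the start to end of file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>   /* open(), posix_fadvise(), POSIX_FADV_NOREUSE */
#include <stdio.h>   /* fprintf() */
#include <string.h>  /* strerror() */
#include <unistd.h>  /* close() */

/*
 * Hypothetical helper: open a log file for appending and mark the whole
 * file (offset 0, len 0 = through end of file) as NOREUSE.
 * Returns the descriptor, or -1 if open(2) fails (errno is set).
 */
int open_log_noreuse(const char *path)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT | O_CLOEXEC, 0644);
    if (fd == -1)
        return -1;

    /* posix_fadvise() returns an error number instead of setting errno.
     * The advice is only a hint, so a failure is reported but not fatal. */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
    if (err != 0)
        fprintf(stderr, "posix_fadvise(NOREUSE): %s\n", strerror(err));
    return fd;
}
```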

This conversion of Free into Inactive memory continues until no Free memory is left, and the only ways to reclaim it are to remove old log files from disk or to call msync(MS_INVALIDATE) on those files.
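A minimal sketch of the msync(MS_INVALIDATE) workaround mentioned above, assuming the whole file is mapped read-only and shared just to issue the call. The helper name invalidate_file_cache() is hypothetical. How many pages the kernel actually releases is up to the VM system and may differ between FreeBSD versions and other systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* errno */
#include <fcntl.h>     /* open() */
#include <sys/mman.h>  /* mmap(), msync(), munmap(), MS_INVALIDATE */
#include <sys/stat.h>  /* fstat() */
#include <unistd.h>    /* close() */

/*
 * Hypothetical helper: map a file and ask the kernel to invalidate its
 * cached pages. Returns 0 on success, -1 on error with errno set.
 */
int invalidate_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (st.st_size == 0) {  /* mmap() rejects a zero length; nothing cached */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd);              /* the mapping stays valid after close() */
    if (p == MAP_FAILED) {
        errno = saved;
        return -1;
    }

    int rc = msync(p, len, MS_INVALIDATE);
    saved = errno;
    munmap(p, len);
    if (rc == -1)
        errno = saved;      /* report msync()'s error, not munmap()'s */
    return rc;
}
```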

Some rarely used mmapped data gets pushed out of RAM, so referencing it results in long-running disk reads.

[This should actually be expected, since rev. 254304 was done for the EMC/Isilon storage division: FreeBSD now acts like a perfect file-caching server :) ]


More information about the freebsd-hackers mailing list