madvise() vs posix_fadvise()

Dmitry Sivachenko trtrmitya at gmail.com
Sun Apr 6 08:38:04 UTC 2014


On 6 Apr 2014, at 0:11, Dmitry Sivachenko <trtrmitya at gmail.com> wrote:

> 
> On 5 Apr 2014, at 1:02, Dmitry Sivachenko <trtrmitya at gmail.com> wrote:
> 
>> On 5 Apr 2014, at 0:12, John Baldwin <jhb at FreeBSD.org> wrote:
>> 
>>> 
>>> MADV_WILLNEED is not going to give you what you want.  OTOH, if you haven't
>>> tried FreeBSD 10 yet, I would suggest trying that.  There have been changes
>>> to pagedaemon that might make it do a better job of kicking out the pages
>>> of the log files automatically.
>>> 
>> 
>> 
>> I did. My situation became worse after I moved from stable/9 to stable/10.
>> My feeling is that stable/10 pushes rarely used mmapped pages out of RAM more aggressively than stable/9 did.
>> 
>> For now, the only solution I have found is to call msync(MS_INVALIDATE) on the log files after gzipping them and after backing them up via rsync.
>> This moves the corresponding memory pages from Inactive to Free, which keeps the system from filling all free memory with cached log files and from pushing mmapped data out of RAM to accommodate more disk cache.
>> 
>> What I would love to see is a way to tell the OS not to release mmapped data unless it is "really needed" (disk cache is not an excuse).
> 
> 
> One more observation, as it seems to be related.
> If my program allocates RAM via malloc() rather than mmap(), I see that the VM swaps out rarely used parts of the malloc'ed data as the disk is being used
> (more and more memory goes to Inactive, holding cached file contents).
> 
> This also differs from stable/9 and does not seem good.  Why keep the cached contents of files forever? (There seems to be no timeout for keeping cached file contents in the Inactive state.)  So after a few days of uptime, all available RAM is either in the Active state, holding frequently used pages of running processes, or in the Inactive state, holding cached file data.  Rarely used parts of process memory go to swap.
> 
> 


Look at this (top output is sorted by size):

last pid:  2945;  load averages:  8.94,  8.88,  9.23   up 25+20:18:46  12:33:26
94 processes:  6 running, 86 sleeping, 2 zombie
CPU: 22.2% user,  0.0% nice,  0.6% system,  0.0% interrupt, 77.2% idle
Mem: 76G Active, 161G Inact, 7485M Wired, 3504M Cache, 1937M Buf, 1906M Free
Swap: 24G Total, 1435M Used, 23G Free, 5% Inuse, 12K In, 196K Out

  PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 2330 mitya           1  27    0 24611M 24626M piperd 12  10:10  10.25% gsort
99508 mitya           1 103    0 15502M 12382M CPU15  15 652:49 100.00% mkcls
79062 mitya           1  52    0 11396M 10721M swread 22  69.2H  87.26% aliw
80062 mitya           1  52    0 11282M 10666M swread 27  67.0H  80.18% aliw
 1832 mitya           1 103    0  8940M  8707M CPU28  28 232:09 100.00% aliw
 1871 mitya           1 103    0  8326M  8258M CPU11  11 219:13 100.00% aliw
 2329 mitya           1  52    0  5335M  5043M getblk 12 109:49  86.57% phraset
 2002 mitya           1  52    0  3810M  3232M wswbuf  3 186:33  98.39% phraset
 2035 mitya           1 102    0  3810M  3232M CPU16  16 179:33  98.68% phraset
 2555 mitya           1 103    0  2416M  2196M CPU20  20  81:34 100.00% aliw
 2038 mitya           1  23    0   150M  4808K piperd 29   0:00   0.00% nbest
 2005 mitya           1  22    0   150M  4808K piperd  3   0:00   0.00% nbest
 1381 root            2  20    0   106M 23684K select 18   0:57   0.00% ruby19
64642 mitya           1  20    0 96608K  1792K select 22   0:37   0.00% sshd
 2864 root            1  20    0 92512K  5392K select  6   0:00   0.00% sshd
 2866 mitya           1  20    0 92512K  5384K select 18   0:00   0.00% sshd
98119 mitya           1  20    0 92512K  2096K select 23   0:07   0.00% sshd


This machine has 256GB of RAM, and all running processes together use less than 100GB.
But because all Free memory has now moved to the Inactive state, where it greedily holds cached files, processes are swapping (note the swread states above).

This strategy could be beneficial for file servers, but not for other use cases.
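
For reference, the msync(MS_INVALIDATE) workaround mentioned in the quoted text could look roughly like the sketch below. The message does not say how the call is actually issued, so this is an assumption: it maps the file read-only and invalidates the whole range. The helper name drop_cached_pages() is hypothetical, and whether the pages really move from Inactive to Free depends on the kernel version.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original message): map a file
 * read-only and call msync(MS_INVALIDATE) over the whole mapping, asking
 * the kernel to discard its cached pages for that file.
 * Returns 0 on success, -1 on any error.
 */
int drop_cached_pages(const char *path)
{
    struct stat st;
    void *p;
    int fd, rc;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    /* mmap() rejects zero-length mappings; an empty file has nothing cached. */
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    /* Assumption: invalidating the full range is what the workaround does. */
    rc = msync(p, (size_t)st.st_size, MS_INVALIDATE);
    munmap(p, (size_t)st.st_size);

    return rc == 0 ? 0 : -1;
}
```

A log rotation or backup script could run a small wrapper around this on each file once gzip or rsync has finished with it.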

