madvise() vs posix_fadvise()
Dmitry Sivachenko
trtrmitya at gmail.com
Sun Apr 6 08:38:04 UTC 2014
On 06 Apr 2014, at 0:11, Dmitry Sivachenko <trtrmitya at gmail.com> wrote:
>
> On 05 Apr 2014, at 1:02, Dmitry Sivachenko <trtrmitya at gmail.com> wrote:
>
>> On 05 Apr 2014, at 0:12, John Baldwin <jhb at FreeBSD.org> wrote:
>>
>>>
>>> MADV_WILLNEED is not going to give you what you want. OTOH, if you haven't
>>> tried FreeBSD 10 yet, I would suggest trying that. There have been changes
>>> to pagedaemon that might make it do a better job of kicking out the pages
>>> of the log files automatically.
>>>
>>
>>
>> I did. My situation became worse after I moved from stable/9 to stable/10.
>> My feeling is that stable/10 pushes rarely used mmapped pages out of RAM more aggressively than stable/9 did.
>>
>> For now, the only solution I have found is to call msync(MS_INVALIDATE) on log files after gzipping them and after backing them up via rsync.
>> This moves the corresponding memory pages from Inactive to Free and prevents the system from filling all free memory with cached log files and purging mmapped data out of RAM to accommodate more disk cache.
>>
>> What I would love to see is the ability to tell the OS not to release mmapped data unless it is "really needed" (disk cache is not an excuse).
>
>
> One more observation, as it seems to be related.
> If my program allocates RAM via malloc() rather than mmap(), I see that the VM swaps out rarely used parts of the malloced data as the disk is being used
> (more and more memory goes to Inactive with cached file contents).
>
> This also differs from stable/9 and does not seem good. Why keep cached file contents forever? (It seems there is no timeout for keeping cached file contents in the Inactive state.) So after a few days of uptime, all available RAM is either in the Active state, holding frequently used pages of running processes, or in the Inactive state, holding cached file data. Rarely used parts of process memory go to swap.
>
>
Look at this (top output is sorted by size):
last pid: 2945; load averages: 8.94, 8.88, 9.23 up 25+20:18:46 12:33:26
94 processes: 6 running, 86 sleeping, 2 zombie
CPU: 22.2% user, 0.0% nice, 0.6% system, 0.0% interrupt, 77.2% idle
Mem: 76G Active, 161G Inact, 7485M Wired, 3504M Cache, 1937M Buf, 1906M Free
Swap: 24G Total, 1435M Used, 23G Free, 5% Inuse, 12K In, 196K Out
  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 2330 mitya      1  27    0 24611M 24626M piperd 12  10:10  10.25% gsort
99508 mitya      1 103    0 15502M 12382M CPU15  15 652:49 100.00% mkcls
79062 mitya      1  52    0 11396M 10721M swread 22  69.2H  87.26% aliw
80062 mitya      1  52    0 11282M 10666M swread 27  67.0H  80.18% aliw
 1832 mitya      1 103    0  8940M  8707M CPU28  28 232:09 100.00% aliw
 1871 mitya      1 103    0  8326M  8258M CPU11  11 219:13 100.00% aliw
 2329 mitya      1  52    0  5335M  5043M getblk 12 109:49  86.57% phraset
 2002 mitya      1  52    0  3810M  3232M wswbuf  3 186:33  98.39% phraset
 2035 mitya      1 102    0  3810M  3232M CPU16  16 179:33  98.68% phraset
 2555 mitya      1 103    0  2416M  2196M CPU20  20  81:34 100.00% aliw
 2038 mitya      1  23    0   150M  4808K piperd 29   0:00   0.00% nbest
 2005 mitya      1  22    0   150M  4808K piperd  3   0:00   0.00% nbest
 1381 root       2  20    0   106M 23684K select 18   0:57   0.00% ruby19
64642 mitya      1  20    0 96608K  1792K select 22   0:37   0.00% sshd
 2864 root       1  20    0 92512K  5392K select  6   0:00   0.00% sshd
 2866 mitya      1  20    0 92512K  5384K select 18   0:00   0.00% sshd
98119 mitya      1  20    0 92512K  2096K select 23   0:07   0.00% sshd
This machine has 256GB of RAM and all running processes use less than 100GB.
But since all Free memory has now moved to the Inactive state, greedily holding cached files, we see processes swapping.
This strategy could be beneficial for file servers, but not for other use cases.
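
For reference, the msync(MS_INVALIDATE) workaround quoted above could be sketched roughly as follows. This is only an illustration, not the exact tool used on this machine: the helper name invalidate_file_cache() and the INVALIDATE_DEMO_MAIN guard are made up for the example, and whether the pages actually move from Inactive to Free depends on the kernel (the effect described above was observed on FreeBSD; other systems may treat the flag differently).

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (name is an assumption, not an existing API):
 * map the file read-only and call msync(MS_INVALIDATE) on the whole
 * mapping, asking the kernel to drop its cached pages for that file.
 * Returns 0 on success, -1 on failure.
 */
int invalidate_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Same flag as in the workaround described above. */
    int rc = msync(p, len, MS_INVALIDATE);

    munmap(p, len);
    close(fd);
    return rc;
}

#ifdef INVALIDATE_DEMO_MAIN
#include <stdio.h>

/* Optional driver: invalidate every file named on the command line. */
int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        if (invalidate_file_cache(argv[i]) != 0)
            perror(argv[i]);
    return 0;
}
#endif
```

Built with -DINVALIDATE_DEMO_MAIN, this could be run on the log files right after the gzip and rsync steps.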