Sudden growth of memory in "Laundry" state

Robert robert.ayrapetyan at gmail.com
Sat Oct 27 05:36:23 UTC 2018


Hi Mark, thanks for your reply.

Regarding memory flags - I'm not setting them directly anywhere.

Initially my app allocates a shared memory segment of size RAM / 2:

#include <boost/interprocess/managed_shared_memory.hpp>

namespace bip = boost::interprocess;

// Create (or attach to) the segment; SHMEM_SEG_NAME and size are defined elsewhere.
bip::managed_shared_memory segment(bip::open_or_create, SHMEM_SEG_NAME, size);

and never resizes/deallocates it.


Later it only uses plain "new" and "malloc".

When I watched the flags in DTrace, they all looked like the one below:

2018 Oct 20 11:24:48 mmap args: addr:0 len:2097152 prot:3 flags:1002 fd:-1 offset:0

flags 0x1002 corresponds to:

#define MAP_PRIVATE 0x0002 /* changes are private */
#define MAP_ANON    0x1000 /* allocated from memory, swap space */

This is what malloc/new generates by default.
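For reference, a call with exactly those arguments (prot 3 = PROT_READ | PROT_WRITE, flags 0x1002 = MAP_PRIVATE | MAP_ANON) would look roughly like the code below. The function name is made up; this is only an illustration of what the allocator does under the hood, not code from my app:

#include <sys/mman.h>
#include <cstddef>

// Hypothetical illustration: an anonymous private mapping, the kind of
// request malloc(3)/operator new issue for a large allocation.
// prot:3 == PROT_READ | PROT_WRITE, flags:0x1002 == MAP_PRIVATE | MAP_ANON.
void* grab_chunk(std::size_t len)        // e.g. len = 2097152, as in the trace
{
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return (p == MAP_FAILED) ? nullptr : p;
}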


Btw, if you look at the mmap/munmap sizes you will notice the process mmaps 3-5 GB more than it munmaps every 5 minutes.
That would mean the machine should run out of memory within an hour, but it doesn't happen that fast...
Why does DTrace lie here? Or is it because DTrace can't catch all munmaps due to optimizations in the kernel code, which were discussed recently on the mailing list?

Anyway, from your response I understood that using MADV_FREE may help.
Any idea how to use it properly? Should I call madvise after every free/delete on the "freed" pages? That sounds completely wrong...
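To make the question concrete, here is a rough sketch of what I imagine the call would look like (just my assumption of the mechanics, not something the app does today): MADV_FREE applied to the whole pages covered by a freed range, rather than after every single free/delete:

#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: tell the VM system that the whole pages inside
// [addr, addr + len) no longer hold useful data, so their RAM can be
// reclaimed without being laundered/flushed first.
int release_pages(void* addr, std::size_t len)
{
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(addr);
    const std::uintptr_t first = (begin + page - 1) & ~(page - 1);  // round start up
    const std::uintptr_t last  = (begin + len) & ~(page - 1);       // round end down
    if (last <= first)
        return 0;  // the range does not cover a whole page
    return madvise(reinterpret_cast<void*>(first), last - first, MADV_FREE);
}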
  

On 10/26/18 5:40 PM, Mark Millard wrote:
> On 2018-Oct-26, at 3:07 PM, Robert <robert.ayrapetyan at gmail.com> wrote:
>
>> Sorry, let me be more specific.
>>
>> Please look into: https://docs.google.com/spreadsheets/d/152qBBNokl4mJUc6T6wVTcxaWOskT4KhcvdpOL68gEUM/edit?usp=sharing (wait until charts fully loaded).
> Thanks for giving folks access to the charts originally referred to.
>
>> These are all the memory states and mmap/munmap stats collected. The Y axis is in MBs, X is a timeline.
> Some things folks looking into this might want to know:
>
> MAP_PRIVATE in use? Vs.: MAP_SHARED in use?
>
> MAP_NOSYNC in use or not?
>
> MAP_ANON in use or not?
>
> MAP_FIXED in use or not? (Probably not?)
>
> But I cover MAP_NOSYNC and another option that is
> in a 3rd call below and do not need such information
> for what I've written.
>
>> It's not a problem to understand which process produces allocations and is being swapped. I know this for sure.
>>
>> The issue is: I strongly believe that for some reason the FreeBSD kernel fails to reuse deallocated memory properly.
>>
>> Looking at the graphs we can see the following:
>>
>> 1. When process allocates memory (mmap), "Active" memory increases, "Free" memory decreases (that's expected).
>>
>> 2. When process deallocates memory (munmap), "Inactive" memory increases, "Active" memory decreases.
>>
>> Memory never returns into "Free" state. That's kind of expected as well.
>  From the description of MAP_NOSYNC for mmap
> (vs. the flushing behavior):
>
> . . . Without this option any VM pages you dirty may be flushed
> to disk every so often (every 30-60 seconds usually) which can
> create performance problems if you do not need that to occur
> (such as when you are using shared file-backed mmap regions for
> IPC purposes).  Dirty data will be flushed automatically when all
> mappings of an object are removed and all descriptors referencing
> the object are closed.  Note that VM/file system coherency is
> maintained whether you use MAP_NOSYNC or not.
>
> Note the specified behavior for flushing out "dirty data"
> unless MAP_NOSYNC is in use. (I note another alternative
> later.)
>
> As I understand it FreeBSD uses the swapping/paging code to do the
> flush activity: part of the swap/page space is mapped into the
> file in question and the flushing is a form of swapping/paging
> out pages.
>
> [Note: Top does not keep track of changes in swap space,
> for example a "swapon" done after top has started
> displaying things will not show an increased swap total
> but the usage can show larger than the shown total.
> Flushing out to a mapped file might be an example of
> this for all I know.]
>
> Also reported for flushing behavior is:
>
> . . . The fsync(2) system call will flush all dirty data and
> metadata associated with a file, including dirty NOSYNC VM data,
> to physical media.  The sync(8) command and sync(2) system call
> generally do not flush dirty NOSYNC VM data.  The msync(2) system
> call is usually not needed since BSD implements a coherent file
> system buffer cache.  However, it may be used to associate dirty
> VM pages with file system buffers and thus cause them to be
> flushed to physical media sooner rather than later.
>
>
> As for munmap: its description is that the address range is still
> reserved afterwards, quoting the description:
>
> The munmap() system call deletes the mappings and guards for the
> specified address range, and causes further references to addresses
> within the range to generate invalid memory references.
>
> That last is not equivalent to the address range being "free"
> in that the range still counts against the process address space.
> (So being precise about RAM availability vs. address space
> usage/availability is important in order to avoid confusion.)
>
> It would appear that forcing invalid memory references involves
> keeping page descriptions around, but they would be inactive,
> rather than active. This is true whether or not RAM is still
> associated. (So this could potentially lead to a form of extra counting
> of RAM use, sort of like in my original note.) See later below for
> another means of control . . .
>
> Remember: "Dirty data will be flushed automatically when all mappings of
> an object are removed and all descriptors referencing the object are
> closed". So without MAP_NOSYNC the flushing is expected. But see below
> for another means of control . . .
>
> There is another call madvise that has an option tied
> to enabling freeing pages and avoiding flushing the
> pages:
>
> MADV_FREE    Gives the VM system the freedom to free pages, and tells
>              the system that information in the specified page range
>              is no longer important.  This is an efficient way of
>              allowing malloc(3) to free pages anywhere in the address
>              space, while keeping the address space valid.  The next
>              time that the page is referenced, the page might be
>              demand zeroed, or might contain the data that was there
>              before the MADV_FREE call.  References made to that
>              address space range will not make the VM system page the
>              information back in from backing store until the page is
>              modified again.
>
> This is a way to let the system free page ranges and
> allow later use of the address range in the process's
> address space. There are no descriptions of page ranges that should
> generate invalid memory references, so there is no need for such
> "inactive pages".
>
> MADV_FREE makes clear that your expectations about the meaning
> of munmap do not seem to match FreeBSD's actual behavior:
> MADV_FREE must be used explicitly to get the behavior you appear
> to be looking for. At least that is the way I read the
> documentation's meaning. MAP_NOSYNC alone does not seem sufficient
> to provide the behavior you are looking for --but it appears
> possibly necessary up to the point when MADV_FREE can be used.
>
>> 3. At some point, when the sum of "Active" and "Inactive" memory exceeds some upper limit,
>>
>> the OS starts to push "Inactive" memory into "Laundry" and "Swap". This happens very quickly and unexpectedly.
> This is the flushing activity documented as far as I can tell.
>
>> Now why doesn't the OS reuse the huge amount of "Inactive" memory when mmap is called?
> Without MADV_FREE use the system does not have "the freedom
> to free pages". Without MAP_NOSYNC as well it is expected
> to flush out some pages at various times as things go along.
>
>> Or is my assumption about the availability of "Inactive" memory wrong? Which memory is free for allocations then?
> Pages that are inactive and dirty normally have to be
> flushed out before the RAM for the page can be freed
> for other uses. MADV_FREE is for indicating when this is
> not the case and the usage of the RAM has reached a stage
> where the RAM can be more directly freed (no longer tied
> to the process).
>
> At least that is my understanding.
>
> Mark Johnston had already written about MADV_FREE but not
> with such quoting of related documentation. If he and I
> seem to contradict each other anywhere, believe Mark J.
> I'm no FreeBSD expert. I'm just trying to reference and
> understand the documentation.
>
>> Thanks.
>>
>>
>> On 10/24/18 11:58 AM, Mark Millard wrote:
>>> On 2018-Oct-24, at 1:25 PM, Robert <robert.ayrapetyan at gmail.com> wrote:
>>>
>>>> Sorry, that wasn't my output, mine (related to the screenshot I've sent earlier) is:
>>> No screen shot made it through the list back out to those that
>>> get messages from the freebsd-hackers at freebsd.org reference
>>> in the CC. The list limits itself to text, as I understand it.
>>>
>>>> Mem: 1701M Active, 20G Inact, 6225M Laundry, 2625M Wired, 280M Free
>>>> ARC: 116M Total, 6907K MFU, 53M MRU, 544K Anon, 711K Header, 55M Other
>>>>       6207K Compressed, 54M Uncompressed, 8.96:1 Ratio
>>>> Swap: 32G Total, 15G Used, 17G Free, 46% Inuse
>>> Relative to my limited point: I do not know the status of
>>> mutually-exclusive categorizations vs. not for ZFS ARC and
>>> Mem.
>>>
>>> Unfortunately, as I understand things, it is questionable if
>>> adding -S to the top command gives you swap information that
>>> can point to what makes up the 15G swapped out by totaling
>>> the sizes listed. But you might at least be able to infer
>>> what processes became swapped out even if you can not get
>>> a good size for the swap space use for each.
>>>
>>> Using -ores does seem like it puts the top users of resident
>>> memory at the top of top's process list.
>>>
>>> Sufficient Active RAM use by processes that stay active will
>>> tend to cause inactive processes to be swapped out. FreeBSD
>>> does not swap out processes that stay active: it pages those
>>> as needed instead of swapping.
>>>
>>> So using top -Sores  might allow watching what active
>>> process(es) grow and stay active and what inactive processes
>>> are swapped out at the time of the activity.
>>>
>>> I do infer that the 15G Used for Swap is tied to processes
>>> that were not active when swapped out.
>>>
>>>> I'm OK with low "Free" memory if the OS can effectively allocate from "Inactive",
>>>>
>>>> but I'm worried about a sudden move of a huge piece of memory into "Swap" without any relevant mmap calls.
>>>>
>>>>
>>>> My question is: what else (except mmap) may reduce "Free" memory and increase "Laundry"/"Swap" in the system?
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On 10/24/18 9:34 AM, Mark Millard wrote:
>>>>> On 2018-Oct-24, at 11:12 AM, Rozhuk Ivan <rozhuk.im at gmail.com> wrote:
>>>>>
>>>>>> On Wed, 24 Oct 2018 10:19:20 -0700
>>>>>> Robert <robert.ayrapetyan at gmail.com> wrote:
>>>>>>
>>>>>>> So the issue is still happening. Please check attached screenshot.
>>>>>>> The green area is "inactive + cached + free".
>>>>>>>
>>>>>>>   . . .
>>>>>> +1
>>>>>> Mem: 845M Active, 19G Inact, 4322M Laundry, 6996M Wired, 1569M Buf, 617M Free
>>>>>> Swap: 112G Total, 19M Used, 112G Free
>>>>> Just a limited point based on my understanding of "Buf" in
>>>>> top's display . . .
>>>>>
>>>>> If "cached" means "Buf" in top's output, my understanding of Buf
>>>>> is that it is not a distinct memory area. Instead it totals the
>>>>> buffer space that is spread across multiple states: Active,
>>>>> Inactive, Laundry, and possibly Wired(?).
>>>>>
>>>>> In other words: TotalMemory = Active+Inact+Laundry+Wired+Free.
>>>>> If Buf is added to that then there is double counting of
> everything included in Buf and the total will be larger
> than TotalMemory.
>>>>>
>>>>> Also Inact+Buf+Free may double count some of the Inact space,
>>>>> the space that happens to be inactive buffer space.
>>>>>
>>>>> I may be wrong, but that is my understanding.
>>>>>

