Re: Weird change of behavior of VSZ allocation in 14.2 compared to 12.4

From: Karl Denninger <karl_at_denninger.net>
Date: Wed, 02 Jul 2025 16:43:29 UTC
On 7/2/2025 11:08, Paul wrote:
>> This is definitely what I would expect from the vm map of a process that
>> uses a lot of anon memory.  Basically it means that both jemalloc and
>> kernel strategies to defragment memory map work.  I had the intent to
>> blame ASLR for your problems, but it is definitely not, since defrag
>> worked perfectly.
>>
>> Typically, malloc implementations do not free (unmap) allocated VA ranges to
>> the kernel, at most doing posix_madvise(MADV_FREE) when a range is free from
>> the allocator PoV.
>>
>> There were a lot of changes between stable/12 and stable/14, including,
>> e.g., imports of new versions of jemalloc.
>>
>> You might try to isolate your troubles to either userspace or kernel,
>> by running stable/12 userspace with the app on stable/14 kernel.
>> If the behavior returns to what you expect, you might then try replacing
>> malloc on pristine stable/14.  There are several malloc implementations
>> in ports that should be substitutable with LD_PRELOAD.
>> ....
>> As I said, malloc implementations usually do not unmap allocated VA.
>> The only exception I am aware of (this does not mean that there are no
>> other cases) is when a large allocation is directly forwarded to mmap(),
>> then free() is normally a direct munmap().
>>
>>> Seems like there was a change in system allocator and this is its new strategy?
>> Of course there were changes.
>> BTW, jemalloc has advanced statistics facilities.  Read the jemalloc(3)
>> man page to see how to get something useful out of it in your situation.
>>
>
> Hi Konstantin,
>
> Thanks a lot for such a detailed answer and suggestions regarding custom allocators.
> We suspected that this is simply the new reality of VSZ management by the allocator and that there is nothing wrong with it, and you have confirmed that.
>
> So, we're going to come up with another strategy to safeguard against stray memory consumption.
>
> Regarding defragmentation working, that does not seem to be the case. After the previous restart there were indeed several ranges, hinting that some defragmentation logic was working. But since the last restart (continuing the data sent in a previous message) we now observe:
>
> 44608        0x861f75000        0x861f95000 rw-    3    3   1   0 ---D- sw  0.125 0.0117188
> 44608     0x1e853c200000     0x1e853c3e0000 rw-  440  440   1   0 ----- sw  1.875 1.71875
> 44608     0x1e853c400000     0x1e85403eb000 rw- 13103 13103   1   0 ----- sw  63.918 51.1836
> 44608     0x1e8540400000     0x1e8f08a21000 rw- 4122954 4122954   1   0 --S-- sw  40070.1 16105.3
> 44608     0x1fb104d95000     0x1fb104d96000 r--    1    3   2   0 ----- sw  0.00390625 0.00390625
> 44608     0x1fb104d96000     0x1fb104d98000 rw-    2    3   2   0 ----- sw  0.0078125 0.0078125
> 44608     0x7ffffffff000     0x800000000000 ---    0    0   0   0 ----- gd  0.00390625 0
>
> One huge chunk of almost 40 GiB while only 16 GiB are actually in use. And it only grows from there.

I have seen this sort of behavior in long-lived processes myself; what I 
suspect is going on (and it's substantiated by changes I made that 
addressed it) is that some allocation is "blocking" 
consolidation/defragmentation and, in some cases, this becomes pathological.

For example, if I use some of OpenSSL's EVP hash functions (which 
require me to obtain a context) and then free the context after use, 
this can happen.

I've gotten rather strategic about this sort of use to avoid the problem 
in things like a FastCGI app whose lifetime may be "since the machine 
booted" with "millions of potential transactions" over that time. For 
example, if such an app has a function that computes hashes using 
OpenSSL's EVP functions, I'll initialize the context at application 
start and "reset" it after each use rather than freeing it and getting 
another one the next time through.  This puts the RAM grab for it near 
the "front" of what the application allocates and thus avoids the risk.  
If I don't do this, I run the risk that defrag/reuse gets blocked (e.g., 
nothing contiguous of the requested size is available, so a new block of 
storage is allocated), and VSZ will expand every time that occurs even 
though there is no actual leak of RAM.  That is unlikely to be trouble 
if the allocations are modest in size, but if you need to allocate a 
large buffer it can become a significant issue.

If you don't have control over the allocation, or over when it occurs, 
as is the case in many languages that abstract that away from you as a 
programmer, there isn't much you can do about it.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/