Re[2]: Weird change of behavior of VSZ allocation in 14.2 compared to 12.4

From: Paul <devgs_at_ukr.net>
Date: Wed, 02 Jul 2025 15:08:53 UTC

> > ...
> 
> This is definitely what I would expect from the vm map of a process that
> uses a lot of anon memory.  Basically it means that both jemalloc and
> kernel strategies to defragment memory map work.  I had the intent to
> blame ASLR for your problems, but it is definitely not, since defrag
> worked perfectly.
> 
> Typically, malloc implementations do not free (unmap) allocated VA ranges to
> the kernel, at most doing posix_madvise(MADV_FREE) when a range is free from
> the allocator PoV.
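
For reference, a minimal sketch of what that hint looks like at the system-call level (on FreeBSD the native flag is MADV_FREE via madvise(2); the size and the standalone mapping here are purely illustrative):

    #include <sys/mman.h>
    #include <err.h>
    #include <stdlib.h>

    int
    main(void)
    {
        size_t len = 16 * 1024 * 1024;  /* 16 MiB of anonymous memory */
        void *p;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");

        /* ... the application touches the pages, then "frees" them ... */

        /*
         * What an allocator typically does instead of munmap(): the VA
         * range stays mapped (so VSZ does not shrink), but the kernel
         * may reclaim the physical pages, so RSS can drop.
         */
        if (madvise(p, len, MADV_FREE) == -1)
            err(1, "madvise");

        return (0);
    }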
> 
> There were a lot of changes between stable/12 and stable/14, including
> e.g. imports of new versions of jemalloc.
> 
> You might try to isolate your troubles to either userspace or kernel,
> by running stable/12 userspace with the app on stable/14 kernel.
> If the behavior is back to what you desired, you might try to replace
> malloc on pristine stable/14.  We have several malloc implementations
> in ports that should be replaceable with LD_PRELOAD.
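
For anyone following along, such a test would look roughly like this (the library name and path are only an illustration; use whatever the chosen port actually installs):

    LD_PRELOAD=/usr/local/lib/libmimalloc.so ./our-application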
> 
> > 
> > ...
> > 
> 
> As I said, malloc implementations usually do not unmap allocated VA.
> The only exception I am aware of (this does not mean that there are no
> other cases) is when a large allocation is directly forwarded to mmap(),
> then free() is normally a direct munmap().
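
That exception, for contrast with the MADV_FREE case above, looks roughly like this (the size is again only illustrative):

    #include <sys/mman.h>
    #include <err.h>
    #include <stdlib.h>

    int
    main(void)
    {
        size_t len = 1UL << 30;  /* a "huge" request served by mmap() */
        void *p;

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED)
            err(1, "mmap");

        /*
         * free() of such an allocation is effectively a direct munmap(),
         * the one case where VSZ actually shrinks.
         */
        if (munmap(p, len) == -1)
            err(1, "munmap");

        return (0);
    }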
> 
> > Seems like there was a change in system allocator and this is its new strategy?
> 
> Of course there were changes.
> BTW, jemalloc has advanced statistics facilities.  Read the jemalloc(3)
> man page to see how to get something useful from it in your situation.
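
A minimal way to get those statistics out of a process, assuming statistics support is compiled into the system jemalloc (FreeBSD exposes the non-portable API through <malloc_np.h>):

    #include <malloc_np.h>
    #include <stdlib.h>

    int
    main(void)
    {
        void *p = malloc(64 * 1024 * 1024);

        /*
         * Dump jemalloc's view of mapped vs. resident vs. allocated
         * memory, per-arena dirty/muzzy pages, etc.
         */
        malloc_stats_print(NULL, NULL, NULL);

        free(p);
        return (0);
    }

Setting MALLOC_CONF=stats_print:true in the environment should give a similar dump at process exit without touching the code, again assuming statistics are compiled in.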
> 


Hi Konstantin,

Thanks a lot for such a detailed answer and suggestions regarding custom allocators.
We suspected that this is simply the new reality of how the allocator manages VSZ and that there is nothing wrong with it, and you have confirmed that.

So we're going to come up with another strategy to safeguard against runaway memory consumption.
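
One possible direction (just a sketch, not what we have settled on: the threshold and the reaction are placeholders, and it assumes statistics support is compiled into the system jemalloc) would be to watch jemalloc's own "stats.resident" counter via mallctl() instead of watching VSZ:

    #include <malloc_np.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Return non-zero when resident memory crosses a placeholder limit. */
    static int
    memory_looks_runaway(void)
    {
        const size_t limit = 8UL * 1024 * 1024 * 1024;  /* 8 GiB */
        uint64_t epoch = 1;
        size_t resident, len;

        /* jemalloc caches its statistics; bump the epoch to refresh. */
        len = sizeof(epoch);
        if (mallctl("epoch", &epoch, &len, &epoch, sizeof(epoch)) != 0)
            return (0);

        len = sizeof(resident);
        if (mallctl("stats.resident", &resident, &len, NULL, 0) != 0)
            return (0);

        return (resident > limit);
    }

    int
    main(void)
    {
        void *p = malloc(256 * 1024 * 1024);

        printf("runaway: %d\n", memory_looks_runaway());
        free(p);
        return (0);
    }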

Regarding defragmentation, it does not actually seem to be working. After the previous restart there were indeed several ranges, which hinted at some defragmentation logic at work. But since the last restart (a continuation of the data sent in the previous message), we now observe:

44608        0x861f75000        0x861f95000 rw-    3    3   1   0 ---D- sw  0.125 0.0117188
44608     0x1e853c200000     0x1e853c3e0000 rw-  440  440   1   0 ----- sw  1.875 1.71875
44608     0x1e853c400000     0x1e85403eb000 rw- 13103 13103   1   0 ----- sw  63.918 51.1836
44608     0x1e8540400000     0x1e8f08a21000 rw- 4122954 4122954   1   0 --S-- sw  40070.1 16105.3
44608     0x1fb104d95000     0x1fb104d96000 r--    1    3   2   0 ----- sw  0.00390625 0.00390625
44608     0x1fb104d96000     0x1fb104d98000 rw-    2    3   2   0 ----- sw  0.0078125 0.0078125
44608     0x7ffffffff000     0x800000000000 ---    0    0   0   0 ----- gd  0.00390625 0

One huge mapping of almost 40 GiB of virtual address space while only about 16 GiB are actually resident. And it only grows from there.