Re[2]: Weird change of behavior of VSZ allocation in 14.2 compared to 12.4

From: Paul <devgs_at_ukr.net>
Date: Thu, 03 Jul 2025 08:32:00 UTC

> > 
> > 
> > Hi Konstantin,
> > 
> > Thanks a lot for such a detailed answer and suggestions regarding custom allocators.
> > We suspected that this is simply the new reality of VSZ management by the allocator and that there is nothing wrong with it, and you have confirmed that.
> > 
> > So, we're going to come up with another strategy to safeguard against stray memory consumption.
> > 
> > Regarding defragmentation, it doesn't seem to be working. After the previous restart there indeed were several ranges, hinting at some defragmentation logic at work. But since the last restart (a continuation of the data sent in a previous message) we now observe:
> > 
> > 44608        0x861f75000        0x861f95000 rw-    3    3   1   0 ---D- sw  0.125 0.0117188
> > 44608     0x1e853c200000     0x1e853c3e0000 rw-  440  440   1   0 ----- sw  1.875 1.71875
> > 44608     0x1e853c400000     0x1e85403eb000 rw- 13103 13103   1   0 ----- sw  63.918 51.1836
> > 44608     0x1e8540400000     0x1e8f08a21000 rw- 4122954 4122954   1   0 --S-- sw  40070.1 16105.3
> > 44608     0x1fb104d95000     0x1fb104d96000 r--    1    3   2   0 ----- sw  0.00390625 0.00390625
> > 44608     0x1fb104d96000     0x1fb104d98000 rw-    2    3   2   0 ----- sw  0.0078125 0.0078125
> > 44608     0x7ffffffff000     0x800000000000 ---    0    0   0   0 ----- gd  0.00390625 0
> > 
> > One huge chunk of almost 40GiB while only 16GiB are actually in use. And it only grows from there.
> > 
> I have seen this sort of behavior in long-lived processes myself; what I suspect is going on (and it's substantiated by changes I made that addressed it) is that something being allocated is "blocking" consolidation/defragmentation and, in some cases, this becomes pathological.
> For example, if I use some of OpenSSL's EVP hash generator functions (which require that I get a context) and then free the context after use, this can happen.
> I've gotten rather strategic about this sort of use to avoid that in things like a fastcgi app that may have a lifetime of "since the machine booted" and "millions of potential transactions" over that time. For example, if such an app has a function that computes hashes with OpenSSL's EVP functions, I'll initialize the context on application start and then "reset" it after each use rather than free it and get another one the next time through. This puts the RAM grab for it near the "front" of what the application allocates and thus avoids the risk.
> If I don't do this I run the risk that defrag/reuse gets blocked (e.g. there is nothing contiguous of the requested size available, so a new block of storage is allocated), and VSS will expand every time that occurs even though there is no actual leak of RAM. It is unlikely to be trouble if the allocations are modest in size, but if you need to allocate a large buffer it can become a significant issue.
> If you don't have control over the allocation, or over when it occurs, as is the case in many languages that abstract that away from you as a programmer...
> -- 
> Karl Denninger
> karl@denninger.net
> The Market Ticker
> [S/MIME encrypted email preferred]
> 


Hi Karl,

Thanks for your input. This is indeed a process that lives as long as possible, which is what every server app strives for. It does all sorts of allocations, external libraries included, but it doesn't leak: two servers under roughly the same load show roughly the same RES. On 14.2, however, the VSZ just grows and grows, basically forever, while on 12.4 it stays very close to RES (the same plus a small margin) and the process runs for months without issues.
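
For what it's worth, I read your OpenSSL suggestion as roughly the following pattern (a minimal sketch using the stock EVP calls; hash_init_once() and hash_buffer() are only illustrative names, not our actual code):

#include <openssl/evp.h>
#include <stddef.h>

/* Long-lived digest context, allocated once at application start.
 * Reusing it via EVP_MD_CTX_reset() keeps the allocation near the
 * "front" of the address space instead of repeatedly freeing and
 * re-allocating it over the lifetime of the process. */
static EVP_MD_CTX *md_ctx;

int hash_init_once(void)
{
    md_ctx = EVP_MD_CTX_new();              /* one-time allocation */
    return md_ctx != NULL;
}

int hash_buffer(const unsigned char *buf, size_t len,
                unsigned char *out, unsigned int *outlen)
{
    int ok = EVP_DigestInit_ex(md_ctx, EVP_sha256(), NULL) == 1
          && EVP_DigestUpdate(md_ctx, buf, len) == 1
          && EVP_DigestFinal_ex(md_ctx, out, outlen) == 1;

    EVP_MD_CTX_reset(md_ctx);               /* clear state, keep the context */
    return ok;
}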

The funny thing is that there seems to be some connection to how long the memory stays 'occupied' (the time between malloc() and free()). The bulk of the memory is a cache whose records are evicted by LRU as well as by the amount of time a record stays unused. Normally the 'unused time' threshold is large and we effectively evict by LRU alone (via a count limit), and that is when we observe the creeping growth of VSZ. However, when the 'unused time' threshold is set to a smaller value like 5 minutes, cache efficiency of course drops and memory consumption (RES) drops by about 25% (fewer records in the cache at any time), but the VSZ seems to stop growing indefinitely: it settles at some ratio like 2x RES and stays there for many days.
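
To make the two knobs concrete, the eviction policy is roughly the following (a purely illustrative sketch, not our actual code; all names are made up):

#include <stdlib.h>
#include <time.h>

/* Cache record with the two eviction inputs mentioned above:
 * LRU order (list position) and the last-use timestamp. */
struct cache_rec {
    struct cache_rec *prev, *next;   /* intrusive LRU list, head = most recent */
    time_t            last_used;
    /* ... payload ... */
};

struct cache {
    struct cache_rec *head, *tail;
    size_t            count;
    size_t            max_count;     /* the normal LRU (count) limit */
    time_t            max_idle;      /* the 'unused time' threshold, e.g. 300 s */
};

/* Evict from the tail (least recently used) while either limit is
 * exceeded.  With a large max_idle only the count limit matters and
 * records can sit in memory for a long time; with max_idle around
 * five minutes, long-idle records are freed much sooner. */
void cache_evict(struct cache *c)
{
    time_t now = time(NULL);

    while (c->tail != NULL &&
           (c->count > c->max_count ||
            now - c->tail->last_used > c->max_idle)) {
        struct cache_rec *victim = c->tail;

        c->tail = victim->prev;
        if (c->tail != NULL)
            c->tail->next = NULL;
        else
            c->head = NULL;
        c->count--;
        free(victim);
    }
}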

Maybe it was just a lucky coincidence. We're going to repeat the experiment and share the results.