Re: Weird change of behavior of VSZ allocation in 14.2 compared to 12.4
Date: Wed, 02 Jul 2025 16:43:29 UTC
On 7/2/2025 11:08, Paul wrote:

>> This is definitely what I would expect from the vm map of a process that
>> uses a lot of anon memory. Basically it means that both the jemalloc and
>> kernel strategies to defragment the memory map work. I had intended to
>> blame ASLR for your problems, but it is definitely not the cause, since
>> defrag worked perfectly.
>>
>> Typically, malloc implementations do not free (unmap) allocated VA ranges
>> to the kernel, at most doing posix_madvise(MADV_FREE) when a range is
>> free from the allocator's PoV.
>>
>> There were a lot of changes between stable/12 and stable/14, including
>> e.g. imports of new versions of jemalloc.
>>
>> You might try to isolate your troubles to either userspace or the kernel
>> by running the stable/12 userspace with the app on a stable/14 kernel.
>> If the behavior is back to what you desired, you might then try to
>> replace malloc on a pristine stable/14. We have several malloc
>> implementations in ports that should be replaceable with LD_PRELOAD.
>> ....
>> As I said, malloc implementations usually do not unmap allocated VA.
>> The only exception I am aware of (this does not mean that there are no
>> other cases) is when a large allocation is directly forwarded to mmap();
>> then free() is normally a direct munmap().
>>
>>> Seems like there was a change in the system allocator and this is its new strategy?
>>
>> Of course there were changes.
>> BTW, jemalloc has advanced statistics facilities. Read the jemalloc(3)
>> man page to see how to get something useful from it in your situation.
>
> Hi Konstantin,
>
> Thanks a lot for such a detailed answer and the suggestions regarding
> custom allocators. We suspected that this is the new reality of VSZ
> management by the allocator and that there is nothing wrong with it,
> and you have confirmed it.
>
> So we're going to come up with another strategy to safeguard against
> stray memory consumption.
>
> Regarding defragmentation working, that does not seem to be the case.
> During the previous restart there indeed were several ranges, hinting at
> some defragmentation logic working. But since the last restart
> (a continuation of the data sent in a previous message) we now observe:
>
> 44608 0x861f75000    0x861f95000    rw-       3       3 1 0 ---D- sw 0.125      0.0117188
> 44608 0x1e853c200000 0x1e853c3e0000 rw-     440     440 1 0 ----- sw 1.875      1.71875
> 44608 0x1e853c400000 0x1e85403eb000 rw-   13103   13103 1 0 ----- sw 63.918     51.1836
> 44608 0x1e8540400000 0x1e8f08a21000 rw- 4122954 4122954 1 0 --S-- sw 40070.1    16105.3
> 44608 0x1fb104d95000 0x1fb104d96000 r--       1       3 2 0 ----- sw 0.00390625 0.00390625
> 44608 0x1fb104d96000 0x1fb104d98000 rw-       2       3 2 0 ----- sw 0.0078125  0.0078125
> 44608 0x7ffffffff000 0x800000000000 ---       0       0 0 0 ----- gd 0.00390625 0
>
> One huge chunk of almost 40 GiB while only 16 GiB are actually in use.
> And it only grows from there.

I have seen this sort of behavior in long-lived processes myself. What I suspect is going on (and it's substantiated by changes I made that addressed it) is that something being allocated is "blocking" consolidation/defragmentation, and in some cases this becomes pathological.

For example, if I use some of OpenSSL's EVP hash generator functions (which require me to get a context) and then free the context after use, this can happen. I've gotten rather strategic about this sort of use to evade that in things like a fastcgi app that may have a lifetime of "since the machine booted" and "millions of potential transactions" over that time. For example, if I have a function in there that does hashes for some reason and uses OpenSSL's EVP functions to do so, I'll initialize the context on application start and then "reset" it after each use rather than free it and get another one the next time through. This puts the RAM grab for it near the "front" of what the application allocates and thus avoids the risk.

If I don't do this, I run the risk that defrag/reuse gets blocked (e.g.
there is nothing contiguous of the requested size available, so a new block of storage is allocated), and VSZ will expand every time that occurs even though there is no actual leak of RAM. It is unlikely to be trouble if the allocations are modest in size, but if you need to allocate a large buffer it can become a significant issue -- and worse still if you don't have control over the allocation or when it occurs, as is the case in many languages that abstract that away from you as a programmer.

--
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/