Re: jemalloc 5.3.0 upgrade

From: Johan Hendriks <joh.hendriks_at_gmail.com>
Date: Thu, 21 Aug 2025 12:11:00 UTC

On 8/15/25 11:56 PM, Warner Losh wrote:
> After much delay, I've landed jemalloc 5.3.0 into main.
>
> This is likely the last update of jemalloc since the upstream is, at 
> best, in turmoil, and at worst dead.
>
> I tried to completely automate all the details of the upgrade, but 
> only got so far. I did the rest of the upgrade by hand (described in 
> FREEBSD-upgrade). I'd held off landing this until I had that, but once 
> it was clear this was likely the last time we'd need this, I just did 
> the last few steps by hand. I did this to make it easier to audit to 
> ensure that the pull request we got for this (which I redid, but 
> compared to the original) didn't sneak something in. Others can audit 
> me as well.
>
> I've run this with a Netflix workload and my developer workload with
> no regressions.
>
> Please let me know if this causes problems for anybody. I'm sure glad 
> I'll not have to rebase the merge again (it was a pathological case 
> for the instructions in the handbook, so I'll update those).
>
> I've been coordinating this with the release engineer for a while now,
> who gave me a go-ahead for landing this during the freeze since I
> couldn't finish before my vacation last month...
>
> Warner
>
> P.S. Here's the release notes:
> +* 5.3.0 (May 6, 2022)
> +
> +  This release contains many speed and space optimizations, from micro
> +  optimizations on common paths to rework of internal data structures and
> +  locking schemes, and many more too detailed to list below. Multiple percent
> +  of system level metric improvements were measured in tested production
> +  workloads.  The release has gone through large-scale production testing.
> +
> +  New features:
> +  - Add the thread.idle mallctl which hints that the calling thread will be
> +    idle for a nontrivial period of time.  (@davidtgoldblatt)
> +  - Allow small size classes to be the maximum size class to cache in the
> +    thread-specific cache, through the opt.[lg_]tcache_max option.  (@interwq,
> +    @jordalgo)
> +  - Make the behavior of realloc(ptr, 0) configurable with opt.zero_realloc.
> +    (@davidtgoldblatt)
> +  - Add 'make uninstall' support.  (@sangshuduo, @Lapenkov)
> +  - Support C++17 over-aligned allocation.  (@marksantaniello)
> +  - Add the thread.peak mallctl for approximate per-thread peak memory tracking.
> +    (@davidtgoldblatt)
> +  - Add interval-based stats output opt.stats_interval.  (@interwq)
> +  - Add prof.prefix to override filename prefixes for dumps.  (@zhxchen17)
> +  - Add high resolution timestamp support for profiling.  (@tyroguru)
> +  - Add the --collapsed flag to jeprof for flamegraph generation.
> +    (@igorwwwwwwwwwwwwwwwwwwww)
> +  - Add the --debug-syms-by-id option to jeprof for debug symbols discovery.
> +    (@DeannaGelbart)
> +  - Add the opt.prof_leak_error option to exit with error code when leak is
> +    detected using opt.prof_final.  (@yunxuo)
> +  - Add opt.cache_oblivious as a runtime alternative to config.cache_oblivious.
> +    (@interwq)
> +  - Add mallctl interfaces:
> +    + opt.zero_realloc  (@davidtgoldblatt)
> +    + opt.cache_oblivious  (@interwq)
> +    + opt.prof_leak_error  (@yunxuo)
> +    + opt.stats_interval  (@interwq)
> +    + opt.stats_interval_opts  (@interwq)
> +    + opt.tcache_max  (@interwq)
> +    + opt.trust_madvise  (@azat)
> +    + prof.prefix  (@zhxchen17)
> +    + stats.zero_reallocs  (@davidtgoldblatt)
> +    + thread.idle  (@davidtgoldblatt)
> +    + thread.peak.{read,reset}  (@davidtgoldblatt)
> +
> +  Bug fixes:
> +  - Fix the synchronization around explicit tcache creation which could cause
> +    invalid tcache identifiers.  This regression was first released in 5.0.0.
> +    (@yoshinorim, @davidtgoldblatt)
> +  - Fix a profiling biasing issue which could cause incorrect heap usage and
> +    object counts.  This issue existed in all previous releases with the heap
> +    profiling feature.  (@davidtgoldblatt)
> +  - Fix the order of stats counter updating on large realloc which could cause
> +    failed assertions.  This regression was first released in 5.0.0.  (@azat)
> +  - Fix the locking on the arena destroy mallctl, which could cause concurrent
> +    arena creations to fail.  This functionality was first introduced in 5.0.0.
> +    (@interwq)
> +
> +  Portability improvements:
> +  - Remove nothrow from system function declarations on macOS and FreeBSD.
> +    (@davidtgoldblatt, @fredemmott, @leres)
> +  - Improve overcommit and page alignment settings on NetBSD.  (@zoulasc)
> +  - Improve CPU affinity support on BSD platforms.  (@devnexen)
> +  - Improve utrace detection and support.  (@devnexen)
> +  - Improve QEMU support with MADV_DONTNEED zeroed pages detection.  (@azat)
> +  - Add memcntl support on Solaris / illumos.  (@devnexen)
> +  - Improve CPU_SPINWAIT on ARM.  (@AWSjswinney)
> +  - Improve TSD cleanup on FreeBSD.  (@Lapenkov)
> +  - Disable percpu_arena if the CPU count cannot be reliably detected.  (@azat)
> +  - Add malloc_size(3) override support.  (@devnexen)
> +  - Add mmap VM_MAKE_TAG support.  (@devnexen)
> +  - Add support for MADV_[NO]CORE.  (@devnexen)
> +  - Add support for DragonFlyBSD.  (@devnexen)
> +  - Fix the QUANTUM setting on MIPS64.  (@brooksdavis)
> +  - Add the QUANTUM setting for ARC.  (@vineetgarc)
> +  - Add the QUANTUM setting for LoongArch.  (@wangjl-uos)
> +  - Add QNX support.  (@jqian-aurora)
> +  - Avoid atexit(3) calls unless the relevant profiling features are enabled.
> +    (@BusyJay, @laiwei-rice, @interwq)
> +  - Fix unknown option detection when using Clang.  (@Lapenkov)
> +  - Fix symbol conflict with musl libc.  (@georgthegreat)
> +  - Add -Wimplicit-fallthrough checks.  (@nickdesaulniers)
> +  - Add __forceinline support on MSVC.  (@santagada)
> +  - Improve FreeBSD and Windows CI support.  (@Lapenkov)
> +  - Add CI support for PPC64LE architecture.  (@ezeeyahoo)
> +
> +  Incompatible changes:
> +  - Maximum size class allowed in tcache (opt.[lg_]tcache_max) now has an upper
> +    bound of 8MiB.  (@interwq)
> +
> +  Optimizations and refactors (@davidtgoldblatt, @Lapenkov, @interwq):
> +  - Optimize the common cases of the thread cache operations.
> +  - Optimize internal data structures, including RB tree and pairing heap.
> +  - Optimize the internal locking on extent management.
> +  - Extract and refactor the internal page allocator and interface modules.
> +
> +  Documentation:
> +  - Fix doc build with --with-install-suffix.  (@lawmurray, @interwq)
> +  - Add PROFILING_INTERNALS.md.  (@davidtgoldblatt)
> +  - Ensure the proper order of doc building and installation.  (@Mingli-Yu)
>
>
Just a thank you for all your work.

regards,
Johan Hendriks