Re: Graph of the FreeBSD memory fragmentation

From: Ryan Libby <rlibby_at_freebsd.org>
Date: Tue, 14 May 2024 01:54:08 UTC

On Thu, May 9, 2024 at 2:36 AM Alexander Leidinger
<Alexander@leidinger.net> wrote:
>
> On 2024-05-08 18:45, Bojan Novković wrote:
> > Hi,
> >
> > On 5/7/24 14:02, Alexander Leidinger wrote:
> >
> >> Hi,
> >>
> >> I created some graphs of the memory fragmentation.
> >> https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/
> >>
> >> My goal was not comparing a specific change on a given benchmark, but
> >> to "have something which visualizes memory fragmentation". As part of
> >> that, Bojan's commit
> >> https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737
> >> was just in the middle of my data collection. I have the impression
> >> that it made a positive difference in my non-deterministic workload.
> > Thank you for working on this, the plots look great!
> > They provide a really clean visual overview of what's happening.
> > I'm working on another type of memory visualization which might
> > interest you, I'll share it with you once it's done.
> > One small nit - the fragmentation index does not quantify fragmentation
> > for UMA buckets, but for page allocator freelists.
>
> Do I understand it more correctly now: UMA buckets are type/structure specific
> allocation lists, and the page allocator freelists are size-specific
> allocation lists (which are used by UMA when no free item is available
> in a bucket)?
>

Yeah, that's more correct.  UMA is a higher-level allocator than the
vm phys free lists.  The latter is what the proposed sysctl measures.

That measurement is not directly related to UMA or to how UMA allocates
pages for most zones.  The exception is the handful of UMA_ZONE_CONTIG
zones; otherwise, UMA doesn't explicitly do or require allocations of
contiguous pages.

Most allocations by UMA of backing pages are done as single pages from
the perspective of the vm phys free lists.  When possible (slab size
of one page and machine architecture support) it then uses direct map
addresses to refer to items.  Otherwise the backing pages (single or
multiple not-necessarily-contiguous) are mapped into kernel virtual
address space.

Re malloc(9), "small" (up to 64 KB) allocations are served from UMA
zone "buckets" of various sizes, while larger allocations skip UMA and
allocate kmem directly, similar to how UMA does allocations for large
slabs (that is, multiple individual pages are mapped).

UMA "buckets" in the sense of struct uma_bucket are caches of free
items.  For non-cache zones, if buckets are exhausted then UMA goes to
slabs.  If slabs are exhausted then UMA allocates pages.  Pages will
first be served from the page cache, before the vm phys free lists.
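
The fallback order above can be sketched as a toy model.  This is only
an illustration of the ordering, not UMA code: struct toy_zone, its
fields, and toy_zone_alloc() are hypothetical names, and the real
allocator has per-CPU caches, locking, and domain handling that are
ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tiers, named after the layers described above. */
enum alloc_source {
	SRC_BUCKET,		/* item came from a bucket (cache of free items) */
	SRC_SLAB,		/* item came from an existing slab */
	SRC_PAGE_CACHE,		/* a new page was taken from the page cache */
	SRC_PHYS_FREELIST,	/* a new page was taken from the vm phys free lists */
	SRC_NONE		/* nothing left in this toy model */
};

/* Toy state: plain counters instead of real buckets, slabs and pages. */
struct toy_zone {
	unsigned bucket_items;	/* free items cached in buckets */
	unsigned slab_items;	/* free items in existing slabs */
	unsigned cached_pages;	/* pages available in the page cache */
	unsigned freelist_pages;	/* pages on the vm phys free lists */
	unsigned items_per_page;	/* items carved out of one backing page */
};

/* Allocate one item and report which tier satisfied the request. */
enum alloc_source
toy_zone_alloc(struct toy_zone *z)
{
	assert(z != NULL);

	if (z->bucket_items > 0) {
		z->bucket_items--;
		return (SRC_BUCKET);
	}
	if (z->slab_items > 0) {
		z->slab_items--;
		return (SRC_SLAB);
	}
	if (z->items_per_page == 0)
		return (SRC_NONE);
	/* Buckets and slabs are exhausted: a backing page is needed. */
	if (z->cached_pages > 0) {
		z->cached_pages--;
		z->slab_items += z->items_per_page - 1;
		return (SRC_PAGE_CACHE);
	}
	if (z->freelist_pages > 0) {
		z->freelist_pages--;
		z->slab_items += z->items_per_page - 1;
		return (SRC_PHYS_FREELIST);
	}
	return (SRC_NONE);
}
```

In this model only the last tier touches the vm phys free lists, and
it does so one page at a time.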

That was a long-winded way of saying: the "UMA bucket" axis is
actually "vm phys free list order".

That said, I find that dimension confusing because in fact there's
just one piece of information there, the average size of a free list
entry, and it doesn't actually depend on the free list order.  The
graph could be 2D.

The paper that defines this fragmentation index also says that "the
fragmentation index is only meaningful when an allocation fails".  Are
you actually seeing any contiguous allocation failures in your
measurements?

Without that context, it seems like what the proposed sysctl reports
is indirectly just the average size of free list entries.  We could
just report that.
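
To illustrate, here is a minimal sketch.  It assumes the index has the
form 1 - (TotalFree / 2^j) / BlocksFree for a requested order j, which
may not be exactly what the sysctl in D40575 computes.  The function
names and the nfree[] array (number of free blocks per order) are
hypothetical, and returning 0 when nothing is free is an arbitrary
choice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Average size, in pages, of a free list entry.
 * nfree[o] is the number of free blocks of order o (2^o pages each).
 */
double
avg_free_block_pages(const unsigned long *nfree, size_t norders)
{
	unsigned long long pages = 0, blocks = 0;

	assert(norders < 64);
	for (size_t o = 0; o < norders; o++) {
		pages += (unsigned long long)nfree[o] << o;
		blocks += nfree[o];
	}
	return (blocks == 0 ? 0.0 : (double)pages / (double)blocks);
}

/*
 * Assumed index form: 1 - (TotalFree / 2^order) / BlocksFree,
 * which is the same as 1 - average_block_size / 2^order.
 */
double
frag_index(const unsigned long *nfree, size_t norders, unsigned order)
{
	double avg = avg_free_block_pages(nfree, norders);

	assert(order < 64);
	if (avg == 0.0)
		return (0.0);	/* nothing free: arbitrary value for this sketch */
	return (1.0 - avg / (double)(1ULL << order));
}
```

The only input that depends on the state of the free lists is the
average; the order just rescales it, so every point along the order
axis is derived from the same single number.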

> >> Is there anything which prevents https://reviews.freebsd.org/D40575
> >> from being committed?
> > D40575 is closely tied to the compaction patch (D40772) which is
> > currently on hold until another issue is solved (see D45046 and related
> > revisions for more details).
>
> Any idea about https://reviews.freebsd.org/D16620 ? Is D45046 supposed
> to replace this, or is it about something else?
> I wanted to try D16620, but it doesn't apply and my naive/mechanical way
> of applying it panics.
>
> > I didn't consider landing D40575 because of that, but I guess it could
> > be useful on its own.
>
> It at least gives a way to quantify fragmentation with numbers, or to
> visualize it qualitatively. As such it may help in visualizing
> differences like the one from your guard-pages commit. I wonder if the
> segregation of nofree allocations may result in a similar improvement
> for long-running systems.
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF

Ryan