Re: Graph of the FreeBSD memory fragmentation
Date: Tue, 14 May 2024 01:54:08 UTC
On Thu, May 9, 2024 at 2:36 AM Alexander Leidinger <Alexander@leidinger.net> wrote:
>
> On 2024-05-08 18:45, Bojan Novković wrote:
> > Hi,
> >
> > On 5/7/24 14:02, Alexander Leidinger wrote:
> >
> >> Hi,
> >>
> >> I created some graphs of the memory fragmentation.
> >> https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/
> >>
> >> My goal was not comparing a specific change on a given benchmark, but
> >> to "have something which visualizes memory fragmentation". As part of
> >> that, Bojans commit
> >> https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737
> >> was just in the middle of my data collection. I have the impression
> >> that it made a positive difference in my non deterministic workload.
> > Thank you for working on this, the plots look great!
> > They provide a really clean visual overview of what's happening.
> > I'm working on another type of memory visualization which might
> > interest you, I'll share it with you once its done.
> > One small nit - the fragmentation index does not quantify fragmentation
> > for UMA buckets, but for page allocator freelists.
>
> Do I get it more correctly now: UMA buckets are type/structure specific
> allocation lists, and the page allocator freelists are size-specific
> allocation lists (which are used by UMA when no free item is available
> in a bucket)?
>

Yeah, that's more correct.

UMA is a higher level allocator than the vm phys free lists. The latter
is what the proposed sysctl measures. That measurement is not directly
related to UMA or how UMA allocates pages for most zones, except the
handful of UMA_ZONE_CONTIG zones, because otherwise UMA doesn't
explicitly do or require allocations of contiguous pages.

Most allocations by UMA of backing pages are done as single pages from
the perspective of the vm phys free lists. When possible (slab size of
one page and machine architecture support) it then uses direct map
addresses to refer to items.
Otherwise the backing pages (single or multiple
not-necessarily-contiguous) are mapped into kernel virtual address space.

Re malloc(9), "small" (up to 64 KB) allocations are served from UMA zone
"buckets" of various sizes, while larger allocations skip UMA and
allocate kmem directly, in a similar way to how UMA does allocations for
large slabs (as in multiple individual pages are mapped).

UMA "buckets" in the sense of struct uma_bucket are caches of free
items. For non-cache zones, if buckets are exhausted then UMA goes to
slabs. If slabs are exhausted then UMA allocates pages. Pages will first
be served from the page cache, before the vm phys free lists.

That was a long-winded way of saying: the "UMA bucket" axis is actually
"vm phys free list order".

That said, I find that dimension confusing because in fact there's just
one piece of information there, the average size of a free list entry,
and it doesn't actually depend on the free list order. The graph could
be 2D.

The paper that defines this fragmentation index also says that "the
fragmentation index is only meaningful when an allocation fails". Are
you actually seeing any contiguous allocation failures in your
measurements?

Without that context, it seems like what the proposed sysctl reports is
indirectly just the average size of free list entries. We could just
report that.

> >> Is there anything which prevents https://reviews.freebsd.org/D40575 to
> >> be committed?
> > D40575 is closely tied to the compaction patch (D40772) which is
> > currently on hold until another issue is solved (see D45046 and related
> > revisions for more details).
>
> Any idea about https://reviews.freebsd.org/D16620 ? Is D45046 supposed
> to replace this, or is it about something else?
> I wanted to try D16620, but it doesn't apply and my naive/mechanical way
> of applying it panics.
>
> > I didn't consider landing D40575 because of that, but I guess it could
> > be useful on its own.
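
To make the point about what the sysctl would report a bit more
concrete, here is a rough Python sketch. It assumes the index has the
form FI(j) = 1 - (TotalFree / 2^j) / BlocksFree, with TotalFree counted
in pages and BlocksFree counted over all orders; I have not checked
that D40575 computes exactly this (scaling and rounding may differ).
The function names and the free list counts are made up for
illustration.

```python
# Sketch only. Assumes FI(j) = 1 - (total_free_pages / 2**j) / total_free_blocks.
# The counts used below are hypothetical, not measurements.

def total_free_pages(free_counts):
    """free_counts[i] = number of free blocks of order i (2**i pages each)."""
    return sum(n << i for i, n in enumerate(free_counts))


def average_block_pages(free_counts):
    """Average size, in pages, of a free list entry across all orders."""
    blocks = sum(free_counts)
    if blocks == 0:
        return 0.0  # nothing free; pick 0 as a convention for this sketch
    return total_free_pages(free_counts) / blocks


def fragmentation_index(free_counts, order):
    """Index for a request of 2**order pages, under the assumed formula."""
    blocks = sum(free_counts)
    if blocks == 0:
        return 0.0  # same convention as above
    return 1.0 - (total_free_pages(free_counts) / (1 << order)) / blocks


if __name__ == "__main__":
    free_counts = [120, 40, 10, 3, 1]  # hypothetical counts for orders 0..4
    avg = average_block_pages(free_counts)
    for order in range(len(free_counts)):
        fi = fragmentation_index(free_counts, order)
        # The per-order value is fully determined by the one average.
        assert abs(fi - (1.0 - avg / (1 << order))) < 1e-12
        print(f"order {order}: FI = {fi:+.3f}")
```

Under that assumption every per-order value is just 1 - avg / 2^order,
so the order axis only rescales a single number, the average free
block size.
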
>
> It at least gives a way to quantify with numbers resp. qualitatively
> visualize. And as such it may help in visualizing differences like with
> your guard-pages commit. I wonder if the segregation of nofree
> allocations may result in a similar improvement for long-running
> systems.
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF

Ryan