Re: Graph of the FreeBSD memory fragmentation

From: Ryan Libby <rlibby_at_freebsd.org>
Date: Tue, 14 May 2024 17:18:18 UTC
On Tue, May 14, 2024 at 9:09 AM Ryan Libby <rlibby@freebsd.org> wrote:
>
> On Tue, May 14, 2024 at 1:14 AM Alexander Leidinger
> <Alexander@leidinger.net> wrote:
> >
> > Am 2024-05-14 03:54, schrieb Ryan Libby:
> > > That was a long winded way of saying: the "UMA bucket" axis is
> > > actually "vm phys free list order".
> > >
> > > That said, I find that dimension confusing because in fact there's
> > > just one piece of information there, the average size of a free list
> > > entry, and it doesn't actually depend on the free list order.  The
> > > graph could be 2D.
> >
> > It evolved into that...
> > At first I had a 3 dimensional dataset and the first try was to plot it
> > as is (3D). The outcome (as points) was not as good as I wanted it to
> > be, and plotting as lines gave the wrong direction of lines. I massaged
> > the plotting instructions until it looked good enough. I did not try a
> > 2D plot. I agree, with different colors for each free list order a 2D
> > plot may work too. If a 2D plot is better than a 3D plot in this case,
> > depends on the mental model of the topic the viewer has. One size may
> > not fit all. Feel free to experiment with other plotting styles.
> >
>
> What I mean is that the 13 values in the depth dimension (now "freelist
> size") are actually all showing the same information -- except for
> integer truncation issues and having clamped the negative values at
> -1000.  Each index value for a given order completely determines the
> values for the other orders at a given time point.
>
> In the patch (D40575) this is
>         return (1000 -
>             ((info.free_pages * 1000) / (1 << order) / info.free_blocks));
> but notice that free_pages and free_blocks don't depend on order, they
> are computed across all free list entries, of all orders, and are the
> same for a calculation for any order.  So for example we could solve for
> the average free list entry size by taking the value from order of 0:
>         index_0 = 1000 - 1000 / 1 * free_pages / free_blocks
>         avg_pages = free_pages / free_blocks = -(index_0 - 1000) / 1000
> and from that you can calculate all the other values.  Or just display
> it directly.  I'd suggest try plotting log2(avg_pages).
>
> In other words, I think just considering one value per time point is
> simpler and doesn't lose any information.
>
> > > The paper that defines this fragmentation index also says that "the
> > > fragmentation index is only meaningful when an allocation fails".  Are
> > > you actually seeing any contiguous allocations failures in your
> > > measurements?
> >
> > I'm not aware of such.
> > The index may only be meaningful for the purposes of the goal of the
> > paper when there are such failures, but if you look at the graph and how
> > it changed when Bojan changed the guard pages, I see value in the graph
> > for more than what the paper suggests.
> >
> > > Without that context, it seems like what the proposed sysctl reports
> > > is indirectly just the average size of free list entries.  We could
> > > just report that.
> >
> > The calculation of the value is part of a bigger picture. The value
> > returned is used by some other code to make decisions.
> >
> > Bye,
> > Alexander.
> >
> > --
> > http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> > http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
>
> Okay I see that D40772 uses it, but always passes order=9, and compares
> it with threshold=300.

I see that it is not "always" as the order is actually arch dependent.

> Rearranging, it asks if the average free list
> entry size is at least 1.4 MiB.

...on amd64.

>
> Personally I'd prefer to consider values that are easy to interpret
> rather than an arbitrary index value.
>
> Ryan