Re: Graph of the FreeBSD memory fragmentation

From: Ryan Libby <rlibby_at_freebsd.org>
Date: Tue, 14 May 2024 16:09:30 UTC
On Tue, May 14, 2024 at 1:14 AM Alexander Leidinger
<Alexander@leidinger.net> wrote:
>
> Am 2024-05-14 03:54, schrieb Ryan Libby:
> > That was a long winded way of saying: the "UMA bucket" axis is
> > actually "vm phys free list order".
> >
> > That said, I find that dimension confusing because in fact there's
> > just one piece of information there, the average size of a free list
> > entry, and it doesn't actually depend on the free list order.  The
> > graph could be 2D.
>
> It evolved into that...
> At first I had a 3 dimensional dataset and the first try was to plot it
> as is (3D). The outcome (as points) was not as good as I wanted it to
> be, and plotting as lines gave the wrong direction of lines. I massaged
> the plotting instructions until it looked good enough. I did not try a
> 2D plot. I agree, with different colors for each free list order a 2D
> plot may work too. If a 2D plot is better than a 3D plot in this case,
> depends on the mental model of the topic the viewer has. One size may
> not fit all. Feel free to experiment with other plotting styles.
>

What I mean is that the 13 values in the depth dimension (now "freelist
size") are actually all showing the same information -- except for
integer truncation issues and having clamped the negative values at
-1000.  Each index value for a given order completely determines the
values for the other orders at a given time point.

In the patch (D40575) this is
        return (1000 -
            ((info.free_pages * 1000) / (1 << order) / info.free_blocks));
but notice that free_pages and free_blocks don't depend on order, they
are computed across all free list entries, of all orders, and are the
same for a calculation for any order.  So for example we could solve for
the average free list entry size by taking the value from order of 0:
        index_0 = 1000 - 1000 / 1 * free_pages / free_blocks
        avg_pages = free_pages / free_blocks = -(index_0 - 1000) / 1000
and from that you can calculate all the other values.  Or just display
it directly.  I'd suggest try plotting log2(avg_pages).

In other words, I think just considering one value per time point is
simpler and doesn't lose any information.

> > The paper that defines this fragmentation index also says that "the
> > fragmentation index is only meaningful when an allocation fails".  Are
> > you actually seeing any contiguous allocations failures in your
> > measurements?
>
> I'm not aware of such.
> The index may only be meaningful for the purposes of the goal of the
> paper when there are such failures, but if you look at the graph and how
> it changed when Bojan changed the guard pages, I see value in the graph
> for more than what the paper suggests.
>
> > Without that context, it seems like what the proposed sysctl reports
> > is indirectly just the average size of free list entries.  We could
> > just report that.
>
> The calculation of the value is part of a bigger picture. The value
> returned is used by some other code to make decisions.
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF

Okay I see that D40772 uses it, but always passes order=9, and compares
it with threshold=300.  Rearranging, it asks if the average free list
entry size is at least 1.4 MiB.

Personally I'd prefer to consider values that are easy to interpret
rather than an arbitrary index value.

Ryan