9k jumbo clusters

John-Mark Gurney jmg at funkthat.com
Sun Jul 29 01:11:56 UTC 2018


Adrian Chadd wrote this message on Sat, Jul 28, 2018 at 13:33 -0700:
> On Fri, 27 Jul 2018 at 15:19, John-Mark Gurney <jmg at funkthat.com> wrote:
> 
> > Ryan Moeller wrote this message on Fri, Jul 27, 2018 at 12:45 -0700:
> > > There is a long-standing issue with 9k mbuf jumbo clusters in FreeBSD.
> > > For example:
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183381
> > > https://lists.freebsd.org/pipermail/freebsd-net/2013-March/034890.html
> > >
> > > This comment suggests the 16k pool does not have the fragmentation
> > > problem:
> > > https://reviews.freebsd.org/D11560#239462
> > > I'm curious whether that has been confirmed.
> > >
> > > Is anyone working on the pathological case with 9k jumbo clusters in the
> > > physical memory allocator?  There was an interesting discussion started a
> > > few years ago but I'm not sure what ever came of it:
> > > http://docs.freebsd.org/cgi/mid.cgi?21225.20047.947384.390241
> > >
> > > I have seen some work in the direction of avoiding larger than page size
> > > jumbo clusters in 12-CURRENT.  Many existing drivers avoid the 9k cluster
> > > size already.  The code for larger cluster sizes in iflib is #ifdef'd out
> > > so it maxes out at the page size jumbo clusters until "CONTIGMALLOC_WORKS"
> > > (apparently it doesn't).
> > >
> > > With all the changes due to iflib, is there any chance some of this will
> > > get MFC'd to address the serious problem that remains in 11-STABLE?
> > >
> > > Otherwise, would it be feasible to disable the use of the 9k cluster pool
> > > in at least some of the popular NIC drivers as a solution for the stable
> > > branches?
> > >
> > > Finally, I have studied some of the driver code in 11-STABLE and posted
> > > the gist of my notes in relation to this problem.  If anyone spots a
> > > mistake or has something else to contribute, comments on the gist would
> > > be greatly appreciated!
> > > https://gist.github.com/freqlabs/eba9b755f17a223260246becfbb150a1
> >
> > Drivers need to be fixed to use 4k pages instead of clusters.  I really hope
> > no one is using a card that can't do 4k pages, or if they are, then they
> > should get a real card that can do scatter/gather on 4k pages for jumbo
> > frames.
> 
> 
> Yeah, but it's 2018 and your server has, at minimum, a dozen million 4k
> pages.
> 
> So if you're doing stuff like lots of network packet kerchunking why not
> have specialised allocator paths that can do things like "hey, always give
> me 64k physical contig pages for storage/mbufs because you know what?
> they're going to be allocated/freed together always."
> 
> There was always a race between bus bandwidth, memory bandwidth and
> bus/memory latencies. I'm not currently on the disk/packet pushing side of
> things, but the last couple of times I was, it was at different points in that
> 4D space and almost every single time there was a benefit from having a
> couple of specialised allocators so you didn't have to try and manage a few
> dozen million 4k pages based on your changing workload.
> 
> I enjoy the 4k page size management stuff for my 128MB routers. Your 128G
> server has a lot of 4k pages. It's a bit silly.

We do:
$ vmstat -z
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
[...]
8192:                  8192,      0,      67,     109,398203019,   0,   0
16384:                16384,      0,      65,      41,74103020,   0,   0
32768:                32768,      0,      61,      28,41981659,   0,   0
65536:                65536,      0,      17,      23,26127059,   0,   0
[...]
mbuf_jumbo_page:       4096, 509295,       0,      64,183536214,   0,   0
mbuf_jumbo_9k:         9216, 150902,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384,  84882,       0,       0,       0,   0,   0
[...]

And I know you know this, but the problem is that over time memory becomes
fragmented, so if you suddenly need more jumbo clusters than you already
have, you're SOL...

Page-size allocations, on the other hand, will always be available...

Fixing drivers to fall back to 4k allocations (or always use 4k allocations)
is a lot simpler than doing magic work to free pages, etc.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


More information about the freebsd-net mailing list