Big physically contiguous mbuf clusters

Thu Jan 30 08:47:08 UTC 2014

on 30/01/2014 00:27 John-Mark Gurney said the following:
> Adrian Chadd wrote this message on Wed, Jan 29, 2014 at 14:21 -0800:
>> On 29 January 2014 10:54, Garrett Wollman <wollman at csail.mit.edu> wrote:
>>> Resolved: that mbuf clusters longer than one page ought not be
>>> supported.  There is too much physical-memory fragmentation for them
>>> to be of use on a moderately active server.  9k mbufs are especially
>>> bad, since in the fragmented case they waste 3k per allocation.
>>
>> I've been wondering whether it'd be feasible to teach the physical
>> memory allocator about >page sized allocations and to create zones of
>> slightly more physically contiguous memory.
>>
>> For servers with lots of memory we could then keep these around and
>> only dip into them for temporary allocations (eg not VM pages that may
>> be held for some unknown amount of time.)
>>
>> Question is - can we enforce that kind of behaviour?
> 
> It shouldn't be too hard to do...  Since everything pretty much goes
> through uma we can adopt a scheme similar to what Solaris does (read
> Magazines and Vmem: Extending the Slab Allocator to Many CPUs and
> Arbitrary Resources)...  Instead of dealing w/ page size allocations,
> everything is larger, say 16KB, and broken down from there...
> 

FWIW, judging from the OpenSolaris / illumos code, this is not how it is
currently implemented in Solaris.
The kmem code tries to find a slab size that keeps the waste minimal.  There
is a cap on the maximum slab size, of course.  This is also done for sub-page
items.
E.g. if the item size is 3KB, FreeBSD uma would use 4KB slabs and waste
about 1KB in each slab.  The illumos kmem cache code, on the other hand, would
pick a 12KB slab size, which holds exactly four items with no waste.

-- 
Andriy Gapon