Memory allocation in kernel -- what to use in which situation? What is the best for page-sized allocations?

Sun Oct 2 14:00:28 UTC 2011

2011/10/2 Lev Serebryakov <lev at freebsd.org>:
> Hello, Davide.
> You wrote on 2 October 2011, 16:57:48:
>
>>>   But what if I need to allocate a lot (say, 16K-32K) of page-sized
>>> blocks? Not in one chunk, for sure, but over the lifetime of my kernel
>>> module. Which allocator should I use? It seems the best one would be
>>> a very low-level, page-size-only allocator. Is there one in the kernel?
>
>> My 2 cents:
>> Every time you request an amount of memory bigger than 4KB using
>> kernel malloc(), it results in a direct call to uma_large_malloc().
>> Right now, uma_large_malloc() calls kmem_malloc() (i.e. the memory is
>> requested from the VM directly).
>> This kind of approach has two main drawbacks:
>> 1) it heavily fragments the kernel heap
>> 2) when free() is called on these multipage chunks, it in turn calls
>> uma_large_free(), which immediately calls the VM system to unmap and
>> free the chunk of memory.  The unmapping requires a system-wide TLB
>> shootdown, i.e. a global action by every processor in the system.
>
>> I'm currently working, supervised by alc@, on an intermediate layer that
>> sits between UMA and the VM, whose goal is to efficiently satisfy
>> requests > 4KB (so, the one you want, considering you're asking for
>> 16KB-32KB), but the work is at an early stage.
>  I was not very clear here. I'm talking about page-sized blocks, but
>  many of them. 16K-32K is not a size in bytes, but the number of
>  page-sized blocks my code needs :)
>
ok.
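
To make the "many page-sized blocks" case concrete, here is a minimal
user-space sketch of a page-sized block cache: freed blocks are parked
on a free list and reused, so the expensive backend is reached only
when the list is empty. This is not kernel code. The names
(page_pool_*, POOL_PAGE_SIZE) are hypothetical, the 4096-byte size is
an assumption, and plain malloc()/free() stand in for the VM-level
allocation and release discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_PAGE_SIZE 4096 /* assumed page size, for illustration only */

/* A free block's own storage is reused to hold the free-list link. */
struct free_page {
	struct free_page *next;
};

struct page_pool {
	struct free_page *free_list; /* cached, currently unused blocks */
	size_t cached;               /* number of blocks on free_list */
	size_t max_cached;           /* cap on the number of cached blocks */
	size_t backend_allocs;       /* requests that reached the backend */
	size_t backend_frees;        /* blocks handed back to the backend */
};

static void
page_pool_init(struct page_pool *pool, size_t max_cached)
{
	pool->free_list = NULL;
	pool->cached = 0;
	pool->max_cached = max_cached;
	pool->backend_allocs = 0;
	pool->backend_frees = 0;
}

/* Return one page-sized block, preferring a cached one. */
static void *
page_pool_alloc(struct page_pool *pool)
{
	struct free_page *fp = pool->free_list;

	if (fp != NULL) {
		pool->free_list = fp->next;
		pool->cached--;
		return (fp);
	}
	/* malloc() stands in for the expensive backend allocation. */
	pool->backend_allocs++;
	return (malloc(POOL_PAGE_SIZE));
}

/* Park the block on the free list, or release it if the cache is full. */
static void
page_pool_free(struct page_pool *pool, void *block)
{
	struct free_page *fp = block;

	assert(block != NULL);
	if (pool->cached >= pool->max_cached) {
		pool->backend_frees++;
		free(block);
		return;
	}
	fp->next = pool->free_list;
	pool->free_list = fp;
	pool->cached++;
}

/* Hand every cached block back to the backend. */
static void
page_pool_drain(struct page_pool *pool)
{
	while (pool->free_list != NULL) {
		struct free_page *fp = pool->free_list;

		pool->free_list = fp->next;
		pool->cached--;
		pool->backend_frees++;
		free(fp);
	}
}
```

With this shape, an allocate/free/allocate cycle touches the backend
once; the second allocation is served from the free list. The cap
(max_cached) bounds how much idle memory the pool may hold. A real
in-kernel version would also need locking, which this sketch omits.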

>  BTW, I/O often requires big buffers, up to MAXPHYS (128KiB for
>  now). Do you mean that any allocation of such memory carries a
>  considerable performance penalty, especially on multi-core and
>  multi-CPU systems?
>

In fact, the main client of this kind of allocation is the ZFS
filesystem (because of its adaptive replacement cache, ARC). AFAIK, at
the time UMA was written, the kind of allocations you describe were so
infrequent that no initial effort was made to optimize them.
People tried to address this issue by having ZFS create a large number
of UMA zones for large allocations of different sizes. Unfortunately,
one of the side effects of this approach was increased
fragmentation, so we're investigating the problem.
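
As a rough illustration of one way per-size zones can waste memory,
here is a small user-space model. It only does bookkeeping: blocks
freed into one size class sit idle and cannot serve requests of another
class. Everything in it is an assumption made for the example: the
names (zone_model, model_*), the four size classes, and the premise
that this is the kind of fragmentation meant above. It is not how UMA
or ZFS are implemented.

```c
#include <assert.h>
#include <stddef.h>

#define NCLASSES 4

/* Hypothetical size classes in bytes; not the sizes ZFS actually uses. */
static const size_t class_size[NCLASSES] = { 8192, 16384, 65536, 131072 };

struct zone_model {
	size_t cached[NCLASSES]; /* free blocks parked in each class */
	size_t fresh_bytes;      /* bytes that had to come from the backend */
};

/* Index of the smallest class that fits the request, or -1 if none. */
static int
class_for(size_t size)
{
	for (int i = 0; i < NCLASSES; i++)
		if (size <= class_size[i])
			return (i);
	return (-1);
}

/* Serve a request from its class cache, else count fresh backend bytes. */
static int
model_alloc(struct zone_model *m, size_t size)
{
	int c = class_for(size);

	if (c < 0)
		return (-1);
	if (m->cached[c] > 0)
		m->cached[c]--;
	else
		m->fresh_bytes += class_size[c];
	return (0);
}

/* A freed block returns to its own class and stays there. */
static void
model_free(struct zone_model *m, size_t size)
{
	int c = class_for(size);

	assert(c >= 0); /* freeing a size that could never be allocated */
	m->cached[c]++;
}

/* Bytes currently held in caches and not usable by other classes. */
static size_t
model_idle_bytes(const struct zone_model *m)
{
	size_t total = 0;

	for (int i = 0; i < NCLASSES; i++)
		total += m->cached[i] * class_size[i];
	return (total);
}
```

For example, after ten 16KB blocks are allocated and freed, a later
burst of ten 128KB requests still needs fresh memory from the backend,
while the 160KB parked in the 16KB class stays idle. The more classes
there are, the more places memory can be stranded this way.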

> --
> // Black Lion AKA Lev Serebryakov <lev at FreeBSD.org>
>
>