UMA caches draining

Konstantin Belousov kostikbel at gmail.com
Wed Jan 8 07:45:35 UTC 2014


On Tue, Jan 07, 2014 at 08:15:36PM +0200, Alexander Motin wrote:
> On 07.01.2014 19:20, Konstantin Belousov wrote:
> > On Tue, Jan 07, 2014 at 11:43:43AM +0200, Alexander Motin wrote:
> >> On 07.01.2014 07:48, Konstantin Belousov wrote:
> >>> On Tue, Jan 07, 2014 at 02:51:09AM +0200, Alexander Motin wrote:
> >>>> I have some questions about memory allocation. At this moment our UMA
> >>>> never returns freed memory back to the system until it is explicitly
> >>>> asked by pageout daemon via uma_reclaim() call, that happens only when
> >>>> system is quite low on memory. How does that coexist with buffer cache
> >>>> and other consumers? Will, for example, buffer cache allocate buffers
> >>>> and be functional when most of system's memory uselessly consumed by UMA
> >>>> caches? Is there some design how that supposed to work?
> >>> Allocation of the pages which constitute a new buffer creates the pressure
> >>> and causes a pagedaemon wakeup if the amount of free pages is too low.  Look
> >>> at the vm_page_grab() call in allocbuf().  Also note that the buffer cache
> >>> is not shrunk in response to low memory events, and buffer pages are
> >>> excluded from the page daemon scans since the pages are wired.
> >>
> >> Thanks. I indeed can't see a respective vm_lowmem handler. But how does it
> >> adapt then? It should have some sort of back pressure. And since it
> >> can't know about UMA internals, it should probably just see that the
> >> system is getting low on physical memory. Won't it shrink itself first in
> >> that case, before the pagedaemon starts its reclamation?
> > The buffer cache only caches buffers; it is not supposed to provide the
> > file content cache, at least for VMIO. The buffer cache size is capped
> > during system configuration; the algorithm to calculate the cache size is
> > not easy to re-type, but look at vfs_bio.c:kern_vfs_bio_buffer_alloc().
> > On modern machines with 512MB of RAM or more, essentially 10% of the RAM
> > is dedicated to the buffer cache.
> 
> So it is hard-capped and never returns that memory in any case? 10% is
> not much, but it still doesn't sound perfect.
No, this is not how the buffer cache works. VMIO buffers do not own pages;
the memory is charged to the corresponding vnode's vm_object. The reason
why the buffer cache page count is capped is that buffers wire their
constituent pages.  Before unmapped i/o, buffer pages had to be mapped into
KVA, so the buffer cache wiring was naturally limited by the buffer map.

After unmapped i/o was introduced, the buffer map is used only for mapping
metadata buffers, and the cap on the buffer cache limits the amount of wired
pages used by buffers, without the pressure from KVA. Buffers are recycled
by getnewbuf() as usual, while vnode pages are managed by the pagedaemon.

So it is incorrect to state that buffers allocate memory for VMIO.
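
To make the ownership model concrete, here is a minimal sketch, loosely
modelled on the allocbuf() path of this era.  The helper name is made up and
the grab flags are simplified, so treat it as an illustration rather than the
actual vfs_bio.c code:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bio.h>
#include <sys/buf.h>
#include <sys/lock.h>
#include <sys/rwlock.h>
#include <vm/vm.h>
#include <vm/vm_object.h>
#include <vm/vm_page.h>

static void
sketch_fill_vmio_buf(struct buf *bp, vm_object_t obj, vm_pindex_t base)
{
	vm_page_t m;
	int i;

	VM_OBJECT_WLOCK(obj);
	for (i = 0; i < bp->b_npages; i++) {
		/*
		 * vm_page_grab() finds or allocates the page in the vnode
		 * object and may wake the pagedaemon when the free page
		 * count is low; this is the pressure mentioned above.
		 * VM_ALLOC_WIRED wires the page, so it is skipped by the
		 * pagedaemon scans while the buffer references it, but the
		 * page is still charged to obj, not to the buffer cache.
		 */
		m = vm_page_grab(obj, base + i,
		    VM_ALLOC_NORMAL | VM_ALLOC_WIRED | VM_ALLOC_RETRY);
		bp->b_pages[i] = m;
	}
	VM_OBJECT_WUNLOCK(obj);
}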

> 
> >> When the vm_lowmem condition finally fires, it will purge different data
> >> from different subsystems, data which is potentially still usable. UMA
> >> caches, though, hold no valid data, only an allocation optimization.
> >> Shouldn't they be freed first somehow, at least the unused part, as in
> >> my patch? Also I guess having more really free memory should make
> >> M_NOWAIT allocations fail less often.
> > IMO this is not the right direction. My opinion is that M_NOWAIT
> > allocations should be mostly banned from the top level of the kernel, and
> > that interrupt threads and the i/o path should try hard to avoid
> > allocations at all.
> 
> OK, M_NOWAIT was possibly a bad example, though we have a lot of M_NOWAIT
> allocations in many important areas of the kernel. But still, making an
> M_WAITOK allocation wait in a case where the system could already have been
> prepared, hopefully at low cost, is possibly not perfect either.
> 
> > Purging UMA caches at the first sign of a low memory condition would make
> > UMA slower, possibly much slower for many workloads which are routinely
> > handled now. Our code is accustomed to fast allocators; look at how many
> > allocations a typical syscall makes for temporary buffers. Such a change
> > requires profiling of varying workloads to prove that it does not cause
> > regressions.
> 
> Full purging on low memory is what the present implementation actually does.
> I was proposing a much softer alternative, purging only the caches unused
> for the last 20 seconds, which in some situations could avoid full purges
> completely.
vm_pageout_grow_cache() is called on kmem_alloc() failures, which means
that normal pageout of pageable memory failed to produce enough fresh
free pages. In particular, this should only happen when there is a
significant imbalance between user/pageable and kernel/unpageable
allocations. Purging all kernel caches sounds reasonable then, while
purging caches every 20 seconds is useless if the kmem_alloc() requests
are satisfied by the pagedaemon.
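
For reference, the full purge path has roughly the following shape.  This is
a paraphrase of the pagedaemon logic and of how a consumer hooks the event,
not the verbatim vm_pageout.c code; sketch_lowmem_pass() and
mysubsys_lowmem() are invented names:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/eventhandler.h>
#include <vm/uma.h>

static void
sketch_lowmem_pass(void)
{
	/* Ask registered subsystems (VFS, ZFS ARC, ...) to shed caches. */
	EVENTHANDLER_INVOKE(vm_lowmem, 0);
	/* Then return the cached items and pages held by the UMA zones. */
	uma_reclaim();
}

/* Consumer side: a hypothetical handler that drops rebuildable caches. */
static void
mysubsys_lowmem(void *arg __unused, int flags __unused)
{
	/* Free purely-optimizing caches; the data is rebuilt on demand. */
}
EVENTHANDLER_DEFINE(vm_lowmem, mysubsys_lowmem, NULL, EVENTHANDLER_PRI_ANY);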

> 
> > I suspect that what you do is tailored for a single (ab)user of UMA. You
> > might try to split the UMA low memory handler in two: one for the abuser,
> > and one for the rest of the caches.
> 
> IMO the only "abuse" by ZFS is that it takes UMA and tries to use it for
> serious things, whose size is significant relative to the total amount of
> RAM. And obviously it wants to do that fast too. But the general problem of
> UMA is not new: with an increasing number of zones, a fluctuating load
> pattern will make different zones grow at different times, which at some
> point will inevitably create memory pressure, even if each consumer, or
> even all of them together, is size-capped. ZFS just pushes that to the
> limit, actively using up to 90 different zones, but the problem itself is
> not new.
> 
> If you prefer to see UMA consumers divided into some classes -- fine
> (though IMO it is very non-obvious how to decide that in every case),
IMO the split is obvious: ZFS would mark its zones for frequent purging,
while other kernel consumers would not.
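
Something along these lines, where the flag and its value are purely
hypothetical (nothing like it exists in uma.h) and only uma_zcreate() itself
is the real API:

#include <sys/param.h>
#include <vm/uma.h>

#define	UMA_ZONE_EAGERDRAIN	0x80000000	/* invented, for the sketch only */

static uma_zone_t
sketch_create_zfs_zone(void)
{
	/*
	 * ZFS would set the flag on its zones at creation time; a periodic
	 * drain would then skip every zone that did not opt in.
	 */
	return (uma_zcreate("zfs_data_4k", 4096, NULL, NULL, NULL, NULL,
	    UMA_ALIGN_PTR, UMA_ZONE_EAGERDRAIN));
}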

> but what logic would you see there? Should there be another memory
> limit, like low and high watermarks? Aren't there any benefits to freeing
> RAM preventively, while there is still "enough" free? Shouldn't we mix
> "soft" and "hard" purges at some rate between the low and high watermarks,
> to keep all consumers feeling a fair amount of pressure, depending on
> their class?
> 
> >>>> I've made an experimental patch for UMA
> >>>> (http://people.freebsd.org/~mav/drain_unused.patch) to make it return to
> >>>> the system, every 20 seconds, cached memory that has been unused for the
> >>>> last 20 seconds. The algorithm is quite simple and the patch seems to
> >>>> work, but I am not sure whether I am approaching the problem from the
> >>>> right side. Any thoughts?
> 
> -- 
> Alexander Motin