SMP problem with uma_zalloc

Lara & Harti Brandt brandt at
Sat Jul 19 11:15:00 PDT 2003

Bosko Milekic wrote:

> On Fri, Jul 18, 2003 at 07:05:58PM +0200, Harti Brandt wrote:
>>Hi all,
>>it seems there is a problem with the zone allocator in SMP systems.
>>I have a zone that has an upper limit on items that resolves to an
>>upper limit of pages of 1. It turns out that allocations from this
>>zone get stuck from time to time. It seems to me that the following
>>happens:
>>- on the first call to uma_zalloc a page is allocated and all the free
>>items are put into the cache of the CPU. uz_free of the zone is 0 and
>>uz_cachefree holds all the free items.
>>- when the next call to uma_zalloc occurs on the same CPU, everything is
>>fine. uma_zalloc just gets the next item from the cache.
>>- when the call happens on another CPU, the code finds uz_free to be 0 and
>>checks the page limit (uma_core.c:1492). It finds the limit already
>>reached and puts the process to sleep (uma_zalloc was called with
>>- the process may sleep there forever (depending on circumstances).
>>If M_WAITOK is not set, the code will falsely return NULL while there
>>are still free items (albeit in the cache of another CPU).
>>I wonder whether this is intended behaviour. If yes, this should be
>>definitely documented. uma_zone_set_max() seems to be documented only in
>>the header file and it does not mention, that free items may not actually
>>be allocatable because they happen to sit in another CPU's cache.
>>If it is not intended (I would prefer this), I wonder how one can get the
>>items out of another CPU's cache. I'm not too familiar with this code.
>>I suppose this should be done somewhere around uma_core.c:1485?
>>harti brandt,
>>brandt at, harti at
>   If the per-cpu caches are relatively small (which they ought to be,
>   especially when you've hit a maximum number of allocations from the
>   zone), then this is actually not that bad of a behavior.
>   I spoke to Jeff about this and it seemed to me that he was leaning
>   toward keeping the behavior this way and, in fact, also perhaps _not_
>   even doing an internal free to the zone when UMA_ZFLAG_FULL is in
>   effect but we still have space in the pcpu cache.  While I'm not sure
>   if going that far is a good idea, I _don't_ really think that the
>   current behavior is a bad idea.  As mentioned, when you have a zone
>   that is mostly starved, all future frees will go back to the zone and
>   not the per-cpu caches, but if you have some free items in another
>   per-cpu cache, you're not likely to hit a starvation situation unless
>   something is horribly wrong.  And having the free code actually drain
>   the per-cpu caches in a zone-full situation may lead to bad behavior
>   under heavy load.  Think about what happens under heavy load... your
>   zone is starved and if you then flush all the pcpu caches and the load
>   is still heavy, you're likely to have other threads try to allocate
>   anyway, so they'll end up having to dip into the zone anyway;
>   therefore, there doesn't seem to be much of a reason to push the
>   cached objects back into the zone (if they're going to leave it again
>   soon anyway).

Well, the problem is that nothing is starved. I have an idle machine and
a zone that I have limited to 60 or so items. When allocating the 2nd
item I get blocked on the zone limit. Usually I get unblocked whenever I
free an item. This will however not happen, because I have neither
reached the limit nor is there memory pressure in the system to which I
could react. I simply may be blocked forever.
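
To make the sequence concrete, here is a toy user-space model of it. This
is not the UMA code: struct toy_zone, toy_zalloc(), NCPU and
ITEMS_PER_PAGE are names and values made up for illustration, and the
refill policy (the whole page goes into the allocating CPU's cache) is
simply the behaviour as I described it, not something read out of
uma_core.c.

```c
#include <assert.h>

#define NCPU           2    /* assumed number of CPUs */
#define ITEMS_PER_PAGE 60   /* assumed number of items that fit in a page */

/* Hypothetical stand-in for a zone; only the counters that matter here. */
struct toy_zone {
	int pages;            /* pages currently backing the zone */
	int max_pages;        /* page limit the item limit resolves to */
	int uz_free;          /* free items held by the zone itself */
	int cachefree[NCPU];  /* free items sitting in each CPU's cache */
};

/*
 * Returns 1 if an item was handed out. Returns 0 where the real
 * allocator would sleep (M_WAITOK) or return NULL (otherwise).
 */
static int
toy_zalloc(struct toy_zone *z, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);

	/* Fast path: take an item from this CPU's own cache. */
	if (z->cachefree[cpu] > 0) {
		z->cachefree[cpu]--;
		return (1);
	}

	/* Slow path: the zone has nothing free, so try to add a page. */
	if (z->uz_free == 0) {
		if (z->pages >= z->max_pages)
			return (0);   /* limit hit; other caches are not consulted */
		z->pages++;
		z->uz_free = ITEMS_PER_PAGE;
	}

	/* Move all free items into this CPU's cache, then hand one out. */
	z->cachefree[cpu] += z->uz_free;
	z->uz_free = 0;
	z->cachefree[cpu]--;
	return (1);
}
```

With max_pages set to 1, the first toy_zalloc() on CPU 0 succeeds and
leaves 59 items in CPU 0's cache. A following toy_zalloc() on CPU 1
returns 0, although only one item out of 60 is in use.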

That makes the limit feature for zones rather useless, because I cannot 
predict how many of the items I can really allocate (this depends on the 
number of processors, the page size and the configuration of UMA itself).

Perhaps we could make the behaviour dependent on the maximum number of
items. When it is rather low (a couple of pages' worth), and I would block
on the zone limit while there are free items in another CPU's cache, then
drain one of the caches.
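
A rough sketch of what I mean, again as a toy user-space model and not
as a patch against uma_core.c. Every name here (struct lim_zone,
lim_zalloc(), lim_drain_one(), SMALL_LIMIT_PAGES) is hypothetical, the
threshold of 2 pages is an arbitrary assumption, and the locking that a
real drain of another CPU's cache would need is left out entirely.

```c
#include <assert.h>

#define NCPU              2    /* assumed number of CPUs */
#define ITEMS_PER_PAGE    60   /* assumed number of items per page */
#define SMALL_LIMIT_PAGES 2    /* assumed cut-off for "a couple of pages" */

/* Hypothetical stand-in for a limited zone. */
struct lim_zone {
	int pages;            /* pages currently backing the zone */
	int max_pages;        /* page limit the item limit resolves to */
	int zone_free;        /* free items held by the zone itself */
	int cachefree[NCPU];  /* free items sitting in each CPU's cache */
};

/* Move all cached items of one other CPU back to the zone. */
static int
lim_drain_one(struct lim_zone *z, int self)
{
	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (cpu != self && z->cachefree[cpu] > 0) {
			int n = z->cachefree[cpu];

			z->zone_free += n;
			z->cachefree[cpu] = 0;
			return (n);   /* number of items recovered */
		}
	}
	return (0);
}

/* Returns 1 on success, 0 where the caller would block or get NULL. */
static int
lim_zalloc(struct lim_zone *z, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);

	if (z->cachefree[cpu] > 0) {
		z->cachefree[cpu]--;
		return (1);
	}
	if (z->zone_free == 0) {
		if (z->pages < z->max_pages) {
			z->pages++;
			z->zone_free = ITEMS_PER_PAGE;
		} else if (z->max_pages > SMALL_LIMIT_PAGES ||
		    lim_drain_one(z, cpu) == 0) {
			/* Large limit, or really nothing free anywhere. */
			return (0);
		}
	}
	z->cachefree[cpu] += z->zone_free;
	z->zone_free = 0;
	z->cachefree[cpu]--;
	return (1);
}
```

In this model a zone limited to one page hands out all 60 items no matter
which CPU asks, and fails only when nothing is free anywhere. The price
is that the cached items bounce between CPUs when allocations alternate,
which is the kind of behaviour under load that Bosko is worried about, so
a real version would probably want to move only part of a cache.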

Or I could simply remove the limits.

