am I abusing the UMA allocator?

Chris Torek torek at torek.net
Mon Jul 15 21:05:44 UTC 2013


I have been experimenting with using the UMA (slab) allocator for
special-purpose physical address ranges.  (The underlying issue is
that we need zone-like and/or mbuf-like data structures to talk to
hardware that has "special needs" in terms of which physical pages
it can in turn use.  Each device has a limited memory window it
can access.)

For my purposes it's nice that the allocation function receives a
"zone" argument, even though the comment on the call says "zone is
passed for legacy reasons".  However, the free function does not
get the zone argument, or anything else I can use beyond a single
spare flag bit -- up to 4 if you cheat harder.  This is ... less
convenient (although in my case I can use the VA being freed
instead).

What I'm wondering is what this single bit is really for; whether
the allocation and free might be made more flexible for special-
purpose back-end allocators; and whether this is really using
things as intended.

Details:

In the allocator, there's a per-"keg" uk_allocf and uk_freef
("alloc"ation and "free" "f"unction) pointer, and you can set your
own allocation and free functions for any zone with:

    void uma_zone_set_allocf(uma_zone_t zone, uma_alloc allocf);
    void uma_zone_set_freef(uma_zone_t zone, uma_free freef);
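
As a rough illustration of the zone-versus-keg indirection, here is a
userspace model, not kernel code.  Every model_* name is hypothetical;
only the field names uk_allocf and uk_freef and the argument lists
come from this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; these are NOT the kernel's definitions. */
struct model_zone;
typedef void *(*model_alloc)(struct model_zone *zone, int size,
    uint8_t *pflag, int wait);
typedef void (*model_free)(void *mem, int size, uint8_t pflag);

struct model_keg {
    model_alloc uk_allocf;          /* back-end page allocator */
    model_free  uk_freef;           /* back-end page release */
};

struct model_zone {
    struct model_keg *first_keg;    /* hypothetical stand-in for the "first keg" */
};

/* The setters take a zone but write through to that zone's first keg. */
static void
model_zone_set_allocf(struct model_zone *zone, model_alloc allocf)
{
    zone->first_keg->uk_allocf = allocf;
}

static void
model_zone_set_freef(struct model_zone *zone, model_free freef)
{
    zone->first_keg->uk_freef = freef;
}

/* Trivial back-end pair, present only so the setters have something to store. */
static void *
model_noop_alloc(struct model_zone *zone, int size, uint8_t *pflag, int wait)
{
    (void)zone; (void)size; (void)wait;
    *pflag = 0;
    return NULL;
}

static void
model_noop_free(void *mem, int size, uint8_t pflag)
{
    (void)mem; (void)size; (void)pflag;
}
```

In this model, two zones that point at the same keg end up with the
same back-end functions, whichever zone the setter was called on.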

(Aside: it seems a bit weird that you set these per *zone*
but they're stored in the *kegs*, specifically the special
"first keg", but never mind... :-) )

Each allocf is called as:

    /* arguments: uma_zone_t zone, int size, uint8_t *pflag, int wait */
    mem = allocf(zone, nbytes, &flags, wait);

where "wait" is made up of malloc flags (M_WAITOK, M_NOWAIT,
M_ZERO, M_USE_RESERVE).  The byte that the "&flags" argument points
to is not initialized at this point, so the allocation function must
fill it in.  The filled-in value is stored in the per-slab us_flags
and eventually passed back to the freef function:

    /* arguments: void *mem, int size, uint8_t flag */
    freef(mem, nbytes, pflag); /* where pflag = us->us_flags */
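
To make the calling convention concrete, here is a minimal userspace
sketch of an allocf/freef pair for two device "windows".  It is a
model under stated assumptions, not kernel code: the model_* names,
the sizes, and the idea of carrying a window index in the zone are
all made up for illustration.  The free side gets no zone, so it
recovers the window from the address being freed.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_NPAGES    4       /* pages per device window (arbitrary) */
#define MODEL_NWINDOWS  2
#define MODEL_SLAB_PRIV 0x08    /* assumed to mirror UMA_SLAB_PRIV */

/* Hypothetical zone: all it carries is which window it draws from. */
struct model_zone { int window; };

static unsigned char model_mem[MODEL_NWINDOWS][MODEL_NPAGES][MODEL_PAGE_SIZE];
static unsigned char model_inuse[MODEL_NWINDOWS][MODEL_NPAGES];

/* Same argument list as allocf: zone, size, flag byte to fill in, wait. */
static void *
model_window_alloc(struct model_zone *zone, int size, uint8_t *pflag, int wait)
{
    (void)wait;                         /* this model never sleeps */
    if (size != MODEL_PAGE_SIZE)
        return NULL;                    /* single-page slabs only */
    for (int i = 0; i < MODEL_NPAGES; i++) {
        if (!model_inuse[zone->window][i]) {
            model_inuse[zone->window][i] = 1;
            *pflag = MODEL_SLAB_PRIV;   /* caller's byte starts uninitialized */
            return model_mem[zone->window][i];
        }
    }
    return NULL;                        /* window exhausted */
}

/* Same argument list as freef: no zone, so derive the window from the VA. */
static void
model_window_free(void *mem, int size, uint8_t pflag)
{
    (void)size; (void)pflag;
    size_t page = ((uintptr_t)mem - (uintptr_t)model_mem) / MODEL_PAGE_SIZE;
    model_inuse[page / MODEL_NPAGES][page % MODEL_NPAGES] = 0;
}

/* Statistics helper: pages currently handed out from one window. */
static int
model_pages_in_use(int window)
{
    int n = 0;
    for (int i = 0; i < MODEL_NPAGES; i++)
        n += model_inuse[window][i];
    return n;
}
```

The address lookup only works here because both windows live in one
contiguous array; a real back end would need its own range table.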

The flags are defined in sys/vm/uma.h and are the UMA_SLAB_* flags
(BOOT, KMEM, KERNEL, PRIV, OFFP, MALLOC).  UMA_SLAB_PRIV is
described as "private".  The bit is never tested, though, so it
seems that a "private" allocator can set UMA_SLAB_PRIV, or not set
it, freely.  Along with UMA_SLAB_OFFP (which is never tested or set)
and the unused bits 0x40 and 0x80, it has no other defined meaning
in uma_core.c or elsewhere.  (There's also an unused us_pad right
after us_flags.)  It looks like OFFP is a leftover, with "on" vs
"off" page slab management controlled through UMA_ZONE_HASH and
also the PG_SLAB bit in the underlying "struct vm_page".
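
For what "cheating harder" could look like, here is a small userspace
sketch that scatters a 4-bit tag across the four bits with no other
meaning (PRIV, OFFP, 0x40, 0x80).  The bit values below are assumed
and should be checked against sys/vm/uma.h in your tree; only 0x40
and 0x80 are stated in this message.  The model_* names are
hypothetical, and nothing here suggests the UMA code intends these
bits to be used this way.

```c
#include <stdint.h>

/* Assumed bit assignments; verify against sys/vm/uma.h before relying on them. */
#define MODEL_SLAB_BOOT   0x01
#define MODEL_SLAB_KMEM   0x02
#define MODEL_SLAB_KERNEL 0x04
#define MODEL_SLAB_PRIV   0x08
#define MODEL_SLAB_OFFP   0x10
#define MODEL_SLAB_MALLOC 0x20
#define MODEL_SLAB_SPARE  0xc0      /* bits 0x40 and 0x80 */

/* Bits that carry meaning for UMA itself and must stay clear in a tag. */
#define MODEL_SLAB_RESERVED \
    (MODEL_SLAB_BOOT | MODEL_SLAB_KMEM | MODEL_SLAB_KERNEL | MODEL_SLAB_MALLOC)

/* Tag bits 0-1 go to flag bits 3-4; tag bits 2-3 go to flag bits 6-7. */
static uint8_t
model_tag_pack(unsigned tag)
{
    return (uint8_t)(((tag & 0x3) << 3) | ((tag & 0xc) << 4));
}

/* Inverse of model_tag_pack(); ignores the reserved bits. */
static unsigned
model_tag_unpack(uint8_t flags)
{
    return ((flags >> 3) & 0x3) | ((flags >> 4) & 0xc);
}
```

Sixteen values is enough to index a small table of back-end pools,
which is about as much as the flag byte can carry.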

There's also a per-keg flag spelled UMA_ZFLAG_PRIVALLOC, along
with UMA_ZONE_NOFREE.  But UMA_ZFLAG_PRIVALLOC is never tested;
and UMA_ZONE_NOFREE is really per-keg, and you can't set it from
outside the UMA code.

When the system gets low on memory, it calls uma_reclaim(), which
does (simplified):

    zone_foreach(zone_drain)
    | zone_drain(zone)
      | zone_drain_wait(zone)
        | bucket_cache_drain()
        | zone_foreach_keg()
          | keg_drain()
            | test: (UMA_ZONE_NOFREE || keg->uk_freef==NULL)
            | if either is the case, return now, can't free

The issue here is that draining these special-purpose zones, backed
by special physical pages, is not actually going to help the system
any (though freeing internal bucket data structures could help
slightly): the physical pages cannot be handed out to processes,
and they "run out" against themselves, not the VM system.  Of
course I can have uk_freef == NULL, but it is nice to keep some
statistics, and maybe be able to trade pages between various
special-purpose physical spaces by doing my own zone_drain()s on
them.
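
The early-return test in keg_drain() can be modelled in userspace
like this.  It is a sketch, not the kernel's code: the struct, the
slab counter, and the value of MODEL_ZONE_NOFREE are placeholders,
and the real function walks slab lists rather than a count.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ZONE_NOFREE 0x0020    /* placeholder value, not from the header */

typedef void (*model_free)(void *mem, int size, uint8_t pflag);

struct model_keg {
    uint32_t   uk_flags;
    model_free uk_freef;
    int        uk_free_slabs;       /* hypothetical count of fully free slabs */
};

static int model_free_calls;        /* statistics kept by the back end */

/* Back-end free function that only counts how often it was called. */
static void
model_counting_free(void *mem, int size, uint8_t pflag)
{
    (void)mem; (void)size; (void)pflag;
    model_free_calls++;
}

/* Returns the number of slabs released; 0 when the keg opts out. */
static int
model_keg_drain(struct model_keg *keg)
{
    if ((keg->uk_flags & MODEL_ZONE_NOFREE) || keg->uk_freef == NULL)
        return 0;                   /* can't free: leave the keg alone */

    int released = keg->uk_free_slabs;
    for (int i = 0; i < released; i++)
        keg->uk_freef(NULL, 0, 0);  /* slab address bookkeeping omitted */
    keg->uk_free_slabs = 0;
    return released;
}
```

With uk_freef left NULL the keg is skipped entirely, which also means
the back end never sees a free call it could count.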

All in all, I'm now thinking that I'm abusing the slab allocator
too much here.  But I wonder if perhaps some minor changes to
uma_core might make this more usable, or if this is really within
the intent of the UMA code at all.

Chris


More information about the freebsd-hackers mailing list