UMA & mbuf cache utilization.
Jeff Roberson
jroberson at jroberson.net
Tue Dec 9 18:24:34 PST 2008
Hello,
Nokia has graciously allowed me to release a patch which I developed to
improve general mbuf and cluster cache behavior. It is based on others'
observations that, due to simple alignment at 2k and 256 bytes, we
achieve a poor cache distribution for the header area of packets and
for the most heavily used mbuf header fields. In addition, modern
machines stripe memory accesses across several memory banks and even
memory controllers. Accessing heavily aligned locations such as these
can also create load imbalances among those memories.
To solve this problem I have added two new features to UMA. The first
is the zone flag UMA_ZONE_CACHESPREAD. This flag modifies the meaning
of the alignment field such that start addresses are staggered by at
least align + 1 bytes. For clusters and mbufs this means adding
uma_cache_align + 1 bytes to the amount of storage allocated, which
creates a constant amount of waste: roughly 3% and 12% respectively.
It also means we must use contiguous physical and virtual memory
spanning several pages in order to use the memory efficiently and land
on as many cache lines as possible.
Because contiguous physical memory is not always available, the
allocator needs a fallback mechanism. We don't simply want every mbuf
allocation to check two zones, because once the available contiguous
memory is depleted the check on the first zone would always fail via
the most expensive code path.
To resolve this issue, I added the ability for secondary zones to stack
on top of multiple primary zones. Secondary zones are zones which get
their storage from another zone but handle their own caching, ctors,
dtors, etc. With this feature, a secondary zone can be created that
allocates either from the contiguous memory pool or from the
non-contiguous single-page pool, depending on availability. Failing
over between them deep in the allocator is also much faster, because it
is only required when we exhaust the already available mbuf memory.
For mbufs and clusters there are now three zones each: a
contigmalloc-backed zone, a single-page allocator zone, and a secondary
zone with the original zone_mbuf or zone_clust name. The packet zone
also draws from both available mbuf zones. The individual backend
zones are not exposed outside of kern_mbuf.c.
Currently, each backend zone can have its own limit, and the secondary
zone only blocks when both are full. Statistics-wise the limit should
be reported as the sum of the backend limits; however, that isn't
presently done. The secondary zone cannot have its own limit
independent of the backends at this time. I'm not sure whether that
would be valuable.
I have test results from Nokia which show a dramatic improvement in
several workloads, but which I am probably not at liberty to discuss.
I'm in the process of convincing Kip to help me get some benchmark
data on our stack.
Also as part of the patch I renamed a few functions, since many of the
names were non-obvious, and grew new keg abstractions to tidy things up
a bit. I suspect those of you with UMA experience (robert, bosko) will
find the renaming a welcome improvement.
The patch is available at:
http://people.freebsd.org/~jeff/mbuf_contig.diff
I would love to hear any feedback you may have. I have been developing
this and testing various versions off and on for months; however, this
is a fresh port to current and it is a little green, so it should be
considered experimental.
In particular, I'm most nervous about how the VM will respond to new
pressure on contiguous physical pages. I'm also interested in hearing
from embedded/limited-memory people about how we might want to limit
or tune this.
Thanks,
Jeff
More information about the freebsd-arch mailing list