UMA & mbuf cache utilization.

Jeff Roberson jroberson at jroberson.net
Wed Dec 17 13:58:51 PST 2008


On Tue, 16 Dec 2008, Paul Saab wrote:

> So far testing has shown in a pure transmit test, that this doesn't hurt
> performance at all.

It would appear that our stack at present is not concurrent enough for 
this optimization to help as it does elsewhere.  Rather than needlessly 
complicate things at this time, I'm going to go ahead and commit the UMA 
portion so that this work isn't lost, and leave the mbuf changes until 
such time as they are advantageous.  This will allow for further 
experimentation as well.

Thanks,
Jeff

>
> On Tue, Dec 9, 2008 at 6:22 PM, Jeff Roberson <jroberson at jroberson.net> wrote:
>
>> Hello,
>>
>> Nokia has graciously allowed me to release a patch which I developed to
>> improve general mbuf and cluster cache behavior.  This is based on others'
>> observations that, due to simple alignment at 2KB and 256 bytes, we achieve
>> a poor cache distribution for the header area of packets and the most
>> heavily used mbuf header fields.  In addition, modern machines stripe
>> memory accesses across several memory banks and even memory controllers.
>> Accessing heavily aligned locations such as these can also create load
>> imbalances among those banks.
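>>
>> As a rough illustration (the cache parameters here are assumed, not taken
>> from the patch): a 32KB, 8-way L1 cache with 64-byte lines has
>> 32768 / 64 / 8 = 64 sets.  Cluster start addresses are multiples of 2048,
>> so their set index is (addr / 64) mod 64 = (32 * k) mod 64, which only
>> ever takes the values 0 and 32.  The first cache line of every cluster
>> therefore competes for 2 of the 64 sets, and 256-byte-aligned mbuf headers
>> similarly hit only 16 of them.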
>>
>> To solve this problem I have added two new features to UMA.  The first is
>> the zone flag UMA_ZONE_CACHESPREAD.  This flag modifies the meaning of the
>> alignment field so that item start addresses are staggered by at least
>> align + 1 bytes.  For clusters and mbufs this means adding uma_cache_align
>> + 1 bytes to the amount of storage allocated, which creates a small,
>> constant amount of waste: roughly 3% for clusters and 12% for mbufs.  It
>> also means we must use contiguous physical and virtual memory spanning
>> several pages in order to use that memory efficiently and land on as many
>> cache lines as possible.
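>>
>> Purely for illustration (the zone and variable names are placeholders, not
>> necessarily what the patch uses), a cache-spread backend zone for clusters
>> might be created roughly like this with the stock uma_zcreate() interface:
>>
>>   /* Stagger cluster start addresses by one cache line (align + 1). */
>>   zone_clust_contig = uma_zcreate("mbuf_cluster_contig", MCLBYTES,
>>       NULL, NULL, NULL, NULL, UMA_ALIGN_CACHE, UMA_ZONE_CACHESPREAD);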
>>
>> Because contiguous physical memory is not always available, the allocator
>> needs a fallback mechanism.  We don't simply want every mbuf allocation to
>> check two zones, because once we deplete the available contiguous memory
>> the check on the first zone will always fail, via the most expensive code
>> path.
>>
>> To resolve this issue, I added the ability for secondary zones to stack on
>> top of multiple primary zones.  Secondary zones are zones which get their
>> storage from another zone but handle their own caching, ctors, dtors, etc.
>> With this feature a secondary zone can be created that allocates either
>> from the contiguous memory pool or from the non-contiguous single-page
>> pool, depending on availability.  Failing over between them deep in the
>> allocator is also much faster, because it is only required when we exhaust
>> the already available mbuf memory.
>>
>> For mbufs and clusters there are now three zones each: a
>> contigmalloc-backed zone, a single-page allocator zone, and a secondary
>> zone bearing the original zone_mbuf or zone_clust name.  The packet zone
>> also draws from both available mbuf zones.  The individual backend zones
>> are not exposed outside of kern_mbuf.c.
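>>
>> In rough outline (the backend names and the call for attaching a second
>> backend are placeholders of mine, not necessarily what the patch uses),
>> the cluster side might be wired up like this, with zone_clust_contig
>> created as in the earlier sketch:
>>
>>   /* Ordinary single-page backend alongside the cache-spread contig one. */
>>   zone_clust_page = uma_zcreate("mbuf_cluster_page", MCLBYTES,
>>       NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
>>
>>   /* The secondary zone keeps the original name, does its own caching and
>>    * ctors/dtors, and pulls storage from the contig backend first... */
>>   zone_clust = uma_zsecond_create("mbuf_cluster", mb_ctor_clust,
>>       mb_dtor_clust, NULL, NULL, zone_clust_contig);
>>
>>   /* ...and, with the new multi-backend support, can fall back to the
>>    * single-page backend when contiguous memory is exhausted.  The function
>>    * name here is assumed; the patch may spell this differently. */
>>   uma_zsecond_add(zone_clust, zone_clust_page);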
>>
>> Currently, each backend zone can have its own limit, and the secondary
>> zone only blocks when both are full.  For statistics, the limit should be
>> reported as the sum of the backend limits; however, that isn't presently
>> done.  The secondary zone cannot have its own limit independent of the
>> backends at this time.  I'm not sure whether that would be valuable.
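>>
>> For example (again using the placeholder names from the sketches above),
>> the per-backend caps could be set with the existing uma_zone_set_max()
>> interface, and the user-visible limit would ideally be their sum:
>>
>>   /* Hypothetical split of the global cluster limit between backends. */
>>   uma_zone_set_max(zone_clust_contig, nmbclusters / 2);
>>   uma_zone_set_max(zone_clust_page, nmbclusters - nmbclusters / 2);
>>   /* Statistics should then report nmbclusters, the sum of the two
>>    * backend limits, as the overall cluster limit. */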
>>
>> I have test results from Nokia which show a dramatic improvement in
>> several workloads, but which I am probably not at liberty to discuss.  I'm
>> in the process of convincing Kip to help me get some benchmark data on our
>> stack.
>>
>> Also as part of the patch I renamed a few functions, since many of the
>> names were non-obvious, and grew new keg abstractions to tidy things up a
>> bit.  I suspect those of you with UMA experience (robert, bosko) will find
>> the renaming a welcome improvement.
>>
>> The patch is available at:
>> http://people.freebsd.org/~jeff/mbuf_contig.diff
>>
>> I would love to hear any feedback you may have.  I have been developing
>> this and testing various versions off and on for months; however, this is
>> a fresh port to current and it is a little green, so it should be
>> considered experimental.
>>
>> In particular, I'm most nervous about how the VM will respond to new
>> pressure on contiguous physical pages.  I'm also interested in hearing
>> from embedded/limited-memory people about how we might want to limit or
>> tune this.
>>
>> Thanks,
>> Jeff
>> _______________________________________________
>> freebsd-arch at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"
>>
>>
>

