mbuf changes

Andre Oppermann andre at freebsd.org
Mon Sep 27 13:14:33 UTC 2010


On 27.09.2010 15:18, Luigi Rizzo wrote:
> On Mon, Sep 27, 2010 at 02:55:45PM +0200, Andre Oppermann wrote:
> ...
>>> my idea was to have an extra field in the mbuf to tell how much room
>>> should be reserved/used for metadata (such as mtags) after
>>> the payload area so you don't need to change the allocator, and
>>> possibly can even modify this on an existing mbuf.
>>> Almost always mbufs have spare room (e.g. incoming pkts have all
>>> data in the cluster and mostly empty mdata; outgoing, except
>>> for rare cases, tend to be in a similar situation).
>>> So this approach would allow to take an already allocated
>>> mbuf and put the mtag in the spare area after the data.
>>
>> For incoming data this approach could work, as usually 2K mbuf clusters
>> are used and they have trailing space available, or rather the normal
>> mbuf referencing the cluster has its own data section unused.
>>
>> If trailing space is to be used, M_TRAILINGSPACE() needs modifications
>> and a full tree audit is required to make sure that all mbuf consumers are
>> using it correctly and not some version of their own that directly assumes
>> certain mbuf sizes, etc.  A lot of work.
>>
>> For locally generated mbufs and socket buffers we try to use the mbufs to
>> their maximal extent.  When the socket buffer data is packetized it is
>> normally referenced, so we again get the normal mbuf with its data portion
>> unused.  So that could work.
>>
>> A complication is the m_tag_free() field and function, which puts the memory
>> deallocation into the hands of the mtag user.  That means all mtag consumers
>> have to be made aware of the provided storage, so that they do not return
>> the memory directly to the memory allocator (malloc/UMA).
>>
>> So the only way I realistically see is to make use of the mbuf's unused
>> data portion when it has external storage attached.  This should probably
>> cover about 98% of all cases.  The rest has to malloc() the mtag storage
>> as usual.
>
> so it wouldn't be bad -- i cannot judge the numbers, but definitely
> it would work for all incoming traffic, plus all tcp data packets
> (as the payload is in the cluster), plus all pure acks (which are small),
> plus all UDP above some 200 bytes...

Yes, about that.
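
To make the idea concrete, here is a rough userland sketch of the
allocation decision, not the prototype itself.  Everything named toy_*
or TOY_* is made up for illustration and is not the real struct mbuf or
m_tag API; the 224 byte internal data area is likewise just an assumed
size.  When the mbuf has external storage the tag is carved out of the
otherwise idle internal data area, else we fall back to malloc().  The
free routine has to know the difference, which is the m_tag_free()
complication mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MLEN   224  /* assumed size of the mbuf-internal data area */
#define TOY_M_EXT  0x1  /* payload lives in an external cluster */

/* Hypothetical stand-in for struct mbuf; not the kernel layout. */
struct toy_mbuf {
	int    flags;
	size_t tag_used;	/* bytes of m_dat already handed to tags */
	_Alignas(max_align_t) char m_dat[TOY_MLEN];
};

/* Hypothetical stand-in for struct m_tag; payload follows the header. */
struct toy_tag {
	int    inline_storage;	/* 1: carved from m_dat, 0: from malloc() */
	size_t len;		/* payload length requested by the caller */
};

/* Round n up so that consecutive inline tags stay suitably aligned. */
static size_t
toy_round(size_t n)
{
	size_t a = _Alignof(max_align_t);

	return ((n + a - 1) & ~(a - 1));
}

static struct toy_tag *
toy_tag_alloc(struct toy_mbuf *m, size_t payload_len)
{
	size_t need = toy_round(sizeof(struct toy_tag) + payload_len);
	struct toy_tag *t;

	assert(m != NULL);
	if ((m->flags & TOY_M_EXT) != 0 && need <= TOY_MLEN - m->tag_used) {
		/* Data is in the cluster, so the internal area is unused. */
		t = (struct toy_tag *)(void *)(m->m_dat + m->tag_used);
		m->tag_used += need;
		t->inline_storage = 1;
	} else {
		/* The remaining cases: allocate the tag as usual. */
		t = malloc(sizeof(*t) + payload_len);
		if (t == NULL)
			return (NULL);
		t->inline_storage = 0;
	}
	t->len = payload_len;
	return (t);
}

static void
toy_tag_free(struct toy_tag *t)
{
	/* Inline tags go away with the mbuf; never hand them to free(). */
	if (t != NULL && !t->inline_storage)
		free(t);
}
```

A consumer that installs its own free routine would need the same
inline_storage check, or some equivalent flag, before touching the
allocator.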

>> I could whip up a prototype for review in the next weeks.
>
> I seem to remember that jeffr had already something done in Perforce.

That's a more general overhaul of the way mbufs are structured and
allocated with UMA.  I'm not sure it provides for the mtag issue.  Will
check though.

-- 
Andre

