mbuf changes

Andre Oppermann andre at freebsd.org
Mon Sep 27 17:22:28 UTC 2010


On 27.09.2010 18:09, Julian Elischer wrote:
> On 9/27/10 6:14 AM, Andre Oppermann wrote:
>> On 27.09.2010 15:18, Luigi Rizzo wrote:
>>> On Mon, Sep 27, 2010 at 02:55:45PM +0200, Andre Oppermann wrote:
>>> ...
>>>>> my idea was to have an extra field in the mbuf to tell how much room
>>>>> should be reserved/used for metadata (such as mtags) after
>>>>> the payload area so you don't need to change the allocator, and
>>>>> possibly can even modify this on an existing mbuf.
>>>>> Almost always mbufs have spare room (e.g. incoming pkts have all
>>>>> data in the cluster and mostly empty mdata; outgoing, except
>>>>> for rare cases, tend to be in a similar situation).
>>>>> So this approach would allow taking an already allocated
>>>>> mbuf and putting the mtag in the spare area after the data.
>>>>
>>>> For incoming data this approach could work as usually 2K mbuf clusters
>>>> are used and they have trailing space available, or rather the normal
>>>> mbuf referencing the cluster has its own data section unused.
>>>>
>>>> If trailing space is to be used, M_TRAILINGSPACE() needs modifications
>>>> and a full tree audit is required to make sure that all mbuf consumers are
>>>> correctly using it and not some version of their own that directly assumes
>>>> certain mbuf sizes, etc. A lot of work.
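
The interaction between a reserved metadata tail and the trailing-space
calculation can be sketched in plain userland C. This is a toy model, not
FreeBSD kernel code: `toy_buf`, its fields, `toy_trailingspace()` and
`toy_reserve_meta()` are hypothetical names invented for illustration. It
assumes the reservation is tracked as a byte count at the end of the storage
area, which is one possible reading of the "extra field" idea quoted above.

```c
#include <stddef.h>

/* Toy stand-in for a buffer with payload; NOT the FreeBSD struct mbuf. */
struct toy_buf {
	char	*buf;		/* start of the storage area */
	size_t	 buf_size;	/* total size of the storage area */
	size_t	 data_off;	/* offset of the first payload byte */
	size_t	 data_len;	/* payload length in bytes */
	size_t	 meta_rsv;	/* bytes reserved at the tail for metadata */
};

/*
 * Trailing space as payload writers should see it: the gap between the
 * end of the payload and the start of the reserved metadata tail.
 */
static size_t
toy_trailingspace(const struct toy_buf *b)
{
	size_t used = b->data_off + b->data_len;
	size_t limit = b->buf_size - b->meta_rsv;

	return (used < limit ? limit - used : 0);
}

/*
 * Reserve n bytes at the tail of an existing buffer. Fails with -1 if
 * the payload already occupies part of the requested area.
 */
static int
toy_reserve_meta(struct toy_buf *b, size_t n)
{
	size_t used = b->data_off + b->data_len;

	if (used > b->buf_size || n > b->buf_size - used)
		return (-1);
	b->meta_rsv = n;
	return (0);
}
```

Any consumer that computes the free tail itself from a fixed buffer size,
instead of going through a helper like the one above, would overwrite the
reserved area. That is the reason for the tree-wide audit mentioned above.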
>>>>
>>>> For locally generated mbufs and socket buffers we try to use the mbufs to
>>>> their maximal extent. When the socket buffer data is packetized it is
>>>> normally referenced, so we again get a normal mbuf with its data portion
>>>> unused. So that could work.
>>>>
>>>> A complication is the m_tag_free() field and function which puts the memory
>>>> deallocation into the hands of the mtag user. That means all mtag consumers
>>>> have to be made aware of the provided storage so that they do not return
>>>> that memory directly to the memory allocator (malloc/UMA).
>>>>
>>>> So the only way I realistically see is to make use of the mbuf's unused
>>>> data portion when it has external storage attached. This should probably
>>>> cover about 98% of all cases. The rest has to malloc() the mtag storage
>>>> as usual.
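
The fallback scheme described above can be sketched in plain userland C.
This is a toy model under stated assumptions, not FreeBSD kernel code:
`toy_mbuf`, `toy_tag`, `TOY_MLEN` and every function name are hypothetical.
The per-tag `free_fn` pointer only mirrors the role of the m_tag_free field.
It lets a consumer release a tag without knowing whether the storage was
carved out of the mbuf or came from malloc().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MLEN	200	/* hypothetical size of the internal data area */
#define TOY_ROUNDUP(n)	(((n) + sizeof(void *) - 1) & ~(sizeof(void *) - 1))

struct toy_tag {
	void	(*free_fn)(struct toy_tag *);	/* how to release this tag */
	size_t	  len;				/* tag payload length */
};

struct toy_mbuf {
	int	has_ext;	/* non-zero: payload lives in an external cluster */
	size_t	int_used;	/* bytes of the internal area already handed out */
	union {
		char		bytes[TOY_MLEN];
		long double	align;	/* forces suitable alignment */
	} area;
};

/* Embedded tags live inside the mbuf: nothing goes back to the allocator. */
static void
toy_tag_free_embedded(struct toy_tag *t)
{
	(void)t;
}

/* Fallback tags were obtained from malloc() and are returned to it. */
static void
toy_tag_free_malloc(struct toy_tag *t)
{
	free(t);
}

static struct toy_tag *
toy_tag_alloc(struct toy_mbuf *m, size_t len)
{
	struct toy_tag *t;
	size_t need;

	if (len > SIZE_MAX - sizeof(*t) - sizeof(void *))
		return (NULL);		/* size computation would overflow */
	need = TOY_ROUNDUP(sizeof(*t) + len);

	if (m->has_ext && need <= TOY_MLEN - m->int_used) {
		/* Common case: the internal data area is otherwise unused. */
		t = (struct toy_tag *)(void *)(m->area.bytes + m->int_used);
		m->int_used += need;
		t->free_fn = toy_tag_free_embedded;
	} else {
		/* Rare case: no external storage, or no room left. */
		t = malloc(sizeof(*t) + len);
		if (t == NULL)
			return (NULL);
		t->free_fn = toy_tag_free_malloc;
	}
	t->len = len;
	return (t);
}
```

A consumer releases a tag with `t->free_fn(t)` in both cases. The sketch
does not reclaim embedded space when a single tag is freed; it assumes the
whole area is recycled together with the mbuf.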
>>>
>>> so it wouldn't be bad -- i cannot judge the numbers, but definitely
>>> it would work for all incoming traffic, plus all tcp data packets
>>> (as the payload is in the cluster), plus all pure acks (which are small),
>>> plus all UDP above some 200 bytes...
>>
>> Yes, about that.
>>
>>>> I could whip up a prototype for review in the next few weeks.
>>>
>>> I seem to remember that jeffr had already something done in Perforce.
>>
>> That's a more general overhaul of the way mbufs are structured and
>> allocated with UMA. I'm not sure it provides for the mtag issue. Will
>> check though.
>
> I'd like to see if we can go over his stuff and any other suggested changes before 9.0
> and see if we can agree on a change for 9.0
>
> Jeff, we discussed this a year ago.. do you still have your suggested changes?

In other recent communication Jeff indicated that he would revisit the
mbuf/UMA situation around the end of this year.

-- 
Andre


More information about the freebsd-net mailing list