rizzo at iet.unipi.it
Mon Sep 27 13:06:49 UTC 2010
On Mon, Sep 27, 2010 at 02:55:45PM +0200, Andre Oppermann wrote:
> >my idea was to have an extra field in the mbuf to tell how much room
> >should be reserved/used for metadata (such as mtags) after
> >the payload area so you don't need to change the allocator, and
> >possibly can even modify this on an existing mbuf.
> >Almost always mbufs have spare room (e.g. incoming pkts have all
> >data in the cluster and mostly empty mdata; outgoing, except
> >for rare cases, tend to be in a similar situation).
> >So this approach would allow us to take an already allocated
> >mbuf and put the mtag in the spare area after the data.
> For incoming data this approach could work as usually 2K mbuf clusters
> are used and they have trailing space available, or rather the normal
> mbuf referencing the cluster has its own data section unused.
> If trailing space is to be used, M_TRAILINGSPACE() needs modifications
> and a full tree audit is required to make sure that all mbuf consumers are
> correctly using it and not some version of their own that directly assumes
> certain mbuf sizes, etc. A lot of work.
> For locally generated mbufs and socket buffers we try to use the mbufs to
> their maximal extent. When the socket buffer data is packetized it is
> normally referenced, so we get the normal mbuf with its data portion unused.
> So that could work.
> A complication is the m_tag_free() field and function which puts the memory
> deallocation into the hands of the mtag user. That means all mtag consumers
> have to be made aware that the storage was provided to them and must not
> be returned to the memory allocator (malloc/UMA).
> So the only way I realistically see is to make use of the mbuf's unused
> data portion when it has external storage attached. This should probably
> cover about 98% of all cases. The rest have to malloc() the mtag storage
> as usual.
So it wouldn't be bad -- I cannot judge the numbers, but it would
definitely work for all incoming traffic, plus all TCP data packets
(as the payload is in the cluster), plus all pure ACKs (which are small),
plus all UDP above some 200 bytes...
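
To make the fallback logic concrete, here is a rough userland sketch of
the scheme discussed above: take the tag from the mbuf's unused internal
data area when the payload is in external storage, otherwise malloc() it,
and have the free routine tell the two apart. This is not kernel code.
All toy_* names, the inline_store flag and the 224-byte area size are
made up for illustration and do not match the real struct mbuf or
struct m_tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN 224	/* assumed size of the internal data area */

/* Hypothetical tag header; the tag payload follows it in memory. */
struct toy_tag {
	uint16_t id;
	uint16_t len;		/* payload bytes after the header */
	int	 inline_store;	/* 1: carved out of the mbuf, 0: malloc'ed */
};

/* Hypothetical, heavily simplified mbuf. */
struct toy_mbuf {
	int	has_ext;	/* 1 if the payload lives in a cluster */
	size_t	inline_used;	/* bytes of m_dat already given to tags */
	_Alignas(max_align_t) unsigned char m_dat[TOY_MLEN];
};

/*
 * Allocate a tag. Use the mbuf's internal area only when the payload
 * is external (so the area is otherwise unused) and the tag fits;
 * fall back to malloc() in every other case.
 */
static struct toy_tag *
toy_tag_alloc(struct toy_mbuf *m, uint16_t id, uint16_t len)
{
	struct toy_tag *t;
	size_t need = sizeof(struct toy_tag) + len;

	/* Round up so that the next inline tag stays aligned. */
	need = (need + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

	if (m->has_ext && need <= TOY_MLEN - m->inline_used) {
		t = (struct toy_tag *)(void *)(m->m_dat + m->inline_used);
		m->inline_used += need;
		t->inline_store = 1;
	} else {
		t = malloc(sizeof(*t) + len);
		if (t == NULL)
			return (NULL);
		t->inline_store = 0;
	}
	t->id = id;
	t->len = len;
	return (t);
}

/*
 * Free a tag. Inline tags must not go back to the allocator; their
 * space is reclaimed together with the mbuf itself.
 */
static void
toy_tag_free(struct toy_tag *t)
{
	assert(t != NULL);
	if (!t->inline_store)
		free(t);
}
```

The inline_store flag is one possible way to handle the m_tag_free()
complication: a consumer that calls the free routine does not need to
know where the storage came from. A tag that is too large for the spare
area, or an mbuf that keeps its payload in its own data area, simply
takes the malloc() path as it does today.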
> I could whip up a prototype for review in the next weeks.
I seem to remember that jeffr already had something done in Perforce.