m_pkthdr.rcvif dangling pointer problem

Luigi Rizzo rizzo at iet.unipi.it
Tue Jul 26 13:13:39 UTC 2011


On Tue, Jul 26, 2011 at 10:09:09AM +0100, Robert N. M. Watson wrote:
> 
> On 25 Jul 2011, at 12:00, Daan Vreeken wrote:
> 
> > Couldn't the dangling pointer problem be solved by adding a 'generation' field 
> > to the mbuf structure?
> > The 'generation' could be a system-wide number that gets incremented whenever 
> > an interface is removed. The mbuf* functions could keep a (per CPU?) 
> > reference count on the number of mbufs allocated/freed during 
> > that 'generation'. After interface removal, the ifnet structure could be 
> > freed when all the reference counters of generations before the current 
> > generation reach zero (whenever that happens).
> 
> I think a hybrid approach makes sense, combining a number of the ideas we've been kicking about:
> 
> (1) Add per-CPU ifnet refcounts that don't imply cache-line misses on each mbuf alloc/free
> (2) Add optional subsystem drain functions so that subsystems that may have unbounded queueing times for mbufs deterministically ensure reference release, perhaps by substituting a common deadif for outstanding dying references.
> 
> The former gives us actual correctness in terms of avoiding races, the latter gives us deterministic freeing by subsystems that potentially queue mbufs forever (i.e., TCP) but no longer require the ifnet reference.

I'd like to suggest that before doing all this work we could try
and see which subsystems have a real need to de-reference the
reference, which fields they use, and how often.

Because maybe just copying into the mbuf a blob of 8-16 bytes with
useful info (a cookie, fib index, some flags, etc) could perhaps cover the
majority of cases (in terms of usage frequency, not locations in the code)
and let us deal with other cases by looking up the cookie in some
data structure.

As an example:
- some functions just use rcvif to tell whether this is an incoming
  packet. No actual dereference;
- others might only care that rcvif equals some other (already refcounted)
  value, so we don't have a race there.

cheers
luigi


More information about the freebsd-net mailing list