Deferring inp_freemoptions() to an asynchronous task

John Baldwin jhb at freebsd.org
Mon Jan 9 16:30:19 UTC 2012


On Monday, January 09, 2012 10:23:48 am Bruce Simpson wrote:
> John,
> 
> Sorry it's taken me so long to reply.
> 
> No objections in principle to your change, but this seems to point at a
> more general issue with modern network controllers.
> 
> You've also stumbled on the behaviour specific to how BSD has
> traditionally dealt with broadcast/multicast sockets. The pcbinfo
> structure can't really be disentangled from this.
> 
> Of course, it doesn't help that we have historically required these
> sockets to be bound to INADDR_ANY. It might be useful to break reception
> out using a separate hash/tree, rather than walking all sockets as is
> currently done, but legacy usage needs to be supported.
> 
> Interestingly enough, Microsoft has probably done something similar,
> judging from things which appear in MSDN.
> 
> John Baldwin wrote:
> > I have a workload at work where a particular device driver can take a while to 
> > update its MAC filter table when adding or removing multicast link-layer 
> > addresses.  One of the ways I've tackled fixing this is to change 
> > inp_freemoptions() so that it does all of its actual work asynchronously in a 
> > separate task.  Currently it does its work synchronously; however, it can be 
> > invoked while the associated protocol holds a write lock on its pcbinfo lock 
> > (e.g. from in_pcbdetach() called from udp_detach()).  This stalls all packet 
> > reception for that protocol since received packets need a read lock on the 
> > pcbinfo to look up the socket associated with a given (ip, port) tuple.
> 
> There is often a delay between asking for the group and actually getting
> the hash filter entry set up in the MAC, so the operations are async.
> 
> I can see many apps like to assume the operation is instantaneous rather
> than deferred; they are probably being naive...
> 
> It is not surprising that the same is true for taking down the hash filter entry.
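
For anyone following along, the deferral described in my original mail can be
sketched roughly as below.  This is a user-space model only, not the actual
patch: struct moptions, defer_freemoptions(), drain_freemoptions() and the
slow_filter_updates counter are all hypothetical names made up for
illustration.  In the kernel the drain step would presumably be run from a
taskqueue(9) task, and the queue would need its own lock.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the multicast options; not the kernel struct. */
struct moptions {
    int ngroups;              /* group memberships still held */
    struct moptions *next;    /* link on the pending-free queue */
};

static struct moptions *pending_head;  /* deferred frees, newest first */
static int slow_filter_updates;        /* simulated MAC filter rewrites */

/*
 * Called from the detach path, possibly with the pcbinfo write lock held.
 * It only queues the structure and never calls into the driver.
 */
void
defer_freemoptions(struct moptions *mo)
{
    mo->next = pending_head;
    pending_head = mo;
}

/*
 * Runs later, outside the pcbinfo lock.  The slow work (one simulated
 * filter update per remaining group) happens here.  Returns the number
 * of structures freed.
 */
int
drain_freemoptions(void)
{
    struct moptions *mo;
    int freed = 0;

    while ((mo = pending_head) != NULL) {
        pending_head = mo->next;
        slow_filter_updates += mo->ngroups;  /* stands in for leaving groups */
        free(mo);
        freed++;
    }
    return (freed);
}
```

The point of the split is that the detach path costs a pointer swap no matter
how slow the driver is, so packet reception is only blocked for that long.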

The other fun part in this case is that if reprogramming the filter table is
going to take a long time, a driver should probably enable reception of all
multicast (the equivalent of IFF_ALLMULTI) while it does so, to avoid dropping
packets for already-joined groups.  I'm not currently doing this as we are
using a different hack, but I think that is something drivers should probably
be doing.
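
A rough sketch of that idea, as a user-space model rather than real driver
code: struct nic, reprogram_begin(), reprogram_end() and
accepts_joined_group() are hypothetical names, and a real driver would be
flipping a hardware receive-mode bit instead of a bool.

```c
#include <stdbool.h>

/* Hypothetical model of a NIC receive filter; not any real driver's state. */
struct nic {
    bool allmulti;     /* hardware accepts every multicast frame */
    bool table_valid;  /* filter table currently matches the joined groups */
};

/* Would a frame for an already-joined group be accepted right now? */
bool
accepts_joined_group(const struct nic *n)
{
    return (n->allmulti || n->table_valid);
}

/*
 * Before the slow table rewrite: remember the old allmulti setting, open
 * the filter, and mark the table as in flux.
 */
void
reprogram_begin(struct nic *n, bool *saved_allmulti)
{
    *saved_allmulti = n->allmulti;
    n->allmulti = true;
    n->table_valid = false;
}

/*
 * After the table is fully written: mark it valid first, then restore
 * the previous allmulti setting.
 */
void
reprogram_end(struct nic *n, bool saved_allmulti)
{
    n->table_valid = true;
    n->allmulti = saved_allmulti;
}
```

The cost is that the stack sees some multicast it did not ask for during the
window, which it has to filter in software anyway.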

-- 
John Baldwin


More information about the freebsd-net mailing list