route/arp lifetime (Re: it's the output, not ack coalescing (Re: TSO and FreeBSD vs Linux))

Wed Aug 14 15:40:18 UTC 2013

On Wednesday 14 August 2013 14:40:24 Luigi Rizzo wrote:
> On Wed, Aug 14, 2013 at 04:15:25PM +0400, Alexander V. Chernikov wrote:
> > On 14.08.2013 16:05, Luigi Rizzo wrote:
> > > On Wed, Aug 14, 2013 at 03:47:13PM +0400, Lev Serebryakov wrote:
> > >> Hello, Luigi.
> > >> You wrote 14 August 2013, 14:21:09:
> > >>
> > >> LR> Then the problem remains that we should keep a copy of route and
> > >> LR> arp information in the socket instead of redoing the lookups on
> > >> LR> every single transmission, as they consume some 25% of the time of
> > >> LR> a sendto(), and probably even more when it comes to large tcp
> > >> LR> segments, sendfile() and the like.
> > >>    And we should invalidate this info on ARP/route changes, or
> > >> connection will be lost in such cases, am I right?.. So, on each
> > >> such event code should look into all sockets and check, if
> > >> routing/ARP information is still valid for them. Or we should store
> > >> lists of sockets in routing and ARP tables... I don't know, what is
> > >> worse.
> > >
> > > I think we should start by acknowledging that routing and ARP
> > > information is inherently stale, and changes infrequently.
> > > So it is not a disaster if we have incorrect information for some
> > > short amount of time (milliseconds) because in the end the remote
> > > party that decides to change it and inform us may take much longer
> > > than that to distribute the update.
> >
> > You can save rte&arp, however doing this
> > gives you a perfect chance to crash your kernel if the egress interface
> > is destroyed (like vlan or ng or tun).
>
> I hope I learned not to follow a stale ifp pointer :)
> anyways ARP is really just the mac address so there is no
> dangling pointer issue.
>
> For the ifp associated to the route,
> i do not see a huge problem in marking the route/ifp as
> zombie and destroy it when the last reference goes away.

FWIW, apparently we already have that infrastructure in place - if_rele()
calls if_free_internal() only when the last reference to the ifnet is
dropped, so with a little care this should be usable for caching ifp
pointers without fear of the kernel crashes mentioned above.
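
To illustrate the pattern (and only the pattern), here is a minimal
userspace sketch of "free on last release". It is not the kernel code:
struct fake_ifnet and the fake_if_*() functions are hypothetical stand-ins,
the counter is a plain integer rather than an atomic, and there is no
locking, so treat it as a single-threaded model.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ifnet; only the fields needed here. */
struct fake_ifnet {
	unsigned refcount;	/* number of holders, including the owner */
	int	 dying;		/* set once the interface has been detached */
};

static int fake_if_freed;	/* counts frees so the behaviour is observable */

static struct fake_ifnet *
fake_if_alloc(void)
{
	struct fake_ifnet *ifp = calloc(1, sizeof(*ifp));

	if (ifp != NULL)
		ifp->refcount = 1;	/* the creator's reference */
	return (ifp);
}

/* Take an extra reference, e.g. when a socket caches the pointer. */
static void
fake_if_ref(struct fake_ifnet *ifp)
{
	assert(ifp->refcount > 0);
	ifp->refcount++;
}

/* Drop a reference; the memory is freed only when the last one goes away. */
static void
fake_if_rele(struct fake_ifnet *ifp)
{
	assert(ifp->refcount > 0);
	if (--ifp->refcount == 0) {
		fake_if_freed++;
		free(ifp);
	}
}

/* Detach: mark the interface as a zombie and drop the owner's reference. */
static void
fake_if_detach(struct fake_ifnet *ifp)
{
	ifp->dying = 1;
	fake_if_rele(ifp);
}
```

A socket that called fake_if_ref() before the detach can still dereference
its cached pointer afterwards; it should check the dying flag, stop using
the interface, and call fake_if_rele(), at which point the memory is
actually released.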

Marko

> Not that the current way is any better -- you need to lock/unlock
> the rte while you do the lookup, and hold a refcount to the ifp
> until the packet is queued. So how does my suggestion make
> things worse ?
>
> cheers
> luigi
>
> > > Considering that each lookup takes between 100..300ns if you are
> > > lucky (not many misses, relatively empty table etc.), one could
> > > reasonably do the lookup at most once per millisecond or so (just
> > > reading 'ticks', no need for a nanotime() if you have a slow clock),
> > > or whenever we get an error related to the socket, either in the
> > > forward path (e.g. ifp points to an interface that is down) or in
> > > the reverse path (e.g. a dupack because we sent a packet to the
> > > wrong place).
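
A rough userspace sketch of the revalidation scheme described above: redo
the lookup only when the tick counter has moved on or after an error has
invalidated the cache. Everything here is an assumption made for
illustration: struct route_cache, fake_ticks, slow_lookup() and the other
names are hypothetical, and a tick is assumed to be about 1 ms (hz=1000).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-socket cache; names are illustrative, not FreeBSD's. */
struct route_cache {
	int	nexthop;	/* stand-in for the cached route/ARP result */
	int	stamp;		/* tick counter value at the last lookup */
	bool	valid;		/* false until first lookup or after an error */
};

static int fake_ticks;		/* stand-in for the kernel's 'ticks' */
static int lookups_done;	/* counts how often the slow path ran */

/* Stand-in for the expensive route + ARP lookup. */
static int
slow_lookup(int dst)
{
	lookups_done++;
	return (dst + 1000);	/* arbitrary fake result */
}

/* Use the cached result unless it is invalid or from an earlier tick. */
static int
cached_lookup(struct route_cache *rc, int dst)
{
	if (!rc->valid || rc->stamp != fake_ticks) {
		rc->nexthop = slow_lookup(dst);
		rc->stamp = fake_ticks;
		rc->valid = true;
	}
	return (rc->nexthop);
}

/* Call on a socket error (interface down, dupack, ...) to force a lookup. */
static void
cache_invalidate(struct route_cache *rc)
{
	rc->valid = false;
}
```

With this shape, any number of sends within one tick cost a single slow
lookup, and the only per-packet overhead is reading the tick counter and
comparing it with the stamp.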
> >
> > This sounds like "Hey, the kernel lookup is slow (which is true), let's
> > make a hack and not bother with lookups".
> > This approach gives us mtx-locked rte refcounts which are used
> > (misused) in many places making things worse and decreasing the ability
> > to fix the things up..
> >
> > > cheers
> > > luigi
> > > _______________________________________________
> > > freebsd-net at freebsd.org mailing list
> > > http://lists.freebsd.org/mailman/listinfo/freebsd-net
> > > To unsubscribe, send any mail to
> > > "freebsd-net-unsubscribe at freebsd.org"