[patch] interface routes

Alexander V. Chernikov melifaro at FreeBSD.org
Thu Mar 7 15:35:49 UTC 2013


On 07.03.2013 17:51, Andre Oppermann wrote:
> On 07.03.2013 14:38, Ermal Luçi wrote:
>> On Thu, Mar 7, 2013 at 12:55 PM, Andre Oppermann <andre at freebsd.org> wrote:
>>
>>     On 07.03.2013 12:43, Alexander V. Chernikov wrote:
>>
>>         On 07.03.2013 11:39, Andre Oppermann wrote:
>>
>>             On 07.03.2013 07:34, Alexander V. Chernikov wrote:
>>
>>                 Hello list!
>>
>>                 There is a known long-lived issue with interface routes
>>                 addition/deletion:
>>
>>                 ifconfig iface inet 1.2.3.4/24 can fail if given
>>                 prefix is already in kernel route table (for
>>                 example, advertised by IGP like OSPF).
>>
>>                 Interface route can be deleted via route(8) or any
>>                 route socket user (sometimes this happens with
>>                 popular opensource daemons like bird/quagga).
>>
>>                 Problem is reported at least in kern/106722 and
>>                 kern/155772.
>>
>>
>>             Your patch is a welcome addition.
>>
>>                 This can be fixed the following way:
>>                 Immutable route flag (RTM_PINNED, added in 19995 with
>>                 'for future use' comment) is utilised to mark
>>                 route 'immutable'.
>>                 rtrequest1_fib refuses to delete routes with given
>>                 flag unless RTM_PINNED is set in rti_flags.
>>
>>
>>             How do the routing daemons react to being unable to
>>             change/delete such a route?
>>
>>         Routing daemons have long lived with the fact that their route
>>         socket commands can fail (and there is the route(8) utility,
>>         which can do anything), so typically bird/quagga yells like
>>         'bird: KRT: Error sending route 11.0.0.0/24 to kernel: File exists'
>>         and marks the given route as not installed in its internal RIB.
>>         Additionally, the daemon will probably retry inserting such
>>         routes on every periodic KRT rescan (tens of minutes).
>>
>>
>>
>> Isn't it better to teach the routing code about metrics?
>> Routing daemons cope better this way and they can handle this.
>> So the policy for this behaviour can be controlled by the administrator
>> rather than by code!
>> With metrics you can add routes with a bigger metric for interfaces and
>> a lower one from routing daemons.
>> This could possibly also mitigate things somewhat for interfaces
>> configured with the same subnet.
> 
> Generally I agree with you that this would be the ideal outcome.
> However we're still quite a bit away from reaching that goal.
> To make this really work we have to make mpath plus metrics a first
> class citizen in the routing code and also update the routing
> daemons' kernel interfaces to know about this.  I hope we get there
> in the not too distant future.
Radix is already over-bloated. Typically in performance-oriented
solutions (hardware/software routers from vendors) there is a clear
separation between the RIB (where route protocol attributes, best
candidate routes and routes with different priorities exist) and the FIB,
which is typically some kind of radix with the minimum needed info, e.g.:
prefix, nexthops, their interfaces, optional L2 data to prepend.
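
To make the "minimum needed info" concrete, here is a rough sketch of what
such a FIB entry could look like for IPv4. All names (fib4_entry,
fib4_lookup and so on) are hypothetical and exist nowhere in the tree; a
single nexthop per entry is assumed for brevity, and the lookup is a plain
linear longest-prefix match where a real FIB would use a trie or similar.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal IPv4 FIB entry: only what forwarding needs. */
struct fib4_entry {
	uint32_t prefix;     /* network address, host byte order */
	uint8_t  plen;       /* prefix length, assumed to be 0..32 */
	uint32_t nexthop;    /* next-hop address, host byte order */
	uint16_t ifindex;    /* outgoing interface index */
	uint8_t  l2_len;     /* number of valid bytes in l2_hdr */
	uint8_t  l2_hdr[14]; /* optional prebuilt L2 header to prepend */
};

/* Network mask for a prefix length; plen 0 is handled separately
 * because shifting a 32-bit value by 32 is undefined in C. */
static uint32_t
fib4_mask(uint8_t plen)
{
	return (plen == 0 ? 0 : 0xffffffffu << (32 - plen));
}

/* Linear longest-prefix match over a small table (illustration only).
 * Returns NULL when no entry covers dst. */
static const struct fib4_entry *
fib4_lookup(const struct fib4_entry *tbl, size_t n, uint32_t dst)
{
	const struct fib4_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((dst & fib4_mask(tbl[i].plen)) != tbl[i].prefix)
			continue;
		if (best == NULL || tbl[i].plen > best->plen)
			best = &tbl[i];
	}
	return (best);
}
```

Nothing protocol-related (metrics, origin, timers) lives in the entry; in
this model that would all stay in the RIB.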

Our radix stands somewhere between RIB and FIB (since we have to support
route(8) and upper layer protocols): it serves badly as a RIB (too little
functionality) and badly as a FIB (too much overhead and inefficient,
overly general code).

For example, sizeof(rt_nodes[2]) (first element of rte) is 96 bytes on
amd64.

Additionally, the rte refcount approach is totally broken.

I'm currently thinking of adding some kind of hooks to the current
route/radix code to permit building an efficient trie (or other structure)
for a given address family and using it for forwarding purposes only.

For example, I don't need a trie while doing MPLS label switching:
assuming the control plane allocates a contiguous label space, I can use
a label array for efficient lookup.
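
A rough sketch of the label-array idea, assuming labels are handed out from
one contiguous block starting at a known base. The structure and function
names (mpls_nhlfe, mpls_table, mpls_lookup) are made up for illustration
and are not existing kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical forwarding entry for one incoming label. */
struct mpls_nhlfe {
	uint32_t out_label; /* label to swap in */
	uint16_t ifindex;   /* outgoing interface index */
	uint8_t  valid;     /* non-zero if this slot is in use */
};

/* Contiguous label block [base, base + count). */
struct mpls_table {
	uint32_t base;              /* first label in the block */
	uint32_t count;             /* number of slots in entries */
	struct mpls_nhlfe *entries; /* array indexed by (label - base) */
};

/* Constant-time lookup: one subtraction, one bounds check, one index.
 * Returns NULL for labels outside the block or unused slots. */
static const struct mpls_nhlfe *
mpls_lookup(const struct mpls_table *t, uint32_t label)
{
	/* Unsigned wrap-around makes label < base fail the bounds check. */
	uint32_t idx = label - t->base;

	if (idx >= t->count || !t->entries[idx].valid)
		return (NULL);
	return (&t->entries[idx]);
}
```

The cost is memory proportional to the size of the label block, which is
why the contiguous-allocation assumption matters.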


> 
> As a first step I think it is important that Alexander's patch goes
> in to fix a long-standing and very annoying problem with the code
> we have.  Also the link down route withdraw should be added asap.
> Then we can take the next steps towards the ultimate goal you describe.
> 
> I hope you do not object to Alexander's patch?
> 


-- 
WBR, Alexander

