Strange BGP/OSPF problem

Thodoris S. grand at mindless.gr
Thu May 20 13:01:11 UTC 2010


On one of my border routers, OSPF started flapping for no apparent reason. The link itself was fine; I tested it.
The log file shows the following.
At the beginning it started with these messages:
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.213.128.0/17: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.214.0.0/17: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.214.128.0/17: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.215.0.0/17: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.215.128.0/17: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.216.0.0/15: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.218.0.0/16: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.219.0.0/16: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.220.0.0/15: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.240.0.0/13: rtm_write() unexpectedly returned -4 for command RTM_DELETE
2010/05/01 17:36:25 ZEBRA: kernel_rtm_ipv4: 222.246.191.0/24: rtm_write() unexpectedly returned -4 for command RTM_DELETE
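
To see how many routes were affected and whether the return code is always the same, the messages above can be tallied with a small script. This is only a sketch: count_rtm_failures is a hypothetical helper name, and the regular expression assumes exactly the log format shown above.

```python
import re
from collections import Counter

# Matches the zebra message format shown in the log excerpt above.
RTM_RE = re.compile(
    r"kernel_rtm_ipv4: (?P<prefix>\S+): rtm_write\(\) unexpectedly returned "
    r"(?P<code>-?\d+) for command (?P<cmd>\w+)"
)


def count_rtm_failures(lines):
    """Return (Counter keyed by (command, return code), list of prefixes)."""
    counts = Counter()
    prefixes = []
    for line in lines:
        m = RTM_RE.search(line)
        if m is None:
            continue  # not an rtm_write failure line
        counts[(m.group("cmd"), int(m.group("code")))] += 1
        prefixes.append(m.group("prefix"))
    return counts, prefixes
```

Feeding it the lines of the log file gives one counter entry per (command, return code) pair, plus the list of prefixes that failed.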

Later, OSPF started to flap. I then restarted the Quagga daemon, but OSPF continued to flap, with these messages:
2010/05/18 18:42:19 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 43227ms (cpu time 0ms)
2010/05/18 18:42:28 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 9140ms (cpu time 0ms)
2010/05/18 18:42:59 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 30557ms (cpu time 0ms)
2010/05/18 18:42:59 OSPF: SLOW THREAD: task ospf_write (800670920) ran for 19783ms (cpu time 0ms)
2010/05/18 18:44:31 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 92158ms (cpu time 0ms)
2010/05/18 18:45:17 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 46164ms (cpu time 0ms)
2010/05/18 18:45:17 BGP: SLOW THREAD: task bgp_scan_timer (439890) ran for 119622ms (cpu time 372ms)
2010/05/18 18:45:28 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 11152ms (cpu time 0ms)
2010/05/18 18:47:56 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 147739ms (cpu time 0ms)
2010/05/18 18:47:56 BGP: SLOW THREAD: task bgp_scan_timer (439890) ran for 158815ms (cpu time 373ms)
2010/05/18 18:48:44 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 47649ms (cpu time 0ms)
2010/05/18 18:50:39 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 115218ms (cpu time 0ms)
2010/05/18 18:50:39 BGP: SLOW THREAD: task bgp_scan_timer (439890) ran for 162774ms (cpu time 372ms)
2010/05/18 18:51:23 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 44371ms (cpu time 0ms)
2010/05/18 18:52:19 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 55878ms (cpu time 0ms)
2010/05/18 18:52:20 BGP: SLOW THREAD: task bgp_scan_timer (439890) ran for 100162ms (cpu time 378ms)
2010/05/18 18:55:27 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 187961ms (cpu time 60ms)
2010/05/18 18:55:27 BGP: SLOW THREAD: task bgp_scan_timer (439890) ran for 187858ms (cpu time 369ms)
2010/05/18 18:55:49 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 22402ms (cpu time 0ms)
2010/05/18 18:55:49 OSPF: SLOW THREAD: task ospf_write (800670920) ran for 20494ms (cpu time 0ms)
2010/05/18 18:55:50 BGP: SLOW THREAD: task bgp_scan_timer (439890) ran for 20924ms (cpu time 373ms)
2010/05/18 18:55:59 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 7289ms (cpu time 0ms)
2010/05/18 18:56:32 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90) ran for 31641ms (cpu time 0ms)
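
In the lines above, the wall-clock time ("ran for") is tens of seconds while the CPU time stays between 0 and 378 ms, which suggests the tasks were waiting on something rather than computing. A minimal sketch for totalling both figures per daemon and task follows; summarize_slow_threads is a hypothetical helper name, and the pattern assumes exactly the SLOW THREAD format shown above.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 2010/05/18 18:42:19 ZEBRA: SLOW THREAD: task work_queue_run (8006a2b90)
#   ran for 43227ms (cpu time 0ms)
SLOW_RE = re.compile(
    r"^\S+ \S+ (?P<daemon>\w+): SLOW THREAD: task (?P<task>\w+) "
    r"\(\w+\) ran for (?P<wall>\d+)ms \(cpu time (?P<cpu>\d+)ms\)"
)


def summarize_slow_threads(lines):
    """Return {(daemon, task): (count, total_wall_ms, total_cpu_ms)}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        m = SLOW_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a SLOW THREAD line
        entry = totals[(m.group("daemon"), m.group("task"))]
        entry[0] += 1
        entry[1] += int(m.group("wall"))
        entry[2] += int(m.group("cpu"))
    return {key: tuple(val) for key, val in totals.items()}
```

A large gap between total_wall_ms and total_cpu_ms for a task points at blocking (for example on the kernel or on I/O) rather than at heavy computation inside the daemon.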

Then I rebooted the machine and everything worked perfectly.

The setup is as follows:
2 FreeBSD 8.0 routers with 4 interfaces each.
Each router has 2 layer 2 links to the provider (no load balancing, just redundancy).

I will try to illustrate it with ASCII:

                  Lo0                                 Lo1
PROVIDER----------------------------------------------------------
            | Link1      | Link2          | Link3      | Link4
            |            |                |            |
            |            |                |            |
             \          /                  \          /
             em0      em1                  em0      em1
               FBSD0 em2--------------------em2 FBSD1
                 |                                |
              em3 carp                         em3 carp


FBSD0:
Link1 OSPF cost 10
Link2 OSPF cost 20
eBGP between Provider Lo0 and FBSD0 Lo1 (LocalPref 120 on incoming, eBGP multihop)
iBGP with FBSD1 (next-hop-self)
CARP on the LAN interface

FBSD1:
Link3 OSPF cost 30
Link4 OSPF cost 40
eBGP between Provider Lo1 and FBSD1 Lo1 (LocalPref 80 on incoming, eBGP multihop)
iBGP with FBSD0 (next-hop-self)
CARP on the LAN interface




Any idea why this is happening? These logs were all generated on FBSD1 (the backup router); FBSD0 is working well.


More information about the freebsd-net mailing list