kern/158066: ipfw + netgraph + multicast = multicast packets transmitted over wrong interface

Oleg Cherevko olwi at icyb.kiev.ua
Mon Jun 20 10:30:12 UTC 2011


>Number:         158066
>Category:       kern
>Synopsis:       ipfw + netgraph + multicast = multicast packets transmitted over wrong interface
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 20 10:30:11 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Oleg Cherevko
>Release:        FreeBSD 8.2-RELEASE
>Organization:
>Environment:
FreeBSD tgw.icyb.net.ua 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Fri Feb 18 02:24:46 UTC 2011     root at almeida.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386
>Description:
When trying the ipfw + ng_netflow combination in a lab test environment I observed a weird effect: multicast packets were transmitted over the wrong interface after going through ng_ipfw + ng_netflow. This effect breaks quagga OSPF, because OSPF HELLO packets are sent over the wrong interface, so OSPF adjacencies are not formed properly.
It turned out that ng_netflow has nothing to do with this effect, so below I describe a simplified setup with packets going through ng_ipfw only.

The host is running GENERIC kernel with the following modules:

#kldstat
Id Refs Address    Size     Name
 1   12 0xc0400000 bd97b4   kernel
 2    2 0xc63cc000 11000    ipfw.ko
 3    1 0xc63dd000 d000     libalias.ko
 4    3 0xc668b000 b000     netgraph.ko
 5    1 0xc669b000 4000     ng_socket.ko
 6    1 0xc669f000 2000     ng_ipfw.ko

The list of network interfaces that are UP (and relevant to the described problem):

#ifconfig -au
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 00:19:d1:24:76:c4
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x6
        inet6 ::1 prefixlen 128
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
vlan2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether 00:19:d1:24:76:c4
        inet 192.168.1.141 netmask 0xfffffff0 broadcast 192.168.1.143
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 2 parent interface: em0
vlan3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether 00:19:d1:24:76:c4
        inet 10.1.0.14 netmask 0xfffff000 broadcast 10.1.15.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        vlan: 3 parent interface: em0
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536 

quagga OSPF configuration excerpt:

router ospf
 ospf router-id 192.168.1.141
 redistribute connected metric 25 metric-type 1 route-map CONN
 redistribute static metric 30 metric-type 1 route-map NDR
 passive-interface default
 no passive-interface vlan2
 network 192.168.1.128/28 area 0.0.0.0 

As one can see, vlan2 is the only OSPF-related interface; quagga ospfd sends its HELLO and some other packets to the 224.0.0.5 and 224.0.0.6 multicast addresses over vlan2. Everything works as it should:

#tcpdump -n -i vlan2 proto ospf and src host 192.168.1.141
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan2, link-type EN10MB (Ethernet), capture size 96 bytes
09:58:51.151532 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
09:58:57.345173 IP 192.168.1.141 > 224.0.0.6: OSPFv2, LS-Ack, length 44
09:59:01.152235 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
09:59:11.152865 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
09:59:21.153555 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
09:59:28.286692 IP 192.168.1.141 > 224.0.0.6: OSPFv2, LS-Ack, length 64
09:59:31.154143 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
^C
7 packets captured
78 packets received by filter
0 packets dropped by kernel 

The kernel forwarding table is pretty simple before quagga startup (the IPv6 part is left out by me, as it is irrelevant to the problem described):

#netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags    Refs      Use  Netif Expire
default            10.1.0.1           UGS         6       45  vlan3
10.1.0.0/20        link#8             U           1       49  vlan3
10.1.0.14          link#8             UHS         0        0    lo0
127.0.0.1          link#6             UH          0        2    lo0
192.168.1.128/28   link#7             U           2       54  vlan2
192.168.1.141      link#7             UHS         0        0    lo0

After startup, quagga populates the kernel forwarding table with a number of additional routes that it learns via OSPF. The exact contents of those routes are not important. What IS important, however, is that the default route IS NOT changed by quagga and keeps pointing to the 10.1.0.1 gateway on vlan3 at all times.

The ipfw configuration is one of the simplest possible: it is configured as "open" by the standard /etc/rc.firewall script (one_pass is turned off):

#ipfw list
00100 allow ip from any to any via lo0
00200 deny ip from any to 127.0.0.0/8
00300 deny ip from 127.0.0.0/8 to any
00400 deny ip from any to ::1
00500 deny ip from ::1 to any
00600 allow ipv6-icmp from :: to ff02::/16
00700 allow ipv6-icmp from fe80::/10 to fe80::/10
00800 allow ipv6-icmp from fe80::/10 to ff02::/16
00900 allow ipv6-icmp from any to any ip6 icmp6types 1
01000 allow ipv6-icmp from any to any ip6 icmp6types 2,135,136
65000 allow ip from any to any
65535 deny ip from any to any

#sysctl net.inet.ip.fw.one_pass
net.inet.ip.fw.one_pass: 0

So far everything works as expected.
Now netgraph enters the scene:

#ngctl connect ipfw: ipfw: 1 2
#ngctl l -l
There are 2 total nodes:
  Name: ipfw            Type: ipfw            ID: 00000001   Num hooks: 2
  Local hook      Peer name       Peer type    Peer ID         Peer hook
  ----------      ---------       ---------    -------         ---------
  2               ipfw            ipfw         00000001        1
  1               ipfw            ipfw         00000001        2
  Name: ngctl1894       Type: socket          ID: 00000006   Num hooks: 0

The above simply creates an ng_ipfw "loopback" that returns packets sent to its hook back into ipfw without tampering with them in any way. (In practice this configuration is meaningless, as it does nothing with the traffic, but it turns out to be sufficient to illustrate my point.)

Then we add a simple ipfw rule:
ipfw add 10 netgraph 1 proto ip

...and the multicast OSPF packets generated by this host's quagga completely disappear from vlan2!

The packets do get into netgraph and do reappear from there, as is obvious from the counters for rules 10 and 65000 being in perfect sync:
#ipfw -t show 10 65000
00010  334  41895 Tue Jun 14 10:47:54 2011 netgraph 1 ip from any to any
65000 8358 855211 Tue Jun 14 10:47:54 2011 allow ip from any to any
#ipfw -t show 10 65000
00010  345  42991 Tue Jun 14 10:47:56 2011 netgraph 1 ip from any to any
65000 8369 856307 Tue Jun 14 10:47:56 2011 allow ip from any to any
#ipfw -t show 10 65000
00010  358  44243 Tue Jun 14 10:47:58 2011 netgraph 1 ip from any to any
65000 8382 857559 Tue Jun 14 10:47:58 2011 allow ip from any to any 

However, there are no more OSPF HELLOs from 192.168.1.141 on vlan2, so OSPF breaks, and the learned routes get removed from the kernel forwarding table.
What is even more interesting, those disappeared HELLO packets do reappear... (surprise! surprise!)... on vlan3! Please note that vlan3 has nothing to do with OSPF at all! Here is how it looks:

#tcpdump -n -i vlan3 proto ospf and src host 192.168.1.141
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan3, link-type EN10MB (Ethernet), capture size 96 bytes
10:30:00.536356 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
10:30:04.221026 IP 192.168.1.141 > 224.0.0.6: OSPFv2, LS-Ack, length 44
10:30:10.543039 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
10:30:11.399434 IP 192.168.1.141 > 224.0.0.6: OSPFv2, LS-Ack, length 44
10:30:11.409279 IP 192.168.1.141 > 224.0.0.6: OSPFv2, LS-Update, length 64
10:30:12.430221 IP 192.168.1.141 > 224.0.0.6: OSPFv2, LS-Ack, length 44
10:30:20.553881 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
10:30:30.583664 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
10:30:40.599868 IP 192.168.1.141 > 224.0.0.5: OSPFv2, Hello, length 76
^C
9 packets captured
60 packets received by filter
0 packets dropped by kernel

From reading the ng_ipfw.c source it looks like the problem is caused by the code that reinjects outbound packets back into the network stack. This is done by calling ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL). It looks like ip_output(), when called like this, simply routes a packet destined to a 224.0.0.x multicast address as if it were a normal unicast-destined packet, and since there are no specific routing entries for multicast addresses in the kernel forwarding table, it simply follows the default route and ends up on vlan3 (which is wrong).
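The fifth argument of ip_output() is the ip_moptions pointer that normally carries, among other things, the multicast egress interface; passing NULL there leaves the routing table as the only source of an egress interface. A userland analogue of the same mechanism (a sketch for illustration only, not FreeBSD kernel code) is the IP_MULTICAST_IF socket option: without it, multicast sends follow the unicast routing table, exactly as described above; with it, the egress interface is pinned explicitly. The loopback address is used here only so the example runs anywhere:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Without IP_MULTICAST_IF the kernel routes a multicast destination like
# any unicast one: 224.0.0.0/4 has no specific route, so the packet would
# follow the default route -- the userland counterpart of ip_output()
# being handed a NULL ip_moptions.

# Pinning the egress interface by its address is the counterpart of a
# populated ip_moptions (127.0.0.1 is used purely for illustration):
s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
             socket.inet_aton("127.0.0.1"))

# Read the option back to confirm which interface address is now pinned.
pinned = socket.inet_ntoa(
    s.getsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, 4))
print(pinned)  # 127.0.0.1
s.close()
```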

>How-To-Repeat:
Since the problem description involves quagga OSPF, which is not easy to set up (but helps illustrate why this problem is important), I'd suggest using the much simpler mctest tool from /usr/src/tools/tools/mctest/ to reproduce the problem.

We need a multihomed host with at least 2 network interfaces, and there should be a default route in the kernel forwarding table.

ipfw should be either loaded as a module or compiled in. The ipfw configuration should be as simple as possible ("open" from /etc/rc.firewall will do). one_pass should be disabled:
ipfw disable one_pass

The following netgraph modules should be loaded (they may also be compiled in):
kldload netgraph.ko
kldload ng_socket.ko
kldload ng_ipfw.ko

An ng_ipfw "loopback" should be created:
ngctl connect ipfw: ipfw: 1 2

A rule to pass the test multicast traffic through this loopback should be added to ipfw:
ipfw add 1 netgraph 1 ip from any to 224.0.0.129

Now, we need to pick 2 network interfaces: the "default" one (where the default route is pointing) and a "non-default" one (any other interface that is configured up, except lo0):
DIF="<your default interface name here>"
NDIF="<your non-default interface name here>"

Now, some multicast traffic should be generated on the non-default interface ${NDIF} while watching with tcpdump on both the default and non-default interfaces (all in different shells, of course):
tcpdump -n -i ${NDIF} dst net 224.0.0.128/30
tcpdump -n -i ${DIF} dst net 224.0.0.128/30

mctest -i ${NDIF} -g 224.0.0.130 -n 3
(this traffic skips ng_ipfw, so it ends up on ${NDIF}, as it should)

mctest -i ${NDIF} -g 224.0.0.129 -n 3
(this traffic goes through ng_ipfw and ends up on ${DIF}, which is wrong)
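For convenience, the steps above can be collected into one script. This is a sketch only: it assumes a FreeBSD host with root privileges, ipfw and the netgraph modules available, and the DIF/NDIF interface names adjusted to the local setup (em0/em1 below are placeholders):

```shell
#!/bin/sh
# Reproduction sketch for the wrong-interface multicast effect.
DIF="em0"        # interface the default route points at (placeholder)
NDIF="em1"       # any other configured-up interface (placeholder)

kldload netgraph ng_socket ng_ipfw 2>/dev/null

ipfw disable one_pass

# ng_ipfw "loopback": packets sent out hook 1 come straight back via hook 2.
ngctl connect ipfw: ipfw: 1 2

# Divert the test multicast traffic through the loopback.
ipfw add 1 netgraph 1 ip from any to 224.0.0.129

# In two separate shells, watch both interfaces:
#   tcpdump -n -i ${NDIF} dst net 224.0.0.128/30
#   tcpdump -n -i ${DIF}  dst net 224.0.0.128/30

mctest -i ${NDIF} -g 224.0.0.130 -n 3  # bypasses ng_ipfw: seen on ${NDIF}
mctest -i ${NDIF} -g 224.0.0.129 -n 3  # via ng_ipfw: wrongly seen on ${DIF}
```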

>Fix:
Currently I'm unaware of any fix for this problem.
However, a workaround can be suggested.
If one needs to pass traffic through ng_ipfw into netgraph, one should either use "ipfw ngtee" instead of "ipfw netgraph" or not pass multicast traffic through ng_ipfw at all. As for my original intention of putting all traffic through ng_netflow, I currently use the following workaround setup:

ngnetflow_dst="10.1.0.10:9996"
ngctl mkpeer ipfw: netflow 1 iface0
ngctl name ipfw:1 netflow 1
ngctl connect ipfw: netflow: 101 out0
ngctl connect ipfw: netflow: 2 iface1
ngctl msg netflow: setconfig { iface=0 conf=7 }
ngctl msg netflow: setconfig { iface=1 conf=7 }
ngctl msg netflow: setdlt { iface=0 dlt=12 }
ngctl msg netflow: setdlt { iface=1 dlt=12 }
ngctl msg netflow: settimeouts { inactive=15 active=60 }
ngctl mkpeer netflow: ksocket export inet/dgram/udp
ngctl name netflow:export flowexp
ngctl msg flowexp: connect inet/${ngnetflow_dst}

ipfw add 10 netgraph 1 ip from any to any in
ipfw add 20 netgraph 1 ip from any to not 224.0.0.0/4 out
ipfw add 21 ngtee 2 ip from any to 224.0.0.0/4 out

instead of this:

ngnetflow_dst="10.1.0.10:9996"
ngctl mkpeer ipfw: netflow 1 iface0
ngctl name ipfw:1 netflow 1
ngctl connect ipfw: netflow: 101 out0
ngctl msg netflow: setconfig { iface=0 conf=7 }
ngctl msg netflow: setdlt { iface=0 dlt=12 }
ngctl msg netflow: settimeouts { inactive=15 active=60 }
ngctl mkpeer netflow: ksocket export inet/dgram/udp
ngctl name netflow:export flowexp
ngctl msg flowexp: connect inet/${ngnetflow_dst}

ipfw add 10 netgraph 1 ip from any to any

The workaround setup works OK without disturbing quagga OSPF.

>Release-Note:
>Audit-Trail:
>Unformatted:

