Initial review request for IPv6 Fast Forwarding and IP6STEALTH

James haesu at towardex.com
Mon Nov 15 14:23:16 PST 2004


Folks,

Attached is initial code for ip6_fastforward() that I'm proposing for FreeBSD
5.x. This code was written for an internally modified FreeBSD 4.9; over the
next few weeks I will port it to the FreeBSD 5.3 tree and submit a final draft
for review here on freebsd-net. In the meantime, if any experienced folks have
suggestions or critiques of this code, I will gladly take your input and make
the necessary changes for the final draft.

We have been testing this code on a core router in the occaid.org IPv6 network
for a few days now, and so far it has been running stably with zero problems.

A few notes:

	o Again, the code was written for 4.x, so it currently does not use
	  pfil. The final draft that I submit later will include pfil_hooks.

	o In our internally modified 4.x kernel (for which this code was
	  originally written), packets destined to the router itself are sent
	  to the lo0 interface.

	  Therefore this code has no "is the packet destined to us" check.
	  That is very simple to fix in KAME, and I will include the
	  correction in the final draft.

Thank you in advance for your time and suggestions.

-J
-- 
James Jun                                            TowardEX Technologies, Inc.
Technical Lead                       IPv4 and Native IPv6 Colocation, Bandwidth,
james at towardex.com             and Web Hosting Services in the Metro Boston area
cell: 1(978)-394-2867           web: http://www.towardex.com , noc: www.twdx.net
-------------- next part --------------
/*
 * Copyright (c) 2004 James Jun <james at towardex.com>. All rights reserved.
 * Copyright (c) 2004
 *      TowardEX Technologies International, Inc.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the project nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * $Wolfowitz: apc4-snap/49stable/src/sys/netinet6/ip6_fastfwd.c,v 1.0.5 2004/11/11 03:17:05 blahdy Exp $
 * $Wolfowitz: apc5-snap/53L/src/sys/netinet6/ip6_fastfwd.c,v 1.0.5 2004/11/11 03:17:05 blahdy Exp $
 */

/*
 * ip6_fastforward() is derived from FreeBSD 5.3's fully featured IPv4
 * fast forwarding code.
 *
 * ip6_fastforward() gains its speed by bypassing queueing and NETISR
 * completely. After a packet is DMA'd from the incoming network card to
 * host memory, the upper level interface driver preemptively calls
 * ip6_fastforward() instead of scheduling the packet for later processing
 * through the network interrupt service. We then perform the firewall and
 * packet validation required by the protocol RFCs (e.g. drop bad packets)
 * for unicast IPv6 forwarding. We also resolve IPv6 Neighbor Discovery
 * internally instead of relying on nd6_output, and send the packet
 * directly to the outgoing interface, which DMAs it to the network card.
 * The only part of the packet we touch is the IPv6 header.
 *
 * Because a lookup in the kernel RIB's patricia trie can cost up to 128
 * bit tests for an IPv6 address, we keep the route structure around in
 * memory after we perform the route lookup. If the next arriving packet
 * has the same destination, we continue using that structure. If the
 * next packet has a different destination, we free that structure
 * and perform a new RIB lookup. Some may call this a route cache, but
 * in the limited testing I have been able to do so far, a simple "if"
 * check to see whether we can reuse a route appears cheaper than
 * rerunning rtalloc_ign(), even when the lookup is served from L2 cache.
 * Routing latency appears to increase slightly, by tens of microseconds,
 * when rtalloc_ign() is forced to run on every fastforward call.
 *
 * Everything we don't know how to handle is passed down to ip6_input()
 * for full processing. This includes multicast and IPsec routing. If this
 * box is an IPsec tunnel broker, you should not use fastforward, as we do
 * not support it. Otherwise, you can continue to use IPsec, since encrypted
 * packets destined to the local system are processed by ip6_input().
 *
 * Hop-by-hop options in IPv6 are not supported by fast forwarding.
 * Packets with HBH will be sent to ip6_input() for full processing.
 *
 * The performance of ip6_fastforward is essentially limited by the
 * following:
 *
 *   o Your bus speed.
 *   o How fast your CPU and RAM can walk the patricia trie for a 128-bit
 *     route lookup. L2 cache can probably help here.
 *   o The speed and efficiency with which your network card/driver sets
 *     up receives and transmits.
 *   o The complexity of your firewall rules, if you are using a firewall.
 *     The performance cost of the firewall configuration is totally up
 *     to you and how you set up the rules.
 *
 */

/*
 * Many thanks to Matt Thomas from NetBSD and Andre Oppermann from FreeBSD,
 * from whose work this fastforwarding implementation was derived. And lastly,
 * many thanks to the KAME team for their superior IPv6 stack. None of
 * this would have been possible without these folks' work.
 *
 */

#include "opt_ip6fw.h"
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_ip6stealth.h"

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/sysctl.h>
#include <sys/syslog.h>

#include <net/if.h>
#include <net/if_types.h>
#include <net/if_var.h>
#include <net/if_dl.h>
#include <net/route.h>

#include <netinet/in.h>
#include <netinet/in_systm.h>
#include <netinet/in_var.h>
#include <netinet/ip.h>
#include <netinet/ip6.h> /* ip6.h must be declared before ip6_var obviously! */
#include <netinet/ip_var.h>

#include <netinet6/nd6.h>
#include <netinet6/ip6_var.h>
#include <netinet6/ip6_fw.h>

#include <netinet/icmp6.h>


SYSCTL_DECL(_net_inet6_ip6);
static int	ip6fastforward_active = 0;
SYSCTL_INT(_net_inet6_ip6, IPV6CTL_FASTFORWARDING, fastforwarding, CTLFLAG_RW,
	&ip6fastforward_active, 0, "Enable fast IPv6 forwarding");

#ifdef IP6STEALTH
static int	ip6stealth = 0;
SYSCTL_INT(_net_inet6_ip6, OID_AUTO, stealth, CTLFLAG_RW,
	&ip6stealth, 0, "Enable stealth IPv6 forwarding");
#endif

struct   route_in6 ip6_forward_rt;

int
ip6_fastforward(struct mbuf *m)
{
	struct ip6_hdr *ip6;
	struct sockaddr_in6 *dst;
	struct sockaddr_in6 *gw6 = NULL;
	struct rtentry *rt = NULL;
	struct ifnet *ifp = NULL;
	int error = 0;
	u_int32_t plen;
	struct llinfo_nd6 *ln = NULL;

	/* Are we hot and forwarding IPv6? If not, drop to ip6_input */
	if (!ip6fastforward_active || !ip6_forwarding)
	  return 0;

        KASSERT(m != NULL && (m->m_flags & M_PKTHDR) != 0,
            ("apc_inet6_fastfwd: no packet header in mbuf"));

        /*
         * Update mbuf statistics
         */
        if (m->m_flags & M_EXT) {
                if (m->m_next)
                        ip6stat.ip6s_mext2m++;
                else
                        ip6stat.ip6s_mext1++;
        } else {
#define M2MMAX  (sizeof(ip6stat.ip6s_m2m)/sizeof(ip6stat.ip6s_m2m[0]))
                if (m->m_next) {
                        if (m->m_flags & M_LOOP) {
                                ip6stat.ip6s_m2m[loif[0].if_index]++;   /* XXX */
                        } else if (m->m_pkthdr.rcvif->if_index < M2MMAX)
                                ip6stat.ip6s_m2m[m->m_pkthdr.rcvif->if_index]++;
                        else
                                ip6stat.ip6s_m2m[0]++;
                } else
                        ip6stat.ip6s_m1++;
#undef M2MMAX
        }

#ifndef PULLDOWN_TEST
	/*
	 * We have a serious problem if we don't check for invalid
	 * mbuf chains that sometimes bad drivers or layer2 code
	 * send us. Correct as needed.
	 */
        if (m && m->m_next != NULL && m->m_pkthdr.len < MCLBYTES) {
                struct mbuf *n;

                MGETHDR(n, M_DONTWAIT, MT_HEADER);
                if (n)
                        M_MOVE_PKTHDR(n, m);
                if (n && n->m_pkthdr.len > MHLEN) {
                        MCLGET(n, M_DONTWAIT);
                        if ((n->m_flags & M_EXT) == 0) {
                                m_freem(n);
                                n = NULL;
                        }
                }
                if (n == NULL) {
                        m_freem(m);
                        return 1; /*ENOBUFS*/
                }

                m_copydata(m, 0, n->m_pkthdr.len, mtod(n, caddr_t));
                n->m_len = n->m_pkthdr.len;
                m_freem(m);
                m = n;
        }
        IP6_EXTHDR_CHECK(m, 0, sizeof(struct ip6_hdr), 1);
#endif

	/*
	 * Step 1: check for packet drop conditions and sanity checks
	 */

	/* 
	 * First mbuf large enough for ipv6 header and is header there?
	 */
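	if (m->m_len < sizeof(struct ip6_hdr)) {
		/* m_pullup() frees the chain on failure; save rcvif first */
		struct ifnet *inifp = m->m_pkthdr.rcvif;

		if ((m = m_pullup(m, sizeof(struct ip6_hdr))) == NULL) {
			ip6stat.ip6s_toosmall++;
			in6_ifstat_inc(inifp, ifs6_in_hdrerr);
			return 1; /* mbuf already freed by m_pullup() */
		}
	}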

	ip6 = mtod(m, struct ip6_hdr *);

	/* Is this IPv6? */
	if ((ip6->ip6_vfc & IPV6_VERSION_MASK) != IPV6_VERSION) {
		ip6stat.ip6s_badvers++;
		in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_hdrerr);
		goto drop;
	}

	plen = (u_int32_t)ntohs(ip6->ip6_plen);

        /*
	 * Is entire packet big enough as the IPv6 header
	 * tells us?
         */
        if (m->m_pkthdr.len - sizeof(struct ip6_hdr) < plen) {
          ip6stat.ip6s_tooshort++;
          goto drop;
        }

	/*
	 * If entire packet is bigger than what the IPv6
	 * header tells us, cut the size.
	 */
        if (m->m_pkthdr.len > sizeof(struct ip6_hdr) + plen) {
          if (m->m_len == m->m_pkthdr.len) {
              m->m_len = sizeof(struct ip6_hdr) + plen;
              m->m_pkthdr.len = sizeof(struct ip6_hdr) + plen;
          } else
              m_adj(m, sizeof(struct ip6_hdr) + plen - m->m_pkthdr.len);
        }

	/* Record what kind of extension header each packet has.. */
	ip6stat.ip6s_nxthist[ip6->ip6_nxt]++;

	/*
	 * Bad-Address Check
	 * Packets we should not be accepting should be dropped now.
	 */

        /*
         * The following check is not documented in specs.  A malicious
         * party may be able to use IPv4 mapped addr to confuse tcp/udp stack
         * and bypass security checks (act as if it was from 127.0.0.1 by using
         * IPv6 src ::ffff:127.0.0.1).  Be cautious.
         *
         * This check chokes if we are in an SIIT cloud.  As none of BSDs
         * support IPv4-less kernel compilation, we cannot support SIIT
         * environment at all.  So, it makes more sense for us to reject any
         * malicious packets for non-SIIT environment, than try to do a
         * partial support for SIIT environment.
         */
        if (IN6_IS_ADDR_V4MAPPED(&ip6->ip6_src) || IN6_IS_ADDR_V4MAPPED(&ip6->ip6_dst) ||
	/* 
	 * If we are seeing loopback addresses on the wire, I think the 
	 * cluephone is ringing!
	 */
         IN6_IS_ADDR_LOOPBACK(&ip6->ip6_src) || IN6_IS_ADDR_LOOPBACK(&ip6->ip6_dst) ||
	/*
	 * Drop packets with unspecified src/dst pair, and drop packets claiming
	 * to be from multicast networks.
	 *
	 * Note that if the source addr is multicast, that is not exactly a
	 * conforming multicast, so drop it.
	 * [See "Developing IP Multicast Networks", Beau Williamson, 
	 *  Cisco Press]
	 */
	 IN6_IS_ADDR_MULTICAST(&ip6->ip6_src) || IN6_IS_ADDR_UNSPECIFIED(&ip6->ip6_dst) ||
	 IN6_IS_ADDR_UNSPECIFIED(&ip6->ip6_src))
        {
          ip6stat.ip6s_badscope++;
	  in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_addrerr);
	  goto drop;
	}


	/*
	 * Step 2: fallback conditions to ip6_input slowpath processing
	 */

        /*
	 * Regular unicast IPv6 packets only. Multicast and packets claiming
	 * to be from loopback, we don't know how to deal with, so we will bail
	 * out and pass them onto ip6_input().
	 *
	 * Further, people who forward link-local subnets with a routing 
	 * protocol must go back and take 'IPv6 for Dummies' course. We won't
	 * drop them, but in spirit of RFC2772, we won't fast process them
	 * either.
         */
        if (IN6_IS_ADDR_MULTICAST(&ip6->ip6_dst) || IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_src) ||
	    IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_dst) || (m->m_pkthdr.rcvif->if_flags & IFF_LOOPBACK) ||
	    (m->m_flags & (M_BCAST|M_MCAST)) != 0 ) 
		return 0;

	/*
	 * If we are seeing a packet with a Router Alert Option [RFC2711] now,
	 * and we are still in ip6_fastforward(), then this packet really is NOT
	 * multicast, nor regular looking unicast. It's either RSVP or something
	 * else not supported. Send to ip6_input() for full inspection.
	 */
	if (ip6->ip6_nxt == IPPROTO_HOPOPTS)
		return 0;

	/* Step 3:
	 * Lookup route in the radix rib.
	 *
	 * We need to look up the route early, because we
	 * need to validate whether we have a neighbor discovery
	 * cache available in the respective rtentry.
	 */
	dst = (struct sockaddr_in6 *)&ip6_forward_rt.ro_dst;
	if ((ip6_forward_rt.ro_rt == 0) || 
		!IN6_ARE_ADDR_EQUAL(&ip6->ip6_dst, &dst->sin6_addr)){
		if(ip6_forward_rt.ro_rt) {
			RTFREE(ip6_forward_rt.ro_rt);
			ip6_forward_rt.ro_rt = 0;
		}
		bzero(dst, sizeof(*dst));
		dst->sin6_len = sizeof(struct sockaddr_in6);
		dst->sin6_family = AF_INET6;
		dst->sin6_addr = ip6->ip6_dst;
		rtalloc_ign((struct route *)&ip6_forward_rt, RTF_PRCLONING);
		if (ip6_forward_rt.ro_rt){
			rt = ip6_forward_rt.ro_rt;
	 		if (rt->rt_flags & RTF_GATEWAY)
			  dst = (struct sockaddr_in6 *)rt->rt_gateway;
			ifp = rt->rt_ifp;
		} else {
			goto forwardcheck;
		}
	} else {
		rt = ip6_forward_rt.ro_rt;
		if (rt->rt_flags & RTF_GATEWAY)
		  dst = (struct sockaddr_in6 *)rt->rt_gateway;
		ifp = rt->rt_ifp;
	}

	/* If the destination neighbor is in the process of link-layer
	 * resolution, drop to ip6_input now.
	 *
	 * [See RFC 2461 Sec 7.2.2 ]
	 *
	 * Because we bypassed NETISR registration and queueing
	 * under locking protection, if we just blindly attempt
	 * to queue up a small number of packets while the neighbor
	 * is being resolved, we will corrupt or otherwise lose
	 * mbufs. This may cause the device driver to crash or
	 * exhaust its resources, or otherwise crash the kernel.
	 *
	 * ip6_input requires packets to arrive in a more orderly
	 * fashion (as a tradeoff for slower performance) so that
	 * mbuf corruption does not occur during neighbor
	 * link-layer resolution.
	 */
	if (rt->rt_flags & RTF_UP){
		if (rt->rt_flags & RTF_LLINFO){
			ln = (struct llinfo_nd6 *)rt->rt_llinfo;
			goto nd_cache_lookup;
		}

		if (nd6_is_addr_neighbor(dst, ifp)) {
			/* Fetch the neighbor entry without clobbering rt */
			struct rtentry *nrt;

			nrt = nd6_lookup(&dst->sin6_addr, 1, ifp);
			if (nrt != NULL)
				ln = (struct llinfo_nd6 *)nrt->rt_llinfo;
		}
	nd_cache_lookup:
		/* These are distinct state values, not bit flags */
		if (ln && (ln->ln_state == ND6_LLINFO_INCOMPLETE ||
		    ln->ln_state == ND6_LLINFO_NOSTATE))
			return 0;
	}

	in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_receive);
	ip6stat.ip6s_total++;


	/*
	 * Step 4: incoming packet firewall processing
	 * XXX: Need to use pfil_hooks for APC5-dev
	 */

forwardcheck:
	/* very basic ip6fw check. no divert, no dummynet, no
	 * nothing. just permit/deny only for now.
	 */
        if (ip6_fw_enable && ip6_fw_chk_ptr) {
                u_short port = 0;
                /* If ipfw says divert, we have to just drop packet */
                /* use port as a dummy argument */
                if ((*ip6_fw_chk_ptr)(&ip6, NULL, &port, &m)) {
			m_freem(m);
			return 1; /* do not touch the freed mbuf again */
		}
		if (!m)
			return 1;
        }

	/*
	 * Step 5: Validate routing entry
	 */

	if (!rt){
		ip6stat.ip6s_noroute++;
		ip6stat.ip6s_cantforward++;
		in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_noroute);
		icmp6_error(m, ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOROUTE, 0);
		/* Do _NOT_ call RTFREE here. */
		return 1;
	}

	/* Drop all reject and null routes while we are in fast 
	 * forwarding path.
	 */
	if ( rt->rt_flags & RTF_BLACKHOLE ) 
		goto drop;

	if ( rt->rt_flags & RTF_REJECT ) {
		/*
		 * XXX I can't seem to find any BCP, STD, RFC nor any ietf v6
		 * working group drafts regarding how to respond to a reject
		 * route with ICMP6 code. So for now, we'll copycat Juniper's
		 * behaviour.
		 */
		icmp6_error(m, ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOROUTE, 0);
		return 1; /* mbuf is already gone */
	}

	/* If route is down, we shouldn't use it */
	if (!(rt->rt_flags & RTF_UP)){
		ip6stat.ip6s_cantforward++;
		in6_ifstat_inc(m->m_pkthdr.rcvif, ifs6_in_noroute);
        	icmp6_error(m, ICMP6_DST_UNREACH,
                               ICMP6_DST_UNREACH_ADDR, 0);
		return 1;
	}
		
	/*
	 * Scope check: if a packet can't be delivered to its destination
	 * for the reason that the destination is beyond the scope of the
	 * source address, return ICMP6 code 2. 
	 * [draft-ietf-ipngwg-icmp-v3-02.txt, Section 3.1]
	 */
	if (in6_addr2scopeid(m->m_pkthdr.rcvif, &ip6->ip6_src) !=
	    in6_addr2scopeid(ifp, &ip6->ip6_src)) {
		ip6stat.ip6s_cantforward++;
		ip6stat.ip6s_badscope++;
	  	in6_ifstat_inc(ifp, ifs6_in_discard);

		icmp6_error(m, ICMP6_DST_UNREACH,
		    ICMP6_DST_UNREACH_BEYONDSCOPE, 0);
		return 1;
	}

	/* Decrement Hop Limit now. if hlim=0, call icmp6_error
	 *
	 * Note that we're doing this after we have successfully
	 * gone through the "forwarding" stage by identifying a
	 * functional route; we should NOT be doing TTL/HLIM work
	 * BEFORE we even look up a usable route. At least this
	 * appears to be the way Juniper prefers doing, in their
	 * hardware forwarding plane implementation.
	 */

#ifdef IP6STEALTH
	if (!ip6stealth) {
#endif
	/*
	 * Note:
	 * Because we forward local-bound packets to lo0, it is
	 * important that we do NOT decrement HLIM on them. If
	 * we do, not only will traceroute to the router's interface
	 * IP show double hops, but ICMPv6 ND advertisements will
	 * be discarded per RFC requirement <-- this is bad.
	 *
	 * Furthermore, decrementing HLIM on local-bound packets
	 * may break strict GTSM deployments that rely on HLIM
	 * (TTL in current IPv4 BTSH deployments) being 255 on
	 * BGP single hop peers.
	 */
	if (!(rt->rt_flags & RTF_LOCAL)){
	  if (ip6->ip6_hlim <= IPV6_HLIMDEC){
		icmp6_error(m, ICMP6_TIME_EXCEEDED,
				ICMP6_TIME_EXCEED_TRANSIT, 0);
		return 1;
	  }
	  ip6->ip6_hlim -= IPV6_HLIMDEC;
	}

#ifdef IP6STEALTH
	}
#endif


	/*
	 * Step 6: Outgoing firewall check. Again very basic.
	 * need to move to pfil_hooks for apc5
	 */
	if (ip6_fw_enable && ip6_fw_chk_ptr) {
		u_short port = 0;
		/* If ipfw says divert, we have to just drop packet */
		/* Send ifp, so fw knows this is outbound check */
		if ((*ip6_fw_chk_ptr)(&ip6, ifp, &port, &m)) {
			m_freem(m);
			return 1; /* do not touch the freed mbuf again */
		}
		if (!m)
			return 1;
	}


	/*
	 * Step 7: Send off the packet
	 */

#if 0
	/*
	 * Check layer1 media link state
	 * For APC5 only.
	 */
	if (ifp->if_link_state == LINK_STATE_DOWN) {
        	icmp6_error(m, ICMP6_DST_UNREACH,
                               ICMP6_DST_UNREACH_ADDR, 0);
		return 1;
	}
#endif

	/* Check if packet is too fat to fit on the outgoing link.
	 * If it is too fat, notify the source.
	 */
	if (m->m_pkthdr.len > ifp->if_mtu) {
		u_long mtu;
		mtu = ifp->if_mtu;
		in6_ifstat_inc(ifp, ifs6_in_toobig);
		icmp6_error(m, ICMP6_PACKET_TOO_BIG, 0, mtu);
		return 1;
 	}

	/* Clear embedded scope id's as needed */
	if (IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_src))
		ip6->ip6_src.s6_addr16[1] = 0;
	if (IN6_IS_SCOPE_LINKLOCAL(&ip6->ip6_dst))
		ip6->ip6_dst.s6_addr16[1] = 0;

	/*
	 * Validate layer2 link resolution for next-hop
	 * advancing router or host.
	 */

	/* We do not deal with neighbor cache on any
	 * interface other than ARC, ethernet, FDDI
	 * and GIF.
	 */
	if (nd6_need_cache(ifp) == 0)
		goto sendpkt;

	if (rt->rt_flags & RTF_GATEWAY){
		struct rtentry *rt0 = rt;

		gw6 = (struct sockaddr_in6 *)rt->rt_gateway;

		/* Skip link-layer addr resolution and NUD
		 * if nexthop advancing router is not a
		 * neighbor according to NUD. If this is
		 * a point to point link, skip NUD.
		 */
		if (!nd6_is_addr_neighbor(gw6, ifp) ||
		    in6ifa_ifpwithaddr(ifp, &gw6->sin6_addr)) {
			if ((ifp->if_flags & IFF_POINTOPOINT) == 0)
			  goto drop;

			goto sendpkt;
		}

		if(rt->rt_gwroute == 0)
			goto ndll_lookup;
		if (((rt = rt->rt_gwroute)->rt_flags & RTF_UP) == 0) {
			rtfree(rt); rt = rt0;
		ndll_lookup: rt->rt_gwroute = rtalloc1(rt->rt_gateway, 1, 0UL);
			if ((rt = rt->rt_gwroute) == 0)
			  goto drop;
		}
	}

	if (!ln){
		if ((ifp->if_flags & IFF_POINTOPOINT) == 0 &&
		  !(nd_ifinfo[ifp->if_index].flags & ND6_IFF_PERFORMNUD)) {
			log(LOG_DEBUG,
			   "apc_inet6_fastfwd: can't allocate llinfo for %s "
			   "(ln=%p, rt=%p)\n",
			   ip6_sprintf(&dst->sin6_addr), ln, rt);
			goto drop;
		}
		goto sendpkt; /* send anyway */
	}

	/* No need for link-layer resolution on a p2p circuit */
	if ((ifp->if_flags & IFF_POINTOPOINT) != 0 &&
	    ln->ln_state < ND6_LLINFO_REACHABLE){
		ln->ln_state = ND6_LLINFO_STALE;
		ln->ln_expire = time_second + nd6_gctimer;
	}
	/* If entry is stale, change state to DELAY and let it
	 * expire in nd6.c (RFC2461 Sec. 7.3.3)
	 */
	if (ln->ln_state == ND6_LLINFO_STALE) {
		  ln->ln_asked = 0;
		  ln->ln_state = ND6_LLINFO_DELAY;
		  ln->ln_expire = time_second + nd6_delay;
	}

	if (ln->ln_state > ND6_LLINFO_INCOMPLETE)
		goto sendpkt;

	/*
	 * Read up comments above for reason why we are not
	 * queuing up packet here.
	 */
	if (ln->ln_state == ND6_LLINFO_NOSTATE)
		ln->ln_state = ND6_LLINFO_INCOMPLETE;
 	if (ln->ln_expire) {
                if (ln->ln_asked < nd6_mmaxtries &&
                    ln->ln_expire < time_second) {
                        ln->ln_asked++;
                        ln->ln_expire = time_second +
                                nd_ifinfo[ifp->if_index].retrans / 1000;
                        nd6_ns_output(ifp, NULL, &dst->sin6_addr, ln, 0);
                }
        }

	/* There is no need to process packets further. */
	goto drop;

sendpkt:
	error = (*ifp->if_output)(ifp, m, (struct sockaddr *)dst, rt);

	if (error) {
		ip6stat.ip6s_odropped++;
		in6_ifstat_inc(rt->rt_ifp, ifs6_out_discard);
	} else {
		/* if not an error, we successfully routed a packet! */
		ip6stat.ip6s_forward++;
		ip6stat.ip6s_fastforward++;
		in6_ifstat_inc(rt->rt_ifp, ifs6_out_forward);
	}

	return 1;  /* We are done, operation complete. */
drop:
	if (m)
		m_freem(m);
	return 1;
}

/* EOF */
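
The single-entry route reuse described in the header comment of
ip6_fastfwd.c can be tried outside the kernel. The following is only a
minimal userland sketch under stated assumptions, not the kernel code:
struct cached_route, slow_route_lookup() and the slow_lookups counter are
hypothetical stand-ins for struct route_in6, rtalloc_ign() and its cost,
and the fake lookup result is arbitrary.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical stand-in for a cached route; not the kernel's route_in6. */
struct cached_route {
	struct in6_addr	dst;	 /* destination the cached result is for */
	int		valid;	 /* nonzero once a result has been cached */
	int		ifindex; /* pretend result of the route lookup */
};

static unsigned long slow_lookups;	/* number of full lookups performed */

/* Hypothetical slow path standing in for a full RIB walk. */
static int
slow_route_lookup(const struct in6_addr *dst)
{
	slow_lookups++;
	return (dst->s6_addr[15] & 1) ? 1 : 2;	/* arbitrary fake result */
}

/*
 * Reuse the cached entry when the destination matches the previous
 * packet's destination; otherwise replace it with a fresh lookup.
 */
static int
cached_route_lookup(struct cached_route *rc, const struct in6_addr *dst)
{
	assert(rc != NULL && dst != NULL);

	if (!rc->valid || memcmp(&rc->dst, dst, sizeof(*dst)) != 0) {
		rc->dst = *dst;
		rc->ifindex = slow_route_lookup(dst);
		rc->valid = 1;
	}
	return rc->ifindex;
}
```

Feeding the same destination twice leaves slow_lookups at 1; a packet for
a different destination replaces the cached entry and costs one more
lookup.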

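
The address tests from Step 1 (drop) and Step 2 (fall back to ip6_input)
can be exercised in userland in a similar way. This sketch uses the POSIX
macros from <netinet/in.h>. It substitutes IN6_IS_ADDR_LINKLOCAL for the
kernel-only IN6_IS_SCOPE_LINKLOCAL and leaves out the interface and mbuf
flag tests, which have no userland equivalent. The names ff_verdict,
ff_classify() and ff_classify_str() are hypothetical and do not appear in
the attached code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

enum ff_verdict { FF_DROP, FF_SLOWPATH, FF_FASTPATH };

/* Classify a packet by its source and destination addresses only. */
static enum ff_verdict
ff_classify(const struct in6_addr *src, const struct in6_addr *dst)
{
	assert(src != NULL && dst != NULL);

	/* Step 1: addresses that must never be forwarded. */
	if (IN6_IS_ADDR_V4MAPPED(src) || IN6_IS_ADDR_V4MAPPED(dst) ||
	    IN6_IS_ADDR_LOOPBACK(src) || IN6_IS_ADDR_LOOPBACK(dst) ||
	    IN6_IS_ADDR_MULTICAST(src) ||
	    IN6_IS_ADDR_UNSPECIFIED(src) || IN6_IS_ADDR_UNSPECIFIED(dst))
		return FF_DROP;

	/* Step 2: valid packets that are left to the full input path. */
	if (IN6_IS_ADDR_MULTICAST(dst) ||
	    IN6_IS_ADDR_LINKLOCAL(src) || IN6_IS_ADDR_LINKLOCAL(dst))
		return FF_SLOWPATH;

	return FF_FASTPATH;
}

/* Convenience wrapper: classify textual addresses, -1 on parse error. */
static int
ff_classify_str(const char *src, const char *dst)
{
	struct in6_addr s, d;

	if (inet_pton(AF_INET6, src, &s) != 1 ||
	    inet_pton(AF_INET6, dst, &d) != 1)
		return -1;
	return (int)ff_classify(&s, &d);
}
```

For example, a source of ::ffff:127.0.0.1 is dropped, a destination of
ff02::1 goes to the slow path, and two global unicast addresses qualify
for fast forwarding.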
