lagg/lacp poor traffic distribution

Shtorm admin at shtorm.com
Fri Dec 24 15:44:03 UTC 2010


On Tue, 2010-12-21 at 16:39 +0600, Eugene Grosbein wrote:
> On 20.12.2010 17:21, Shtorm wrote:
> > On Sun, 2010-12-19 at 00:35 +0600, Eugene Grosbein wrote:
> >> Hi!
> >>
> >> I've loaded router using two lagg interfaces in LACP mode.
> >> lagg0 has IP address and two ports (em0 and em1) and carry untagged frames.
> >> lagg1 has no IP address and has two ports (igb0 and igb1) and carry
> >> about 1000 dot-q vlans with lots of hosts in each vlan.
> >>
> >> For lagg1, lagg distributes outgoing traffic over two ports just fine.
> >> For lagg0 (untagged ethernet segment with only 2 MAC addresses)
> >> less than 0.07% (54Mbit/s max) of traffic goes to em0
> >> and over 99.92% goes to em1, that's bad.
> >>
> >> That's general traffic of several thousands of customers surfing the web,
> >> using torrents etc.  I've glanced over lagg/lacp sources if src/sys/net/
> >> and found nothing suspicious, it should extract and use srcIP/dstIP for hash.
> >>
> >> How do I debug this problem?
> >>
> >> Eugene Grosbein
> >> _______________________________________________
> >> freebsd-net at freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
> > 
> > I had this problem with igb driver, and I found, that lagg selects
> > outgoing interface based on packet header flowid field if M_FLOWID field
> > is set. And in the igb driver code flowid is set as 
> > 
> > #if __FreeBSD_version >= 800000
> > 			rxr->fmp->m_pkthdr.flowid = que->msix;
> > 			rxr->fmp->m_flags |= M_FLOWID;
> > #endif
> > 
> > The same thing in em driver with MULTIQUEUE 
> > 
> > That does not give enough number of flows to balance traffic well, so I
> > commented check in if_lagg.c
> > 
> > lagg_lb_start(struct lagg_softc *sc, struct mbuf *m)
> > {
> > 	struct lagg_lb *lb = (struct lagg_lb *)sc->sc_psc;
> > 	struct lagg_port *lp = NULL;
> > 	uint32_t p = 0;
> > 
> > //	if (m->m_flags & M_FLOWID)
> > //		p = m->m_pkthdr.flowid;
> > //	else
> > 
> > and with this change I have much better load distribution across interfaces.
> > 
> > Hope it helps.
> 
> You are perfectly right. By disabling flow usage I've obtained load sharing
> close to even (final patch follows). Two questions:
> 
> 1. Is it a bug or design problem?
> 2. Will I get problems like packet reordering by permanently disabling
> usage of these flows in lagg(4)?
> 
> --- if_lagg.c.orig	2010-12-20 22:53:21.000000000 +0600
> +++ if_lagg.c	2010-12-21 13:37:20.000000000 +0600
> @@ -168,6 +168,11 @@
>      &lagg_failover_rx_all, 0,
>      "Accept input from any interface in a failover lagg");
>  
> +int lagg_use_flows = 1;
> +SYSCTL_INT(_net_link_lagg, OID_AUTO, use_flows, CTLFLAG_RW,
> +    &lagg_use_flows, 1,
> +    "Use flows for load sharing");
> +
>  static int
>  lagg_modevent(module_t mod, int type, void *data)
>  {
> @@ -1666,7 +1671,7 @@
>  	struct lagg_port *lp = NULL;
>  	uint32_t p = 0;
>  
> -	if (m->m_flags & M_FLOWID)
> +	if (lagg_use_flows && (m->m_flags & M_FLOWID))
>  		p = m->m_pkthdr.flowid;
>  	else
>  		p = lagg_hashmbuf(m, lb->lb_key);
> --- if_lagg.h.orig	2010-12-21 16:34:35.000000000 +0600
> +++ if_lagg.h	2010-12-21 16:35:27.000000000 +0600
> @@ -242,6 +242,8 @@
>  int		lagg_enqueue(struct ifnet *, struct mbuf *);
>  uint32_t	lagg_hashmbuf(struct mbuf *, uint32_t);
>  
> +extern int	lagg_use_flows;
> +
>  #endif /* _KERNEL */
>  
>  #endif /* _NET_LAGG_H */
> --- ieee8023ad_lacp.c.orig	2010-12-21 16:36:09.000000000 +0600
> +++ ieee8023ad_lacp.c	2010-12-21 16:35:58.000000000 +0600
> @@ -812,7 +812,7 @@
>  		return (NULL);
>  	}
>  
> -	if (m->m_flags & M_FLOWID)
> +	if (lagg_use_flows && (m->m_flags & M_FLOWID))
>  		hash = m->m_pkthdr.flowid;
>  	else
>  		hash = lagg_hashmbuf(m, lsc->lsc_hashkey);
> 
> Eugene Grosbein

Could you please look at a different and perhaps more generic solution
to the lagg and flowid problem? In if_ethersubr.c, in function
ether_input, I added this code:

    m->m_flags |= M_FLOWID;
    m->m_pkthdr.flowid = eh->ether_dhost[0] + eh->ether_dhost[1] +
        eh->ether_dhost[2] + eh->ether_dhost[3] + eh->ether_dhost[4] +
        eh->ether_dhost[5];
    m->m_pkthdr.flowid += eh->ether_shost[0] + eh->ether_shost[1] +
        eh->ether_shost[2] + eh->ether_shost[3] + eh->ether_shost[4] +
        eh->ether_shost[5];
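
For illustration only, here is a minimal user-space sketch of the same
L2 byte sum. The struct eth_hdr type and the l2_flowid() name are
stand-ins made up for this example, not kernel definitions. Because the
sum does not depend on order, frames exchanged between the same two
stations always get the same value in both directions, so on a segment
with only two MAC addresses the L2 part alone cannot spread traffic and
the IP bytes have to do that work.

```c
#include <stdint.h>

#define MAC_LEN 6

/* Stand-in for the kernel's struct ether_header (illustration only). */
struct eth_hdr {
	uint8_t dhost[MAC_LEN];	/* destination MAC address */
	uint8_t shost[MAC_LEN];	/* source MAC address */
};

/* Sum all twelve MAC address bytes, like the ether_input snippet. */
static uint32_t
l2_flowid(const struct eth_hdr *eh)
{
	uint32_t id = 0;
	int i;

	for (i = 0; i < MAC_LEN; i++)
		id += eh->dhost[i] + eh->shost[i];
	return (id);
}
```

Swapping the source and destination addresses gives the same id, which
keeps both directions of a conversation in one flow.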

and in function ether_demux I added this:

    case ETHERTYPE_IP:
        if ((m = ip_fastforward(m)) == NULL)
            return;
        isr = NETISR_IP;

        struct ipheader {
            u_char offset[12]; //ip header fields not actually needed
            u_char src[4]; //ip src
            u_char dst[4]; //ip dst
        } __packed __aligned(4);

        //ip header and mbuf stuff stolen from ip_fastforward
        if (m->m_pkthdr.len < sizeof(struct ipheader)) {
            if_printf(ifp,
                "flowid math: discard frame with too small header\n");
            goto discard;
        }
        if (m->m_len < sizeof(struct ipheader) &&
            (m = m_pullup(m, sizeof(struct ipheader))) == NULL) {
            if_printf(ifp, "flowid math: discard frame at pullup\n");
            return;    /* mbuf already free'd */
        }
        struct ipheader *ip;
        ip = mtod(m, struct ipheader *);
        m->m_pkthdr.flowid += ip->src[0] + ip->src[1] + ip->src[2] +
            ip->src[3];
        m->m_pkthdr.flowid += ip->dst[0] + ip->dst[1] + ip->dst[2] +
            ip->dst[3];

//      if_printf(ifp, "Calculated flow id %d\n", m->m_pkthdr.flowid);

        break;

    case ETHERTYPE_ARP:
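
To show how the L3 part affects port choice, here is a small user-space
sketch. The names l3_flowid() and pick_port() are invented for this
example, and reducing the flow id modulo the number of ports is my
assumption about how lagg maps a hash to a port, not a copy of its
code.

```c
#include <stdint.h>

/* Add the IPv4 source and destination address bytes to a flow id. */
static uint32_t
l3_flowid(uint32_t flowid, const uint8_t src[4], const uint8_t dst[4])
{
	int i;

	for (i = 0; i < 4; i++)
		flowid += src[i] + dst[i];
	return (flowid);
}

/* Assumed port selection: flow id modulo the number of active ports. */
static unsigned
pick_port(uint32_t flowid, unsigned nports)
{
	return (flowid % nports);
}
```

With two ports, 10.0.0.1 -> 192.168.1.2 sums to 374 and lands on port 0,
while 10.0.0.2 -> 192.168.1.2 sums to 375 and lands on port 1. A plain
byte sum collides easily (any two address pairs with the same byte total
share a flow id), so this is a cheap hash rather than a strong one.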


Sorry, I have no idea how to create a nice diff; maybe a pointer to a
short howto would help :)

This code should probably be wrapped in a sysctl check, so that it can
be enabled or disabled separately for L2 and L3 info, but I do not know
how to do this. If we calculate the flowid early, at Ethernet input, we
not only solve the lagg load distribution problem but also let different
flows be processed by different netisr threads when fastforwarding is
disabled. I know this wastes some CPU, but, for example, on a router
with 4 cores and two em cards, top looks like this:

last pid: 84129;  load averages:  0.06,  0.12,  0.09  up 9+14:21:30  17:30:50
175 processes: 6 running, 145 sleeping, 24 waiting
CPU 0:  2.3% user,  0.0% nice,  3.9% system, 27.9% interrupt, 65.9% idle
CPU 1:  1.6% user,  0.0% nice,  1.6% system, 26.4% interrupt, 70.5% idle
CPU 2:  3.1% user,  0.0% nice,  1.6% system, 34.1% interrupt, 61.2% idle
CPU 3:  0.8% user,  0.0% nice,  1.5% system, 30.0% interrupt, 67.7% idle
Mem: 175M Active, 38M Inact, 262M Wired, 108K Cache, 60M Buf, 1497M Free
Swap: 

# netstat -I em0 -w 1
            input          (em0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     46381     0     0   49430586      36616     0   21480310     0
     45753     0     0   47685283      36941     0   22789381     0
     46167     0     0   48173940      37736     0   23442515     0
     46608     0     0   49172207      37705     0   23199023     0
     50114     0     0   53050475      39719     0   23409046     0
     47786     0     0   49826567      37621     0   23658505     0
^C
#

This box is an all-in-one router: a PPPoE server with NAT and dummynet
shaping. With my changes it can NAT, shape and forward up to 100 kpps
in each direction with 30% idle on all cores.
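
As for the sysctl wrapping mentioned above, the logic could look roughly
like the user-space sketch below. The two flags, flowid_use_l2 and
flowid_use_l3, and the calc_flowid() function are hypothetical names; in
the kernel the flags would be plain ints exported with SYSCTL_INT, the
same way lagg_use_flows is in Eugene's patch quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical knobs; in the kernel these would be sysctl-backed ints. */
static int flowid_use_l2 = 1;
static int flowid_use_l3 = 1;

/*
 * Build a flow id from the 12 MAC address bytes and, when present,
 * the 8 IPv4 address bytes, honouring the two knobs above.
 * Pass ip_bytes as NULL for non-IP frames.
 */
static uint32_t
calc_flowid(const uint8_t mac_bytes[12], const uint8_t *ip_bytes)
{
	uint32_t id = 0;
	int i;

	if (flowid_use_l2)
		for (i = 0; i < 12; i++)
			id += mac_bytes[i];
	if (flowid_use_l3 && ip_bytes != NULL)
		for (i = 0; i < 8; i++)
			id += ip_bytes[i];
	return (id);
}
```

With both knobs set to 0 the function returns 0 for every frame, so a
real implementation would also have to leave M_FLOWID unset in that case
and let lagg fall back to its own hash.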


Thanks a lot.
Yuriy.



