[Bug 209351] VLAN TX errors, possible performance regression after 10.1-STABLE (r281235)

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Sat May 7 00:08:50 UTC 2016


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209351

            Bug ID: 209351
           Summary: VLAN TX errors, possible performance regression after
                    10.1-STABLE (r281235)
           Product: Base System
           Version: 11.0-CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: zclaudio at bsd.com.br
                CC: freebsd-amd64 at FreeBSD.org

On a BGP router running FreeBSD 10.1-STABLE (r281235), everything has worked
fine for several years. After upgrading to any newer version I start seeing
vlan TX errors on the exact same hardware, just by booting from an SSD with a
newer system.

Details:

At peak we route around 4Gbit/s and 1.8Mpps in total, while a single port
peaks at 300Kpps.

Our quality metrics are measured with:

ping -s 1472 -i 0.1 <our-other-ibgp-router>

As well as bidirectional iperf.
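
For reference, the on-wire sizes implied by "ping -s 1472" can be worked out
with a short sketch. It assumes plain IPv4 without IP options, a standard
1500-byte MTU and a single 802.1Q tag; the constant and function names are
only illustrative and none of this is taken from the router's configuration.

```python
# Standard header sizes in bytes (assumption: IPv4 without options,
# Ethernet II framing, one 802.1Q tag).
ICMP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER = 14
ETH_FCS = 4
VLAN_TAG = 4


def frame_sizes(icmp_payload):
    """Return (IP packet size, untagged frame size, tagged frame size)."""
    ip_packet = icmp_payload + ICMP_HEADER + IPV4_HEADER
    untagged_frame = ip_packet + ETH_HEADER + ETH_FCS
    tagged_frame = untagged_frame + VLAN_TAG
    return ip_packet, untagged_frame, tagged_frame


if __name__ == "__main__":
    ip_packet, untagged, tagged = frame_sizes(1472)
    print("IP packet:", ip_packet)
    print("untagged frame incl. FCS:", untagged)
    print("802.1Q tagged frame incl. FCS:", tagged)
```

Under these assumptions a 1472-byte payload fills a 1500-byte IP packet
exactly, so the reference ping exercises the largest unfragmented packet on a
default MTU.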

Systems working w/o problem:
- 10.1-STABLE / r281235

Systems tested with drops:
- 10.2-STABLE / r292035M
- 10.3-STABLE / r298705
- 11.0-CURRENT / r295683 (downloaded snapshot from ftp.freebsd.org)
- 11.0-CURRENT Melifaro Routing Branch / r297731M

While testing, when errors happen I can see output errs on the vlan port in the
output of "netstat -w1 -I vlan6":

           input          vlan6           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         1     0     0         66      30557     2   33310968     0
         1     0     0        105      31458     3   33912219     0
         2     0     0       2954      32001     8   34983986     0
         1     0     0       1512      33150     6   35942558     0
         1     0     0       1512      33654     4   37311862     0
         1     0     0       1512      34825     3   38213793     0
         3     0     0       1683      35376     4   39488912     0
         5     0     0       7280      32423     3   35551869     0
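
To put the counters above in proportion, the sample can be summarized with a
small script. This is only a sketch: it assumes the eight-column layout shown
above, the function names are made up for illustration, and it reads the
pasted sample rather than a live system.

```python
# Sample copied from the "netstat -w1 -I vlan6" output above.
SAMPLE = """\
           input          vlan6           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         1     0     0         66      30557     2   33310968     0
         1     0     0        105      31458     3   33912219     0
         2     0     0       2954      32001     8   34983986     0
         1     0     0       1512      33150     6   35942558     0
         1     0     0       1512      33654     4   37311862     0
         1     0     0       1512      34825     3   38213793     0
         3     0     0       1683      35376     4   39488912     0
         5     0     0       7280      32423     3   35551869     0
"""

COLUMNS = ("ipkts", "ierrs", "idrops", "ibytes",
           "opkts", "oerrs", "obytes", "colls")


def parse_netstat(text):
    """Return one dict per data row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly eight all-numeric columns.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows):
    """Total output packets, output errors, and mean output packet size."""
    opkts = sum(r["opkts"] for r in rows)
    oerrs = sum(r["oerrs"] for r in rows)
    obytes = sum(r["obytes"] for r in rows)
    return opkts, oerrs, obytes / opkts


if __name__ == "__main__":
    opkts, oerrs, mean_size = summarize(parse_netstat(SAMPLE))
    print(f"output packets: {opkts}, output errs: {oerrs}")
    print(f"mean output packet size: {mean_size:.1f} bytes")
```

The same parser could be fed live output by piping netstat into it, but that
part is left out here because it depends on the local setup.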

Problems may happen under high load (~200Kpps) or low load (~30Kpps) on a vlan
port. 

The observed frame loss never happens on untagged ports, only on vlan ports.

The observed loss happens with packets of 900 bytes and above, but the loss
rate is noticeably higher with packets close to 1400 bytes (1472 is my
reference size).

The loss rate on every listed system other than r281235 is 9-19% with ping(1)
and iperf, while it is 0% (no loss, or negligible loss) on r281235.

Hardware tried:

- Intel 82599EB 10-Gigabit SFI/SFP+ Network Connection (2x2 on x8 PCIe bus,
total 4x10G).
- Chelsio T520, 2x2 on x8 PCIe bus, total 4x10G.

Both show exactly the same behavior, so it is not Intel-specific.

Same hardware:

I always test on the very same hardware. I have two SSD drives in this router:
one with 10.1, which runs fine, and another used to test the various versions
of FreeBSD.

Sysctl/loader:

Only minor loader and sysctl confs are tweaked:

kern.hz=2000
net.inet.ip.redirect=1                # do not send IP redirects
net.inet.ip.accept_sourceroute=0      # drop source routed packets since they ca
net.inet.ip.sourceroute=0             # if source routed packets are accepted th
net.inet.tcp.drop_synfin=1            # SYN/FIN packets get dropped on initial c
net.inet.udp.blackhole=1              # drop udp packets destined for closed soc
net.inet.tcp.blackhole=2              # drop tcp packets destined for closed por
security.bsd.see_other_uids=0

No relevant errors happen on the physical ix(4) or cxl(4) ports.

It's very easy to reproduce in my environment: I just need to boot a newer
system, and very soon some vlans start to drop packets that are not dropped on
10.1-STABLE. I can be contacted if a developer wants to ssh in. I can also
update this PR with more information if needed.
