Regression? VLAN packet drop after upgrading from r281235

Zé Claudio Pastore zclaudio at bsd.com.br
Sat May 7 00:11:30 UTC 2016


OK, I submitted a bug report, in case someone else gets a similar problem.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209351


2016-04-27 18:10 GMT-03:00 Zé Claudio Pastore <zclaudio at bsd.com.br>:

> Hello Ryan,
>
> 2016-04-27 17:28 GMT-03:00 Ryan Stone <rysto32 at gmail.com>:
>
>> From a quick look at the vlan code, I can identify a few cases that might
>> cause that counter to increment:
>>
>> 1) Error from the underlying ixgbe device.  Does "netstat -dI ix0" show
>> that the driver has been dropping packets?
>>
>
> No, it does not increase drop counters on ix port, only on the vlan device.
>
>
>>
>> 2) Link down events on the underlying NIC.  I believe that link flaps
>> will be logged to /var/log/messages and dmesg; do you see anything there
>> that might correspond to the time of the packet drops?
>>
>
> No, dmesg is clean, only a couple of link down/up events from when I
> actually disconnected the port, and no other message in /var/log/messages
> that grabs my attention.
>
>
>>
>> 3) If VLAN_HWTAGGING is disabled through ifconfig on the port, then in
>> theory a low memory event could cause the packet to be dropped.  Does
>> "netstat -m" show "requests for mbufs denied" increasing?
>>
>
> Here is the ifconfig -v output for the vlan6 on the 10.1-STABLE system
>
> vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=303<RXCSUM,TXCSUM,TSO4,TSO6>
> ether a0:36:9f:2a:6d:ae
> inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
> inet6 2804:1054:bad:b1fe::1 prefixlen 64
> nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> media: Ethernet autoselect (10Gbase-SR <full-duplex>)
> status: active
> vlan: 3005 parent interface: ix3
> groups: vlan
>
> And here it is on the 10.3-STABLE system. I don't know why, but the only
> difference is that no options line is printed on the newer system;
> everything else is the same.
>
> vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> ether a0:36:9f:2a:6d:ae
> inet6 fe80::a236:9fff:fe2a:6dae%vlan6 prefixlen 64 scopeid 0x19
> inet6 2804:1054:bad:b1fe::1 prefixlen 64
> nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> media: Ethernet autoselect (10Gbase-SR <full-duplex>)
> status: active
> vlan: 3005 parent interface: ix3
> groups: vlan
>
> This is the netstat -m output while the system has packet loss. The denied
> and delayed counters are all zero.
>
>  % netstat -m
> 12365/21040/33405 mbufs in use (current/cache/total)
> 12310/14530/26840/505076 mbuf clusters in use (current/cache/total/max)
> 12310/14508 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/225/225/252538 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/74826 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/42089 16k jumbo clusters in use (current/cache/total/max)
> 27711K/35220K/62931K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
>
>
>
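
For anyone who wants to watch that counter over time rather than eyeball it, here is a minimal Python sketch that pulls the "requests for mbufs denied" triple out of saved netstat -m output and compares two snapshots. The function names are made up for illustration, and the parsing assumes the line format shown in the output above.

```python
import re

# Matches e.g. "0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)".
# search() is used so a leading "> " quote prefix does not matter.
DENIED_RE = re.compile(r"(\d+)/(\d+)/(\d+) requests for mbufs denied")


def denied_mbuf_requests(netstat_m_output):
    """Return the denied counters from netstat -m text as a dict."""
    for line in netstat_m_output.splitlines():
        match = DENIED_RE.search(line)
        if match:
            mbufs, clusters, both = (int(g) for g in match.groups())
            return {"mbufs": mbufs, "clusters": clusters, "mbuf+clusters": both}
    raise ValueError("no 'requests for mbufs denied' line found")


def denied_increased(before, after):
    """True if any denied counter grew between two snapshots."""
    return any(after[key] > before[key] for key in before)


# Sample lines copied from the netstat -m output in this thread.
sample = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
"""

snapshot = denied_mbuf_requests(sample)
print(snapshot)
print(denied_increased(snapshot, snapshot))
```

Feeding it two captures taken some seconds apart while the loss is happening would show whether the low-memory case applies; with the output above all three counters stay at zero.
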
>>
>> On Wed, Apr 27, 2016 at 2:41 PM, Zé Claudio Pastore <zclaudio at bsd.com.br>
>> wrote:
>>
>>> Hello,
>>>
>>> On a BGP border router I help manage, we run FreeBSD 10.1-STABLE,
>>> version r281235, and it has worked fine for several years now.
>>>
>>> We have around 4Gbit/s and 1.8Mpps routed at peak, while per port
>>> interface we peak at 300Kpps.
>>>
>>> Our quality metrics are measured with:
>>>
>>> ping -s 1472 -i 0.1 <our-other-ibgp-router>
>>>
>>> As well as bidirectional iperf.
>>>
>>> This metric is similar to how the Speedy Test and SIMET tests are done,
>>> which our customers use as a reference.
>>>
>>> Systems working w/o problem:
>>> - 10.1-STABLE / r281235
>>>
>>> Systems tested with drops:
>>> - 10.2-STABLE / r292035M
>>> - 10.3-STABLE / r298705
>>> - 11.0-CURRENT / r295683 (downloaded snapshot from ftp.freebsd.org)
>>> - 11.0-CURRENT Melifaro Routing Branch / r297731M
>>>
>>> While testing, when errors happen I can see output errs on the vlan port
>>> in the output from "netstat -w1 -I vlan6":
>>>
>>>            input          vlan6           output
>>>    packets  errs idrops      bytes    packets  errs      bytes colls
>>>          1     0     0         66      30557     2   33310968     0
>>>          1     0     0        105      31458     3   33912219     0
>>>          2     0     0       2954      32001     8   34983986     0
>>>          1     0     0       1512      33150     6   35942558     0
>>>          1     0     0       1512      33654     4   37311862     0
>>>          1     0     0       1512      34825     3   38213793     0
>>>          3     0     0       1683      35376     4   39488912     0
>>>          5     0     0       7280      32423     3   35551869     0
>>>
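
To put a number on the output errs in the table above, here is a rough Python sketch that sums the output "packets" and "errs" columns from netstat -w1 samples and reports their ratio. The function name is hypothetical, the column positions are assumed from the header shown above, and dividing errs by packets is just one simple way to express a rate, not something netstat itself defines.

```python
def output_error_totals(netstat_w1_output):
    """Sum output errs and output packets over all data rows.

    Assumes the 8-column layout shown in the thread:
    in packets, in errs, idrops, in bytes, out packets, out errs, out bytes, colls.
    """
    total_errs = 0
    total_packets = 0
    for line in netstat_w1_output.splitlines():
        fields = line.split()
        # Skip the two header lines and anything that is not a pure data row.
        if len(fields) != 8 or not all(f.isdigit() for f in fields):
            continue
        total_packets += int(fields[4])
        total_errs += int(fields[5])
    return total_errs, total_packets


# Samples copied from the "netstat -w1 -I vlan6" output in this thread.
samples = """\
           input          vlan6           output
   packets  errs idrops      bytes    packets  errs      bytes colls
         1     0     0         66      30557     2   33310968     0
         1     0     0        105      31458     3   33912219     0
         2     0     0       2954      32001     8   34983986     0
         1     0     0       1512      33150     6   35942558     0
         1     0     0       1512      33654     4   37311862     0
         1     0     0       1512      34825     3   38213793     0
         3     0     0       1683      35376     4   39488912     0
         5     0     0       7280      32423     3   35551869     0
"""

errs, packets = output_error_totals(samples)
print(f"{errs} output errs over {packets} output packets "
      f"({100.0 * errs / packets:.4f}%)")
```

For the eight one-second samples above this adds up to 33 output errs over 263444 output packets.
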
>>> Problems may happen under high load (~200Kpps) or low load (~30Kpps) on a
>>> vlan port. The observed frame loss never happens on untagged ports, only
>>> on vlan ports. The loss happens with packets of 900 bytes and above, but
>>> the loss rate is noticeably higher with packets close to 1400 bytes (1472
>>> is my reference size).
>>>
>>> Loss rate on all listed systems other than r281235 is 9-19% with
>>> ping(1) and iperf, while it's 0% on r281235.
>>>
>>> First I believed it to be an Intel driver error on systems newer than
>>> 10.1. My reference cards are dual-port 82599EB 10-Gigabit SFI/SFP+ Network
>>> Connection (2x2 on x8 PCIe bus, total 4x10G). But yesterday I replaced
>>> Intel with Chelsio T5 and the problem is still exactly the same, so it's
>>> not related to the card vendor.
>>>
>>> I always test the very same hardware. I have two SSD drives in this
>>> router, one for the 10.1 which just runs fine and the other disk to test
>>> the various versions of FreeBSD.
>>>
>>> Only minor loader and sysctl confs are tweaked:
>>>
>>> kern.hz=2000
>>> net.inet.ip.redirect=1                # do not send IP redirects
>>> net.inet.ip.accept_sourceroute=0      # drop source routed packets since they ca
>>> net.inet.ip.sourceroute=0             # if source routed packets are accepted th
>>> net.inet.tcp.drop_synfin=1            # SYN/FIN packets get dropped on initial c
>>> net.inet.udp.blackhole=1              # drop udp packets destined for closed soc
>>> net.inet.tcp.blackhole=2              # drop tcp packets destined for closed por
>>> security.bsd.see_other_uids=0
>>>
>>> Can anyone suggest what might be a fix/tuning for this behavior? Was
>>> there any relevant change in the vlan code, in revisions close to or
>>> later than the one I run on 10.1, which would lead to such a big
>>> difference?
>>>
>>
>>
>

