packet loss on ixgbe using vlans and routing
John Hay
jhay at meraka.org.za
Fri Jul 23 07:40:52 UTC 2010
Hi,
(Jack, any chance that you can look at this please?)
It looks like there are 2 problems with the ixgbe driver on FreeBSD-8.
I have a Dell T710 with 4 X 10G ethernet interfaces (2 X Dual port Intel
82599 cards). It is running FreeBSD RELENG_8.
1 - When routing (using vlans) there is heavy packet loss that goes away
when you do "ifconfig ix2 -rxcsum". The packet loss seems to be on the
receive side because I do not see the packets on the receiving interface
with tcpdump. This seems to impact both ipv4 and ipv6.
My test setup is the Dell T710 with its ix2 connected to a 10G port of
a Nortel 4526GTX. On that port I have 2 vlans configured with half of
the 1G ports in the one vlan and the other half in the other vlan.
If I test with iperf from one of the machines on a 1G port to the T710,
I get 920Mbit/s. If I do it simultaneously from a few machines connected
to the 1G ports, all of them basically saturate their 1G links.
If I now try to route from the one vlan to the other, i.e. doing an iperf
from a 1G connected machine, through the T710, to another 1G connected
machine, I see packet loss; sometimes iperf only manages 100kbit/s.
(Configuring a tcp relay, like socat, on the T710, and working through it,
I again get 900Mbit/s and more.)
So it seems that as long as the T710 with the 10G card is the start or
end point of the connection, I get no packet loss, but as soon as it
has to route, something goes wrong.
2 - I see packet loss (0 - 40%) on IPv6 packets in vlans, when the
machine is not the originator of the packets. This happens even with
"ifconfig ix2 -rxcsum" applied.
Let me try to describe it a little more. If a neighbouring machine pings it
with ping6, there will be packet loss. If it acts as a router for ipv6, there
will be packet loss. This happens even when the network is pretty idle and
with different switches (Nortel and Cisco equipment). The packet loss
fluctuates a lot. Pinging 1000 packets might lose 1% one time and 30% the
next. Looking with tcpdump, I can see the packets arriving and going out,
but they never arrive at the next machine. (My feeling is that they get
lost inside the card.) The error counters on the switch do not increment.
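
To put numbers on how much the loss fluctuates, the summary lines of
repeated ping6 runs can be compared. The following is a minimal Python
sketch, not something from the original report. It assumes the summary
format that FreeBSD ping/ping6 normally prints ("N packets transmitted,
M packets received, ..."), and the function names are hypothetical.

```python
import re

# Assumed summary format, as normally printed by FreeBSD ping/ping6:
#   "1000 packets transmitted, 990 packets received, 1.0% packet loss"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def parse_summary(text):
    """Return (sent, received, loss_percent) from ping6 output, or None."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    # Recompute the loss from the counters instead of trusting the text.
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    return sent, received, loss


def loss_spread(outputs):
    """Return (min, max) loss percentage over several ping6 runs."""
    parsed = [parse_summary(output) for output in outputs]
    losses = [result[2] for result in parsed if result is not None]
    return (min(losses), max(losses)) if losses else None
```

Feeding it the saved output of several "ping6 -c 1000" runs gives the
lowest and highest loss percentage seen, which makes runs with and without
vlans easier to compare.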
I do not see packet loss if the machine originates the packets, for example
ping6 from the machine. Also, ipv4 packets do not see any packet loss. If
I do not use vlans, I don't see packet loss with ipv6 either.
The machine also has bce 1G interfaces and I do not see the packet loss
on them.
Here is some info about the machine / setup. The numbers are pretty low
because I rebooted after compiling a kernel with IPFIREWALL, ROUTETABLES,
MROUTING and FLOWTABLE removed. I'll add my kernel config file with empty
and commented out lines removed.
pciconf -lvc
ix0 at pci0:129:0:0: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
class = network
subclass = ethernet
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
ix1 at pci0:129:0:1: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
class = network
subclass = ethernet
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
ix2 at pci0:131:0:0: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
class = network
subclass = ethernet
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
ix3 at pci0:131:0:1: class=0x020000 card=0x00038086 chip=0x10fb8086 rev=0x01 hdr=0x00
vendor = 'Intel Corporation'
class = network
subclass = ethernet
cap 01[40] = powerspec 3 supports D0 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit, vector masks
cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
output of vmstat -i
interrupt total rate
irq19: ehci0 28371 0
irq21: uhci2 uhci4+ 48 0
irq23: atapci0 46 0
irq34: mpt0 146954 2
cpu0: timer 112205297 1999
irq256: bce0 52063 0
irq257: bce1 1 0
irq258: bce2 1 0
irq259: bce3 1 0
irq260: ix0:que 0 142258 2
irq261: ix0:que 1 56464 1
irq262: ix0:que 2 56199 1
irq263: ix0:que 3 56198 1
irq264: ix0:que 4 66569 1
irq265: ix0:que 5 56148 1
irq266: ix0:que 6 56217 1
irq267: ix0:que 7 56311 1
irq268: ix0:que 8 56169 1
irq269: ix0:que 9 69485 1
irq270: ix0:que 10 56176 1
irq271: ix0:que 11 56205 1
irq272: ix0:que 12 56281 1
irq273: ix0:que 13 56359 1
irq274: ix0:que 14 56292 1
irq275: ix0:que 15 56197 1
irq276: ix0:link 2 0
irq277: ix1:que 0 107873 1
irq278: ix1:que 1 56094 0
irq279: ix1:que 2 56097 0
irq280: ix1:que 3 56096 0
irq281: ix1:que 4 65439 1
irq282: ix1:que 5 56091 0
irq283: ix1:que 6 56092 0
irq284: ix1:que 7 56098 0
irq285: ix1:que 8 56091 0
irq286: ix1:que 9 56096 0
irq287: ix1:que 10 56093 0
irq288: ix1:que 11 56091 0
irq289: ix1:que 12 56096 0
irq290: ix1:que 13 56095 0
irq291: ix1:que 14 57125 1
irq292: ix1:que 15 56093 0
irq293: ix1:link 1 0
irq294: ix2:que 0 231250 4
irq295: ix2:que 1 57784 1
irq296: ix2:que 2 69956 1
irq297: ix2:que 3 59498 1
irq298: ix2:que 4 58201 1
irq299: ix2:que 5 58599 1
irq300: ix2:que 6 57813 1
irq301: ix2:que 7 60075 1
irq302: ix2:que 8 68639 1
irq303: ix2:que 9 58194 1
irq304: ix2:que 10 60752 1
irq305: ix2:que 11 57628 1
irq306: ix2:que 12 66796 1
irq307: ix2:que 13 63307 1
irq308: ix2:que 14 60788 1
irq309: ix2:que 15 59102 1
irq310: ix2:link 5 0
irq311: ix3:que 0 56090 0
irq312: ix3:que 1 56090 0
irq313: ix3:que 2 56090 0
irq314: ix3:que 3 56090 0
irq315: ix3:que 4 56090 0
irq316: ix3:que 5 56090 0
irq317: ix3:que 6 56090 0
irq318: ix3:que 7 56090 0
irq319: ix3:que 8 56090 0
irq320: ix3:que 9 56090 0
irq321: ix3:que 10 56090 0
irq322: ix3:que 11 56090 0
irq323: ix3:que 12 56090 0
irq324: ix3:que 13 56090 0
irq325: ix3:que 14 56090 0
irq326: ix3:que 15 56090 0
cpu1: timer 112196134 1999
cpu10: timer 112196179 1999
cpu3: timer 112196135 1999
cpu8: timer 112196108 1999
cpu4: timer 112196161 1999
cpu11: timer 112196179 1999
cpu5: timer 112196161 1999
cpu13: timer 112196179 1999
cpu6: timer 112196161 1999
cpu14: timer 112196179 1999
cpu2: timer 112196106 1999
cpu12: timer 112196179 1999
cpu7: timer 112196161 1999
cpu9: timer 112196155 1999
cpu15: timer 112196179 1999
Total 1799390156 32072
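
The per-queue counters above are easier to compare once they are summed
per interface. The following is a minimal Python sketch for doing that
with saved "vmstat -i" output. It assumes the "irqN: ixX:que Q total rate"
line layout shown above, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches queue lines such as "irq294: ix2:que 0 231250 4".
# Link and timer lines do not match and are skipped.
QUEUE_RE = re.compile(r"^irq\d+:\s+(\w+):que\s+(\d+)\s+(\d+)\s+\d+\s*$")


def queue_totals(vmstat_text):
    """Map interface name -> {queue number: interrupt total}."""
    totals = defaultdict(dict)
    for line in vmstat_text.splitlines():
        match = QUEUE_RE.match(line.strip())
        if match:
            name = match.group(1)
            totals[name][int(match.group(2))] = int(match.group(3))
    return dict(totals)


def interface_sums(vmstat_text):
    """Map interface name -> sum of interrupts over all of its queues."""
    return {
        name: sum(queues.values())
        for name, queues in queue_totals(vmstat_text).items()
    }
```

In the listing above, every ix3 queue shows the same count (56090) while
ix2 queue 0 is well ahead of the other ix2 queues, and this kind of
summary makes such differences stand out.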
netstat -m
133178/4042/137220 mbufs in use (current/cache/total)
133112/2062/135174/262144 mbuf clusters in use (current/cache/total/max)
133112/2056 mbuf+clusters out of packet secondary zone in use (current/cache)
0/20/20/131072 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/65536 9k jumbo clusters in use (current/cache/total/max)
0/0/0/32768 16k jumbo clusters in use (current/cache/total/max)
299518K/5214K/304733K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
kernel config file, basically started with 64 bit and removed the stuff
I do not need.
cpu HAMMER
ident SEEKAT
device ipmi
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options SCHED_ULE # ULE scheduler
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options SCTP # Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_DIRHASH # Improve performance on big directories
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_LABEL # Provides labelization
options COMPAT_43TTY # BSD 4.3 TTY compat (sgtty)
options COMPAT_IA32 # Compatible with i386 binaries
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options COMPAT_FREEBSD5 # Compatible with FreeBSD5
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options COMPAT_FREEBSD7 # Compatible with FreeBSD7
options KTRACE # ktrace(1) support
options STACK # stack(9) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options P1003_1B_SEMAPHORES # POSIX-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options PRINTF_BUFR_SIZE=128 # Prevent printf output being interspersed.
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4)
options INCLUDE_CONFIG_FILE # Include this file in kernel
options SMP # Symmetric MultiProcessor Kernel
device cpufreq
device acpi
device pci
device ata
device atapicd # ATAPI CDROM drives
device mpt # LSI-Logic MPT-Fusion
device scbus # SCSI bus (required for SCSI)
device da # Direct Access (disks)
device pass # Passthrough device (direct SCSI access)
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
device splash # Splash screen and screen saver support
device sc
device agp # support several AGP chipsets
device uart # Generic UART driver
device loop # Network loopback
device random # Entropy device
device ether # Ethernet support
device pty # BSD-style compatibility pseudo ttys
device bpf # Berkeley packet filter
device uhci # UHCI PCI->USB interface
device ehci # EHCI PCI->USB interface (USB 2.0)
device usb # USB Bus (required)
device uhid # "Human Interface Devices"
device ukbd # Keyboard
device umass # Disks/Mass storage - Requires scbus and da
device ums # Mouse
kldstat
Id Refs Address Size Name
1 55 0xffffffff80100000 6ea290 kernel
2 1 0xffffffff807eb000 19e088 zfs.ko
3 2 0xffffffff8098a000 3860 opensolaris.ko
4 2 0xffffffff8098e000 20448 krpc.ko
5 1 0xffffffff809af000 21100 geom_mirror.ko
6 1 0xffffffff809d1000 66c0 if_vlan.ko
7 1 0xffffffff809d8000 506c8 if_bce.ko
8 2 0xffffffff80a29000 3ec20 miibus.ko
9 1 0xffffffff80a68000 243e0 if_ixgbe.ko
10 1 0xffffffff80a8d000 1e08 coretemp.ko
ifconfig ix2 (with -rxcsum and global addrs modified)
ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=5b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO>
ether 00:1b:21:57:ef:7c
inet6 fe80::21b:21ff:fe57:ef7c%ix2 prefixlen 64 scopeid 0x3
nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
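
The options=5b8 mask above can be decoded to confirm which offloads are
really off. The following is a minimal Python sketch. The bit values are
the IFCAP_* capability bits as recalled from FreeBSD's sys/net/if.h, so
treat them as an assumption and check them against the header of the
release in use. The function names are hypothetical.

```python
# Capability bits as recalled from FreeBSD's sys/net/if.h (assumption;
# verify against the header of the release being debugged).
IFCAP_BITS = {
    0x001: "RXCSUM",
    0x002: "TXCSUM",
    0x004: "NETCONS",
    0x008: "VLAN_MTU",
    0x010: "VLAN_HWTAGGING",
    0x020: "JUMBO_MTU",
    0x040: "POLLING",
    0x080: "VLAN_HWCSUM",
    0x100: "TSO4",
    0x200: "TSO6",
    0x400: "LRO",
}


def decode_options(mask):
    """Return the capability names set in an ifconfig options= mask."""
    return [name for bit, name in sorted(IFCAP_BITS.items()) if mask & bit]


def has_capability(mask, name):
    """Return True if the named capability bit is set in the mask."""
    return name in decode_options(mask)
```

Decoding 0x5b8 this way gives the same six names that ifconfig prints,
with RXCSUM absent as expected after "ifconfig ix2 -rxcsum". TXCSUM is
absent from that list as well.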
ifconfig ix2.1
ix2.1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 00:1b:21:57:ef:7c
inet 10.0.28.2 netmask 0xffffff00 broadcast 10.0.28.255
inet6 fe80::21b:21ff:fe57:b420%ix2.1 prefixlen 64 scopeid 0x9
inet6 2001:0:0:3:21b:21ff:fe57:b420 prefixlen 64
inet6 2001:0:0:3:: prefixlen 64 anycast
nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
vlan: 1 parent interface: ix2
ifconfig ix2.8
ix2.8: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ether 00:1b:21:57:ef:7c
inet 10.0.8.50 netmask 0xffffff00 broadcast 10.0.8.255
inet6 fe80::21b:21ff:fe57:b420%ix2.8 prefixlen 64 scopeid 0xa
inet6 2001:0:0:1:21b:21ff:fe57:b420 prefixlen 64
inet6 2001:0:0:1:: prefixlen 64 anycast
nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
media: Ethernet autoselect (10Gbase-SR <full-duplex>)
status: active
vlan: 8 parent interface: ix2
John
--
John Hay -- jhay at meraka.csir.co.za / jhay at FreeBSD.org