[Bug 195078] New: em tx_dma_fails and dropped packets
bugzilla-noreply at freebsd.org
Sun Nov 16 19:29:11 UTC 2014
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195078
Bug ID: 195078
Summary: em tx_dma_fails and dropped packets
Product: Base System
Version: 9.2-RELEASE
Hardware: Any
OS: Any
Status: Needs Triage
Severity: Affects Many People
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: fusionfoto at gmail.com
It looks like FreeBSD may be affected by the Intel erratum quoted below. This
likely affects all FreeBSD versions that default to a higher hw.em.rxd, which
could be several.
I've turned off TSO on my running machine because I didn't want to reboot, which
solved one set of problems, and then had to increase the rx_processing_limit
threshold to hopefully solve the remaining packet drops.
I have another couple of machines scheduled to reboot with hw.em.rxd/txd set
to 256, which I think is the old value, and hopefully I'll be able to set the
rest of the sysctls back to normal.
Hope this helps.
---
http://www.intel.com.au/content/dam/www/public/us/en/documents/specification-updates/82574-gbe-controller-spec-update.pdf
17. Tx Data Corruption When Using TCP Segmentation Offload
Problem: When using TSO, a situation can occur where a PCIe MRd request is
repeated with the same address, resulting in data corruption. At the end of the
TCP packet, the Tx DMA hangs because the length doesn't match. This can only
occur when the following are true:
• The first buffer of the packet is larger than [3 * (max_read_request - 4)].
• There is a 4 KB boundary within 64 bytes following the end of the header
bytes in the buffer.
Implication: Possible data corruption since a TCP packet is transmitted
containing the wrong data but with the correct checksum.
Data transmission halts as the Tx DMA module enters a hang state.
Workaround: The failure can be avoided by ensuring at least one of the
following:
• The buffer containing the headers should not be larger than [3 *
(max_read_request - 4)]. To meet this requirement even for the minimum value of
128 bytes for max_read_request, the buffer should not be larger than 372 bytes.
• The alignment of the buffer containing the headers should be such that there
is no 4 KB boundary within 64 bytes following the end of the header bytes.
Assuming standard Ethernet/IP/TCP headers of 54 bytes, this means that the
buffer should not start 54-118 bytes before a 4 KB boundary. For example,
128-byte alignment for this buffer could be used to fulfill this condition.
This problem has not been reported when using an Intel Linux* or Windows*
drivers.
Current analysis shows it is very unlikely for a situation to exist that would
cause the 82574 to be at risk for the errata when using the Intel Linux or
Windows drivers.
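The two conditions in the erratum reduce to simple arithmetic, so a short
sketch can check whether a given header buffer falls into the risky range. The
function names, the inclusive treatment of the 64-byte window (chosen to match
the quoted 54-118 byte range), and the example addresses are assumptions made
for illustration; none of this is taken from the em driver.

```python
PAGE_SIZE = 4096   # 4 KB boundary named in the erratum
HEADER_LEN = 54    # standard Ethernet/IP/TCP headers, per the erratum
WINDOW = 64        # bytes after the header end that must not hold a boundary


def max_safe_first_buffer(max_read_request: int) -> int:
    """Largest first-buffer length that avoids the size condition."""
    return 3 * (max_read_request - 4)


def boundary_near_header_end(buf_addr: int, header_len: int = HEADER_LEN) -> bool:
    """True if a 4 KB boundary lies within WINDOW bytes after the headers."""
    header_end = buf_addr + header_len
    # Round header_end up to the next multiple of PAGE_SIZE.
    next_boundary = -(-header_end // PAGE_SIZE) * PAGE_SIZE
    return next_boundary - header_end <= WINDOW


def at_risk(buf_addr: int, buf_len: int, max_read_request: int = 128) -> bool:
    """Both erratum conditions must hold for the hang to be possible."""
    return (buf_len > max_safe_first_buffer(max_read_request)
            and boundary_near_header_end(buf_addr))


if __name__ == "__main__":
    print(max_safe_first_buffer(128))   # 372, matching the erratum text
    print(at_risk(4096 - 100, 400))     # True: large buffer, 100 bytes before a boundary
    print(at_risk(4096 - 128, 400))     # False: 128-byte aligned start
```

With 128-byte alignment the buffer start is always a multiple of 128 bytes away
from the next boundary, so it can never land in the 54-118 byte range, which is
why the erratum suggests that alignment.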
Linux and other operating systems seem to have worked around it. On FreeBSD
this could be getting exercised because the default buffer size (rxd/txd) for
this driver was recently raised above 256.
**** my comments below ****
Since I didn't want to reboot to try the lower buffer size, I turned off TSO on
all the machines I'd checked that were actively incrementing tx_dma_fail on
em interfaces, then re-enabled their membership in the LACP bundle.
In brief testing (a few gigabits for a few minutes), tx_dma_fail has not
incremented and throughput has not been negatively impacted (before vs. after
re-enabling).
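Whether a counter such as tx_dma_fail is still incrementing can be checked by
comparing two sysctl snapshots taken some time apart. The following is a
minimal sketch, assuming the snapshots are saved as text in the "name: value"
form of the sysctl dump in this report; the function names and the values in
the second snapshot are hypothetical.

```python
def parse_counters(sysctl_text: str) -> dict:
    """Parse 'name: value' lines, keeping only integer-valued entries."""
    counters = {}
    for line in sysctl_text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # wrapped continuation line or other non-sysctl text
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            pass  # non-numeric values such as the %desc string
    return counters


def incremented(before: str, after: str,
                suffixes=("tx_dma_fail", "rx_overruns")) -> dict:
    """Return {name: delta} for watched counters that grew between snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    return {
        name: new[name] - old[name]
        for name in new
        if name in old and name.endswith(suffixes) and new[name] > old[name]
    }


if __name__ == "__main__":
    before = "dev.em.0.tx_dma_fail: 1834648\ndev.em.0.rx_overruns: 3109\n"
    # Hypothetical second snapshot, for illustration only.
    after = "dev.em.0.tx_dma_fail: 1834650\ndev.em.0.rx_overruns: 3109\n"
    print(incremented(before, after))
```

An empty result after a few minutes of load would correspond to the "has not
incremented" observation above.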
On Thu, Nov 13, 2014 at 1:52 PM, FF <fusionfoto at gmail.com> wrote:
What knob do I need to turn to address this?
This em0 is in an LACP bundle with an igb0 that isn't showing this problem.
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.3.8
dev.em.0.%driver: em
dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.GLAN
dev.em.0.%pnpinfo: vendor=0x8086 device=0x153b subvendor=0x15d9 subdevice=0x153b class=0x020000
dev.em.0.%parent: pci0
dev.em.0.nvm: -1
dev.em.0.debug: -1
dev.em.0.fc: 3
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.itr: 488
dev.em.0.rx_processing_limit: 100
dev.em.0.eee_control: 1
dev.em.0.link_irq: 0
dev.em.0.mbuf_alloc_fail: 52
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
**
dev.em.0.tx_dma_fail: 1834648
dev.em.0.rx_overruns: 3109
**
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1209532992
dev.em.0.rx_control: 67141634
dev.em.0.fc_high_water: 23584
dev.em.0.fc_low_water: 20552
dev.em.0.queue0.txd_head: 577
dev.em.0.queue0.txd_tail: 577
dev.em.0.queue0.tx_irq: 0
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 967
dev.em.0.queue0.rxd_tail: 966
dev.em.0.queue0.rx_irq: 0
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.defer_count: 0
dev.em.0.mac_stats.missed_packets: 61094
dev.em.0.mac_stats.recv_no_buff: 60008
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.xon_recvd: 40226659
dev.em.0.mac_stats.xon_txd: 2132
dev.em.0.mac_stats.xoff_recvd: 40241216
dev.em.0.mac_stats.xoff_txd: 2073563
dev.em.0.mac_stats.total_pkts_recvd: 3219537541
dev.em.0.mac_stats.good_pkts_recvd: 3139008594
dev.em.0.mac_stats.bcast_pkts_recvd: 3953817
dev.em.0.mac_stats.mcast_pkts_recvd: 607157
dev.em.0.mac_stats.rx_frames_64: 0
dev.em.0.mac_stats.rx_frames_65_127: 0
dev.em.0.mac_stats.rx_frames_128_255: 0
dev.em.0.mac_stats.rx_frames_256_511: 0
dev.em.0.mac_stats.rx_frames_512_1023: 0
dev.em.0.mac_stats.rx_frames_1024_1522: 0
dev.em.0.mac_stats.good_octets_recvd: 3527296369841
dev.em.0.mac_stats.good_octets_txd: 14348531993101
dev.em.0.mac_stats.total_pkts_txd: 10735190291
dev.em.0.mac_stats.good_pkts_txd: 10733114595
dev.em.0.mac_stats.bcast_pkts_txd: 14
dev.em.0.mac_stats.mcast_pkts_txd: 54334
dev.em.0.mac_stats.tx_frames_64: 0
dev.em.0.mac_stats.tx_frames_65_127: 0
dev.em.0.mac_stats.tx_frames_128_255: 0
dev.em.0.mac_stats.tx_frames_256_511: 0
dev.em.0.mac_stats.tx_frames_512_1023: 0
dev.em.0.mac_stats.tx_frames_1024_1522: 0
dev.em.0.mac_stats.tso_txd: 902605586
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.interrupts.asserts: 1392541431
dev.em.0.interrupts.rx_pkt_timer: 0
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 0
dev.em.0.interrupts.tx_abs_timer: 0
dev.em.0.interrupts.tx_queue_empty: 0
dev.em.0.interrupts.tx_queue_min_thresh: 0
dev.em.0.interrupts.rx_desc_min_thresh: 0
dev.em.0.interrupts.rx_overrun: 0
dev.em.0.wake: 0
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.10
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.RP04.PXSX
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x1533 subvendor=0x15d9 subdevice=0x1533 class=0x020000
dev.igb.0.%parent: pci5
dev.igb.0.nvm: -1
dev.igb.0.enable_aim: 1
dev.igb.0.fc: 3
dev.igb.0.rx_processing_limit: 100
dev.igb.0.dmac: 0
dev.igb.0.eee_disabled: 0
dev.igb.0.link_irq: 33
dev.igb.0.dropped: 0
dev.igb.0.tx_dma_fail: 0
dev.igb.0.rx_overruns: 0
dev.igb.0.watchdog_timeouts: 0
dev.igb.0.device_control: 1209795137
dev.igb.0.rx_control: 71335938
dev.igb.0.interrupt_mask: 4
dev.igb.0.extended_int_mask: 2147483679
dev.igb.0.tx_buf_alloc: 0
dev.igb.0.rx_buf_alloc: 0
dev.igb.0.fc_high_water: 31328
dev.igb.0.fc_low_water: 31312
dev.igb.0.queue0.no_desc_avail: 0
dev.igb.0.queue0.tx_packets: 62464141
dev.igb.0.queue0.rx_packets: 73012939
dev.igb.0.queue0.rx_bytes: 22529663814
dev.igb.0.queue0.lro_queued: 0
dev.igb.0.queue0.lro_flushed: 0
dev.igb.0.queue1.no_desc_avail: 0
dev.igb.0.queue1.tx_packets: 404298046
dev.igb.0.queue1.rx_packets: 307675818
dev.igb.0.queue1.rx_bytes: 185919902229
dev.igb.0.queue1.lro_queued: 0
dev.igb.0.queue1.lro_flushed: 0
dev.igb.0.queue2.no_desc_avail: 0
dev.igb.0.queue2.tx_packets: 3441053015
dev.igb.0.queue2.rx_packets: 5511826751
dev.igb.0.queue2.rx_bytes: 3054219311510
dev.igb.0.queue2.lro_queued: 0
dev.igb.0.queue2.lro_flushed: 0
dev.igb.0.queue3.no_desc_avail: 0
dev.igb.0.queue3.tx_packets: 1047838830
dev.igb.0.queue3.rx_packets: 1987495318
dev.igb.0.queue3.rx_bytes: 2696179247028
dev.igb.0.queue3.lro_queued: 0
dev.igb.0.queue3.lro_flushed: 0
dev.igb.0.mac_stats.excess_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.defer_count: 283811
dev.igb.0.mac_stats.missed_packets: 9449
dev.igb.0.mac_stats.recv_no_buff: 340
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.xon_recvd: 46255557
dev.igb.0.mac_stats.xon_txd: 261
dev.igb.0.mac_stats.xoff_recvd: 46255994
dev.igb.0.mac_stats.xoff_txd: 7027
dev.igb.0.mac_stats.total_pkts_recvd: 7975033582
dev.igb.0.mac_stats.good_pkts_recvd: 7880001465
dev.igb.0.mac_stats.bcast_pkts_recvd: 5783868
dev.igb.0.mac_stats.mcast_pkts_recvd: 563315
dev.igb.0.mac_stats.rx_frames_64: 28412906
dev.igb.0.mac_stats.rx_frames_65_127: 3310187919
dev.igb.0.mac_stats.rx_frames_128_255: 784920450
dev.igb.0.mac_stats.rx_frames_256_511: 17225962
dev.igb.0.mac_stats.rx_frames_512_1023: 73415350
dev.igb.0.mac_stats.rx_frames_1024_1522: 3665838878
dev.igb.0.mac_stats.good_octets_recvd: 5990356613544
dev.igb.0.mac_stats.good_octets_txd: 46326753008181
dev.igb.0.mac_stats.total_pkts_txd: 33016014138
dev.igb.0.mac_stats.good_pkts_txd: 33016006850
dev.igb.0.mac_stats.bcast_pkts_txd: 834
dev.igb.0.mac_stats.mcast_pkts_txd: 54331
dev.igb.0.mac_stats.tx_frames_64: 30741691
dev.igb.0.mac_stats.tx_frames_65_127: 2174824217
dev.igb.0.mac_stats.tx_frames_128_255: 139804927
dev.igb.0.mac_stats.tx_frames_256_511: 59190261
dev.igb.0.mac_stats.tx_frames_512_1023: 386886648
dev.igb.0.mac_stats.tx_frames_1024_1522: 30224559106
dev.igb.0.mac_stats.tso_txd: 2384636909
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.interrupts.asserts: 4556119857
dev.igb.0.interrupts.rx_pkt_timer: 7879778770
dev.igb.0.interrupts.rx_abs_timer: 0
dev.igb.0.interrupts.tx_pkt_timer: 0
dev.igb.0.interrupts.tx_abs_timer: 0
dev.igb.0.interrupts.tx_queue_empty: 33015268817
dev.igb.0.interrupts.tx_queue_min_thresh: 7880001470
dev.igb.0.interrupts.rx_desc_min_thresh: 0
dev.igb.0.interrupts.rx_overrun: 0
dev.igb.0.host.breaker_tx_pkt: 0
dev.igb.0.host.host_tx_pkt_discard: 0
dev.igb.0.host.rx_pkt: 222702
dev.igb.0.host.breaker_rx_pkts: 0
dev.igb.0.host.breaker_rx_pkt_drop: 0
dev.igb.0.host.tx_good_pkt: 738033
dev.igb.0.host.breaker_tx_pkt_drop: 0
dev.igb.0.host.rx_good_bytes: 5990357073320
dev.igb.0.host.tx_good_bytes: 46326753008181
dev.igb.0.host.length_errors: 0
dev.igb.0.host.serdes_violation_pkt: 0
dev.igb.0.host.header_redir_missed: 0
dev.igb.0.wake: 0
hw.em.eee_setting: 1
hw.em.rx_process_limit: 100
hw.em.enable_msix: 1
hw.em.sbp: 0
hw.em.smart_pwr_down: 0
hw.em.txd: 1024
hw.em.rxd: 1024
hw.em.rx_abs_int_delay: 66
hw.em.tx_abs_int_delay: 66
hw.em.rx_int_delay: 0
hw.em.tx_int_delay: 66
hw.igb.rx_process_limit: 100
hw.igb.num_queues: 0
hw.igb.header_split: 0
hw.igb.buf_ring_size: 4096
hw.igb.max_interrupt_rate: 8000
hw.igb.enable_msix: 1
hw.igb.enable_aim: 1
hw.igb.txd: 1024
hw.igb.rxd: 1024
FreeBSD systemname.com 9.2-RELEASE-p10 FreeBSD 9.2-RELEASE-p10 #0 r270148M: Mon Aug 18 23:14:36 EDT 2014 root at peta108:/usr/obj/usr/src/sys/CUSTOM10 amd64
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
ether 00:25:90:f2:2d:24
inet6 fe80::225:90ff:fef2:2d24%em0 prefixlen 64 scopeid 0x2
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
ether 00:25:90:f2:2d:24
inet6 fe80::225:90ff:fef2:2d25%igb0 prefixlen 64 scopeid 0x4
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
ether 00:25:90:f2:2d:24
inet 192.168.0.108 netmask 0xffffff00 broadcast 192.168.0.255
inet6 fe80::225:90ff:fef2:2d24%lagg0 prefixlen 64 scopeid 0x8
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect
status: active
laggproto lacp lagghash l2,l3,l4
laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
Thanks in advance!