em0 tx_dma_fail incrementing [SOLVED]

Adrian Chadd adrian at freebsd.org
Sun Nov 16 17:28:28 UTC 2014


Hi!

Good catch! Would you mind filing a bug so we remember and
(hopefully!) fix it to be the default?

https://bugs.freebsd.org/submit/

Thanks!


-adrian


On 15 November 2014 08:31, FF <fusionfoto at gmail.com> wrote:
> It looks like FreeBSD may be a victim of this bug:
>
>
>
> http://www.intel.com.au/content/dam/www/public/us/en/documents/specification-updates/82574-gbe-controller-spec-update.pdf
>
>
>
> 17. Tx Data Corruption When Using TCP Segmentation Offload
>
> Problem: When using TSO, a situation can occur where a PCIe MRd request is
> repeated with the same address, resulting in data corruption. At the end of
> the TCP packet, the Tx DMA hangs because the length doesn't match. This can
> only occur when the following are true:
>
> • The first buffer of the packet is larger than [3 * (max_read_request - 4)].
>
> • There is a 4 KB boundary within 64 bytes following the end of the header
>   bytes in the buffer.
>
> Implication: Possible data corruption since a TCP packet is transmitted
> containing the wrong data but with the correct checksum. Data transmission
> halts as the Tx DMA module enters a hang state.
>
> Workaround: The failure can be avoided by ensuring at least one of the
> following:
>
> • The buffer containing the headers should not be larger than
>   [3 * (max_read_request - 4)]. To meet this requirement even for the
>   minimum value of 128 bytes for max_read_request, the buffer should not be
>   larger than 372 bytes.
>
> • The alignment of the buffer containing the headers should be such that
>   there is no 4 KB boundary within 64 bytes following the end of the header
>   bytes. Assuming standard Ethernet/IP/TCP headers of 54 bytes, this means
>   that the buffer should not start 54-118 bytes before a 4 KB boundary. For
>   example, 128-byte alignment for this buffer could be used to fulfill this
>   condition.
>
> This problem has not been reported when using an Intel Linux* or Windows*
> drivers. Current analysis shows it is very unlikely for a situation to exist
> that would cause the 82574 to be at risk for the errata when using the Intel
> Linux or Windows drivers.
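
The two conditions in the erratum come down to simple arithmetic. Here is a
minimal Python sketch of them, assuming the 54-byte Ethernet/IP/TCP header
from the erratum text. The function and parameter names are hypothetical and
are not taken from the em driver; this only illustrates the published
conditions, not how any driver actually checks them.

```python
PAGE = 4096  # the 4 KB boundary named in the erratum


def max_safe_first_buffer(max_read_request):
    """Largest first-buffer size the erratum allows: 3 * (max_read_request - 4)."""
    return 3 * (max_read_request - 4)


def boundary_after_header(buf_addr, header_len=54):
    """True if a 4 KB boundary lies within 64 bytes after the end of the headers."""
    header_end = buf_addr + header_len
    # Bytes from the end of the headers to the next 4 KB boundary (0 if on it).
    distance = (-header_end) % PAGE
    return distance <= 64


def at_risk(buf_addr, buf_len, max_read_request=128, header_len=54):
    """Both erratum conditions hold: oversized first buffer and a nearby boundary."""
    too_large = buf_len > max_safe_first_buffer(max_read_request)
    return too_large and boundary_after_header(buf_addr, header_len)


print(max_safe_first_buffer(128))          # the 372-byte limit quoted above
print(boundary_after_header(PAGE - 118))   # starts 118 bytes before a boundary
print(boundary_after_header(PAGE - 128))   # 128-byte aligned start
```

With the default 54-byte header, the addresses flagged by
boundary_after_header() are exactly those starting 54-118 bytes before a 4 KB
boundary, and no 128-byte-aligned address falls in that window.
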
>
>
>
> Linux and other systems seem to have fixed it. FreeBSD may be exercising
> this erratum because the default buffer size for this driver was recently
> raised above 256.
>
>
> Since I didn't want to reboot to try the lower buffer size, I turned off
> TSO on every machine I'd checked whose em interfaces were actively
> incrementing tx_dma_fail, then re-enabled their membership in the LACP
> bundle.
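
The exact command used is not stated in the message; on FreeBSD, ifconfig(8)
accepts -tso to turn TSO off on an interface. To confirm the result, one could
look for TSO4 in the options= line of the ifconfig output. A small Python
sketch of that check follows. The helper name is hypothetical, and the sample
line is the em0 options line from the original post.

```python
import re


def ifconfig_options(text):
    """Return the capability names from the interface options=...<...> line."""
    # Anchor at line start so the later "nd6 options=..." line is not matched.
    match = re.search(r"^\s*options=[0-9a-f]+<([^>]*)>", text, re.MULTILINE)
    if match is None:
        return set()
    return {name for name in match.group(1).split(",") if name}


sample = (
    "em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>\n"
    "\tnd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>\n"
)

# TSO4 is present here because this output predates the workaround.
print("TSO4" in ifconfig_options(sample))
```
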
>
>
> In brief testing (a few gigabits for a few minutes), tx_dma_fail has not
> incremented and throughput has not been negatively impacted (before vs.
> after re-enabling).
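
One way to check that a counter has stopped incrementing is to compare two
sysctl snapshots. The Python sketch below assumes the "name: value" format
shown in the sysctl output of the original post. The helper names are
hypothetical, and the "after" snapshot is an illustrative placeholder, not a
measured value.

```python
def parse_sysctl(text):
    """Map 'name: value' lines of sysctl output to ints, skipping non-numeric values."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            counters[name.strip()] = int(value)
        except ValueError:
            continue  # e.g. the %desc and %driver strings
    return counters


def counter_delta(before, after, name):
    """Difference in one counter between two parsed snapshots."""
    return after[name] - before[name]


before = parse_sysctl(
    "dev.em.0.%driver: em\n"
    "dev.em.0.tx_dma_fail: 1834648\n"   # value from the original post
    "dev.em.0.rx_overruns: 3109\n"
)
# Hypothetical later snapshot, assumed unchanged for illustration only.
after = dict(before)

print(counter_delta(before, after, "dev.em.0.tx_dma_fail"))
```

A delta of zero over a period of sustained traffic is what "has not
incremented" means here.
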
>
>
> I'm posting this so that anyone else scratching their head over terrible
> em performance can solve it.
>
>
> Best,
>
>
> FF
>
>
> On Thu, Nov 13, 2014 at 1:52 PM, FF <fusionfoto at gmail.com> wrote:
>
>>
>> What knob do I need to turn to address this?
>>
>> This em0 is in an LACP bundle with an igb0 that isn't showing this problem.
>>
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.3.8
>> dev.em.0.%driver: em
>> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.GLAN
>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x153b subvendor=0x15d9 subdevice=0x153b class=0x020000
>> dev.em.0.%parent: pci0
>> dev.em.0.nvm: -1
>> dev.em.0.debug: -1
>> dev.em.0.fc: 3
>> dev.em.0.rx_int_delay: 0
>> dev.em.0.tx_int_delay: 66
>> dev.em.0.rx_abs_int_delay: 66
>> dev.em.0.tx_abs_int_delay: 66
>> dev.em.0.itr: 488
>> dev.em.0.rx_processing_limit: 100
>> dev.em.0.eee_control: 1
>> dev.em.0.link_irq: 0
>> dev.em.0.mbuf_alloc_fail: 52
>> dev.em.0.cluster_alloc_fail: 0
>> dev.em.0.dropped: 0
>> **
>> dev.em.0.tx_dma_fail: 1834648
>> dev.em.0.rx_overruns: 3109
>> **
>> dev.em.0.watchdog_timeouts: 0
>> dev.em.0.device_control: 1209532992
>> dev.em.0.rx_control: 67141634
>> dev.em.0.fc_high_water: 23584
>> dev.em.0.fc_low_water: 20552
>> dev.em.0.queue0.txd_head: 577
>> dev.em.0.queue0.txd_tail: 577
>> dev.em.0.queue0.tx_irq: 0
>> dev.em.0.queue0.no_desc_avail: 0
>> dev.em.0.queue0.rxd_head: 967
>> dev.em.0.queue0.rxd_tail: 966
>> dev.em.0.queue0.rx_irq: 0
>> dev.em.0.mac_stats.excess_coll: 0
>> dev.em.0.mac_stats.single_coll: 0
>> dev.em.0.mac_stats.multiple_coll: 0
>> dev.em.0.mac_stats.late_coll: 0
>> dev.em.0.mac_stats.collision_count: 0
>> dev.em.0.mac_stats.symbol_errors: 0
>> dev.em.0.mac_stats.sequence_errors: 0
>> dev.em.0.mac_stats.defer_count: 0
>> dev.em.0.mac_stats.missed_packets: 61094
>> dev.em.0.mac_stats.recv_no_buff: 60008
>> dev.em.0.mac_stats.recv_undersize: 0
>> dev.em.0.mac_stats.recv_fragmented: 0
>> dev.em.0.mac_stats.recv_oversize: 0
>> dev.em.0.mac_stats.recv_jabber: 0
>> dev.em.0.mac_stats.recv_errs: 0
>> dev.em.0.mac_stats.crc_errs: 0
>> dev.em.0.mac_stats.alignment_errs: 0
>> dev.em.0.mac_stats.coll_ext_errs: 0
>> dev.em.0.mac_stats.xon_recvd: 40226659
>> dev.em.0.mac_stats.xon_txd: 2132
>> dev.em.0.mac_stats.xoff_recvd: 40241216
>> dev.em.0.mac_stats.xoff_txd: 2073563
>> dev.em.0.mac_stats.total_pkts_recvd: 3219537541
>> dev.em.0.mac_stats.good_pkts_recvd: 3139008594
>> dev.em.0.mac_stats.bcast_pkts_recvd: 3953817
>> dev.em.0.mac_stats.mcast_pkts_recvd: 607157
>> dev.em.0.mac_stats.rx_frames_64: 0
>> dev.em.0.mac_stats.rx_frames_65_127: 0
>> dev.em.0.mac_stats.rx_frames_128_255: 0
>> dev.em.0.mac_stats.rx_frames_256_511: 0
>> dev.em.0.mac_stats.rx_frames_512_1023: 0
>> dev.em.0.mac_stats.rx_frames_1024_1522: 0
>> dev.em.0.mac_stats.good_octets_recvd: 3527296369841
>> dev.em.0.mac_stats.good_octets_txd: 14348531993101
>> dev.em.0.mac_stats.total_pkts_txd: 10735190291
>> dev.em.0.mac_stats.good_pkts_txd: 10733114595
>> dev.em.0.mac_stats.bcast_pkts_txd: 14
>> dev.em.0.mac_stats.mcast_pkts_txd: 54334
>> dev.em.0.mac_stats.tx_frames_64: 0
>> dev.em.0.mac_stats.tx_frames_65_127: 0
>> dev.em.0.mac_stats.tx_frames_128_255: 0
>> dev.em.0.mac_stats.tx_frames_256_511: 0
>> dev.em.0.mac_stats.tx_frames_512_1023: 0
>> dev.em.0.mac_stats.tx_frames_1024_1522: 0
>> dev.em.0.mac_stats.tso_txd: 902605586
>> dev.em.0.mac_stats.tso_ctx_fail: 0
>> dev.em.0.interrupts.asserts: 1392541431
>> dev.em.0.interrupts.rx_pkt_timer: 0
>> dev.em.0.interrupts.rx_abs_timer: 0
>> dev.em.0.interrupts.tx_pkt_timer: 0
>> dev.em.0.interrupts.tx_abs_timer: 0
>> dev.em.0.interrupts.tx_queue_empty: 0
>> dev.em.0.interrupts.tx_queue_min_thresh: 0
>> dev.em.0.interrupts.rx_desc_min_thresh: 0
>> dev.em.0.interrupts.rx_overrun: 0
>> dev.em.0.wake: 0
>>
>> dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.10
>> dev.igb.0.%driver: igb
>> dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.RP04.PXSX
>> dev.igb.0.%pnpinfo: vendor=0x8086 device=0x1533 subvendor=0x15d9 subdevice=0x1533 class=0x020000
>> dev.igb.0.%parent: pci5
>> dev.igb.0.nvm: -1
>> dev.igb.0.enable_aim: 1
>> dev.igb.0.fc: 3
>> dev.igb.0.rx_processing_limit: 100
>> dev.igb.0.dmac: 0
>> dev.igb.0.eee_disabled: 0
>> dev.igb.0.link_irq: 33
>> dev.igb.0.dropped: 0
>> dev.igb.0.tx_dma_fail: 0
>> dev.igb.0.rx_overruns: 0
>> dev.igb.0.watchdog_timeouts: 0
>> dev.igb.0.device_control: 1209795137
>> dev.igb.0.rx_control: 71335938
>> dev.igb.0.interrupt_mask: 4
>> dev.igb.0.extended_int_mask: 2147483679
>> dev.igb.0.tx_buf_alloc: 0
>> dev.igb.0.rx_buf_alloc: 0
>> dev.igb.0.fc_high_water: 31328
>> dev.igb.0.fc_low_water: 31312
>> dev.igb.0.queue0.no_desc_avail: 0
>> dev.igb.0.queue0.tx_packets: 62464141
>> dev.igb.0.queue0.rx_packets: 73012939
>> dev.igb.0.queue0.rx_bytes: 22529663814
>> dev.igb.0.queue0.lro_queued: 0
>> dev.igb.0.queue0.lro_flushed: 0
>> dev.igb.0.queue1.no_desc_avail: 0
>> dev.igb.0.queue1.tx_packets: 404298046
>> dev.igb.0.queue1.rx_packets: 307675818
>> dev.igb.0.queue1.rx_bytes: 185919902229
>> dev.igb.0.queue1.lro_queued: 0
>> dev.igb.0.queue1.lro_flushed: 0
>> dev.igb.0.queue2.no_desc_avail: 0
>> dev.igb.0.queue2.tx_packets: 3441053015
>> dev.igb.0.queue2.rx_packets: 5511826751
>> dev.igb.0.queue2.rx_bytes: 3054219311510
>> dev.igb.0.queue2.lro_queued: 0
>> dev.igb.0.queue2.lro_flushed: 0
>> dev.igb.0.queue3.no_desc_avail: 0
>> dev.igb.0.queue3.tx_packets: 1047838830
>> dev.igb.0.queue3.rx_packets: 1987495318
>> dev.igb.0.queue3.rx_bytes: 2696179247028
>> dev.igb.0.queue3.lro_queued: 0
>> dev.igb.0.queue3.lro_flushed: 0
>> dev.igb.0.mac_stats.excess_coll: 0
>> dev.igb.0.mac_stats.single_coll: 0
>> dev.igb.0.mac_stats.multiple_coll: 0
>> dev.igb.0.mac_stats.late_coll: 0
>> dev.igb.0.mac_stats.collision_count: 0
>> dev.igb.0.mac_stats.symbol_errors: 0
>> dev.igb.0.mac_stats.sequence_errors: 0
>> dev.igb.0.mac_stats.defer_count: 283811
>> dev.igb.0.mac_stats.missed_packets: 9449
>> dev.igb.0.mac_stats.recv_no_buff: 340
>> dev.igb.0.mac_stats.recv_undersize: 0
>> dev.igb.0.mac_stats.recv_fragmented: 0
>> dev.igb.0.mac_stats.recv_oversize: 0
>> dev.igb.0.mac_stats.recv_jabber: 0
>> dev.igb.0.mac_stats.recv_errs: 0
>> dev.igb.0.mac_stats.crc_errs: 0
>> dev.igb.0.mac_stats.alignment_errs: 0
>> dev.igb.0.mac_stats.coll_ext_errs: 0
>> dev.igb.0.mac_stats.xon_recvd: 46255557
>> dev.igb.0.mac_stats.xon_txd: 261
>> dev.igb.0.mac_stats.xoff_recvd: 46255994
>> dev.igb.0.mac_stats.xoff_txd: 7027
>> dev.igb.0.mac_stats.total_pkts_recvd: 7975033582
>> dev.igb.0.mac_stats.good_pkts_recvd: 7880001465
>> dev.igb.0.mac_stats.bcast_pkts_recvd: 5783868
>> dev.igb.0.mac_stats.mcast_pkts_recvd: 563315
>> dev.igb.0.mac_stats.rx_frames_64: 28412906
>> dev.igb.0.mac_stats.rx_frames_65_127: 3310187919
>> dev.igb.0.mac_stats.rx_frames_128_255: 784920450
>> dev.igb.0.mac_stats.rx_frames_256_511: 17225962
>> dev.igb.0.mac_stats.rx_frames_512_1023: 73415350
>> dev.igb.0.mac_stats.rx_frames_1024_1522: 3665838878
>> dev.igb.0.mac_stats.good_octets_recvd: 5990356613544
>> dev.igb.0.mac_stats.good_octets_txd: 46326753008181
>> dev.igb.0.mac_stats.total_pkts_txd: 33016014138
>> dev.igb.0.mac_stats.good_pkts_txd: 33016006850
>> dev.igb.0.mac_stats.bcast_pkts_txd: 834
>> dev.igb.0.mac_stats.mcast_pkts_txd: 54331
>> dev.igb.0.mac_stats.tx_frames_64: 30741691
>> dev.igb.0.mac_stats.tx_frames_65_127: 2174824217
>> dev.igb.0.mac_stats.tx_frames_128_255: 139804927
>> dev.igb.0.mac_stats.tx_frames_256_511: 59190261
>> dev.igb.0.mac_stats.tx_frames_512_1023: 386886648
>> dev.igb.0.mac_stats.tx_frames_1024_1522: 30224559106
>> dev.igb.0.mac_stats.tso_txd: 2384636909
>> dev.igb.0.mac_stats.tso_ctx_fail: 0
>> dev.igb.0.interrupts.asserts: 4556119857
>> dev.igb.0.interrupts.rx_pkt_timer: 7879778770
>> dev.igb.0.interrupts.rx_abs_timer: 0
>> dev.igb.0.interrupts.tx_pkt_timer: 0
>> dev.igb.0.interrupts.tx_abs_timer: 0
>> dev.igb.0.interrupts.tx_queue_empty: 33015268817
>> dev.igb.0.interrupts.tx_queue_min_thresh: 7880001470
>> dev.igb.0.interrupts.rx_desc_min_thresh: 0
>> dev.igb.0.interrupts.rx_overrun: 0
>> dev.igb.0.host.breaker_tx_pkt: 0
>> dev.igb.0.host.host_tx_pkt_discard: 0
>> dev.igb.0.host.rx_pkt: 222702
>> dev.igb.0.host.breaker_rx_pkts: 0
>> dev.igb.0.host.breaker_rx_pkt_drop: 0
>> dev.igb.0.host.tx_good_pkt: 738033
>> dev.igb.0.host.breaker_tx_pkt_drop: 0
>> dev.igb.0.host.rx_good_bytes: 5990357073320
>> dev.igb.0.host.tx_good_bytes: 46326753008181
>> dev.igb.0.host.length_errors: 0
>> dev.igb.0.host.serdes_violation_pkt: 0
>> dev.igb.0.host.header_redir_missed: 0
>> dev.igb.0.wake: 0
>>
>>
>> hw.em.eee_setting: 1
>> hw.em.rx_process_limit: 100
>> hw.em.enable_msix: 1
>> hw.em.sbp: 0
>> hw.em.smart_pwr_down: 0
>> hw.em.txd: 1024
>> hw.em.rxd: 1024
>> hw.em.rx_abs_int_delay: 66
>> hw.em.tx_abs_int_delay: 66
>> hw.em.rx_int_delay: 0
>> hw.em.tx_int_delay: 66
>>
>> hw.igb.rx_process_limit: 100
>> hw.igb.num_queues: 0
>> hw.igb.header_split: 0
>> hw.igb.buf_ring_size: 4096
>> hw.igb.max_interrupt_rate: 8000
>> hw.igb.enable_msix: 1
>> hw.igb.enable_aim: 1
>> hw.igb.txd: 1024
>> hw.igb.rxd: 1024
>>
>> FreeBSD systemname.com 9.2-RELEASE-p10 FreeBSD 9.2-RELEASE-p10 #0 r270148M: Mon Aug 18 23:14:36 EDT 2014     root at peta108:/usr/obj/usr/src/sys/CUSTOM10 amd64
>>
>> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>         options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>         ether 00:25:90:f2:2d:24
>>         inet6 fe80::225:90ff:fef2:2d24%em0 prefixlen 64 scopeid 0x2
>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>         media: Ethernet autoselect (1000baseT <full-duplex>)
>>         status: active
>> igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>         options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>         ether 00:25:90:f2:2d:24
>>         inet6 fe80::225:90ff:fef2:2d25%igb0 prefixlen 64 scopeid 0x4
>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>         media: Ethernet autoselect (1000baseT <full-duplex>)
>>         status: active
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>         options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
>>         inet6 ::1 prefixlen 128
>>         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
>>         inet 127.0.0.1 netmask 0xff000000
>>         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>         options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>         ether 00:25:90:f2:2d:24
>>         inet 192.168.0.108 netmask 0xffffff00 broadcast 192.168.0.255
>>         inet6 fe80::225:90ff:fef2:2d24%lagg0 prefixlen 64 scopeid 0x8
>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>         media: Ethernet autoselect
>>         status: active
>>         laggproto lacp lagghash l2,l3,l4
>>         laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>>         laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>>
>> Thanks in advance!
>>
>> --
>> FF
>>
>
>
>
> --
> FF
> _______________________________________________
> freebsd-questions at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe at freebsd.org"

