em0 tx_dma_fail incrementing [SOLVED]

FF fusionfoto at gmail.com
Wed Nov 19 16:04:55 UTC 2014


As a follow-up: this fix (turning off TSO on the em interfaces) does solve
our performance problem, but we are now seeing about 4 CRC/RECV errors per
day, which, as the earlier output shows, was not the case before. We've
disabled rxcsum and txcsum on the card to see whether that helps.
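
For anyone who wants to track the same counters, here is a minimal Python
sketch that compares two saved copies of `sysctl dev.em.0` output and reports
how far the error counters moved. The counter names come from the sysctl dump
quoted below; the helper names (parse_sysctl, counter_deltas) and the sample
values are made up for illustration, so treat it as a starting point rather
than a tested tool.

```python
import re

# Counter names as they appear in the sysctl output quoted in this thread.
WATCHED = ("tx_dma_fail", "mac_stats.crc_errs", "mac_stats.recv_errs")


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping only integer-valued entries."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^(dev\.\S+):\s+(-?\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_deltas(before, after, unit="dev.em.0"):
    """Return how much each watched counter grew between two snapshots."""
    deltas = {}
    for name in WATCHED:
        key = f"{unit}.{name}"
        if key in before and key in after:
            deltas[name] = after[key] - before[key]
    return deltas


if __name__ == "__main__":
    # Hypothetical snapshots; in practice, save `sysctl dev.em.0` twice, a day apart.
    day1 = (
        "dev.em.0.tx_dma_fail: 1834648\n"
        "dev.em.0.mac_stats.recv_errs: 0\n"
        "dev.em.0.mac_stats.crc_errs: 0\n"
    )
    day2 = (
        "dev.em.0.tx_dma_fail: 1834648\n"
        "dev.em.0.mac_stats.recv_errs: 4\n"
        "dev.em.0.mac_stats.crc_errs: 4\n"
    )
    print(counter_deltas(parse_sysctl(day1), parse_sysctl(day2)))
```

A tx_dma_fail delta of zero alongside a non-zero crc_errs delta would match
what is described above.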

We rebooted another server with hw.em.txd=256 and hw.em.rxd=256, and it
hasn't exhibited any issue yet (2-3 days), but it hasn't been stress-tested
either.

On Sun, Nov 16, 2014 at 6:58 PM, Mike Tancsa <mike at sentex.net> wrote:

> On 11/16/2014 12:28 PM, Adrian Chadd wrote:
>
>> Hi!
>>
>> Good catch! Would you mind filing a bug so we remember and
>> (hopefully!) fix it to be the default?
>>
>> https://bugs.freebsd.org/submit/
>>
>
> I wonder if this is the bug I was running into
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193802
>
>         ---Mike
>
>
>
>> Thanks!
>>
>>
>> -adrian
>>
>>
>> On 15 November 2014 08:31, FF <fusionfoto at gmail.com> wrote:
>>
>>> It looks like FreeBSD may be a victim of this bug:
>>>
>>>
>>>
>>> http://www.intel.com.au/content/dam/www/public/us/en/documents/specification-updates/82574-gbe-controller-spec-update.pdf
>>>
>>>
>>>
>>> 17. Tx Data Corruption When Using TCP Segmentation Offload
>>>
>>> Problem: When using TSO, a situation can occur where a PCIe MRd request is
>>> repeated with the same address, resulting in data corruption. At the end of
>>> the TCP packet, the Tx DMA hangs because the length doesn't match. This can
>>> only occur when the following are true:
>>>
>>> • The first buffer of the packet is larger than [3 * (max_read_request - 4)].
>>>
>>> • There is a 4 KB boundary within 64 bytes following the end of the header
>>> bytes in the buffer
>>>
>>> Implication: Possible data corruption since a TCP packet is transmitted
>>> containing the wrong data but with the correct checksum.
>>>
>>> Data transmission halts as the Tx DMA module enters a hang state.
>>>
>>> Workaround: The failure can be avoided by ensuring at least one of the
>>> following:
>>>
>>> • The buffer containing the headers should not be larger than
>>> [3 * (max_read_request - 4)]. To meet this requirement even for the minimum
>>> value of 128 bytes for max_read_request, the buffer should not be larger
>>> than 372 bytes.
>>>
>>> • The alignment of the buffer containing the headers should be such that
>>> there is no 4 KB boundary within 64 bytes following the end of the header
>>> bytes. Assuming standard Ethernet/IP/TCP headers of 54 bytes, this means
>>> that the buffer should not start 54-118 bytes before a 4 KB boundary. For
>>> example, 128-byte alignment for this buffer could be used to fulfill this
>>> condition.
>>>
>>> This problem has not been reported when using an Intel Linux* or Windows*
>>> drivers. Current analysis shows it is very unlikely for a situation to exist
>>> that would cause the 82574 to be at risk for the errata when using the Intel
>>> Linux or Windows drivers.
>>>
>>>
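
The two conditions in the erratum above reduce to a little arithmetic. Here
is a minimal Python sketch, assuming 54-byte headers by default and treating
the 64-byte window as inclusive (which is what the erratum's "54-118 bytes"
range implies). The function names are hypothetical and this is not driver
code, only a way to check the numbers.

```python
PAGE = 4096    # the 4 KB boundary named in the erratum
WINDOW = 64    # bytes following the end of the header bytes


def max_safe_first_buffer(max_read_request):
    """Largest first-buffer size that avoids the first condition."""
    return 3 * (max_read_request - 4)


def boundary_after_headers(buf_start, header_len=54):
    """True if a 4 KB boundary lies within WINDOW bytes after the headers end."""
    header_end = buf_start + header_len
    distance = (-header_end) % PAGE   # bytes from header end to next boundary
    return distance <= WINDOW


def at_risk(buf_start, first_buf_len, max_read_request=128, header_len=54):
    """Both conditions from the erratum must hold for the failure to occur."""
    too_large = first_buf_len > max_safe_first_buffer(max_read_request)
    return too_large and boundary_after_headers(buf_start, header_len)


if __name__ == "__main__":
    print(max_safe_first_buffer(128))   # 372, the figure quoted in the erratum
    # A buffer starting 100 bytes before a 4 KB boundary falls in the 54-118 range.
    print(at_risk(buf_start=PAGE - 100, first_buf_len=1024))
    # With 128-byte alignment the header end never lands within 64 bytes of a boundary.
    print(any(boundary_after_headers(s) for s in range(0, 2 * PAGE, 128)))
```

Either keeping the first buffer at or below max_safe_first_buffer() or
keeping boundary_after_headers() false is enough, matching the "at least one
of the following" wording of the workaround.
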
>>>
>>> Linux and other distros seem to have fixed it. FreeBSD may be exercising
>>> it because the default buffer size for this driver was recently raised
>>> above 256.
>>>
>>>
>>> Since I didn't want to reboot to try the lower buffer size, I turned off
>>> TSO on every machine I'd checked whose em interfaces were actively
>>> incrementing tx_dma_fail, then re-enabled their membership in the LACP
>>> bundle.
>>>
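
To find which em interfaces still advertise TSO, something like the following
could be run over saved `ifconfig` output. This is only a sketch: the function
names are invented, the sample text is modeled on the ifconfig output quoted
further down, and it assumes the usual FreeBSD layout with an indented
options=...<...> line under each interface. The message does not say which
command was used to turn TSO off; on FreeBSD, `ifconfig em0 -tso` is the
usual one.

```python
import re

# Sample modeled on the ifconfig output quoted in this thread (lines unwrapped).
SAMPLE = """\
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
"""


def interface_options(ifconfig_text):
    """Map each interface name to the set of capabilities on its options= line."""
    options = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            options[current] = set()
            continue
        # Anchor on leading whitespace so the "nd6 options=" line is skipped.
        caps = re.match(r"^\s*options=[0-9a-f]+<([^>]*)>", line)
        if caps and current is not None:
            options[current].update(caps.group(1).split(","))
    return options


def em_with_tso(ifconfig_text):
    """Return the em interfaces that still list TSO4 or TSO6."""
    return sorted(
        name
        for name, caps in interface_options(ifconfig_text).items()
        if name.startswith("em") and caps & {"TSO4", "TSO6"}
    )


if __name__ == "__main__":
    print(em_with_tso(SAMPLE))
```
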
>>>
>>> In brief testing (a few gigabits for a few minutes), tx_dma_fail has not
>>> incremented and throughput has not been negatively impacted (before vs.
>>> after re-enabling).
>>>
>>>
>>> I'm posting this so that anyone else scratching their head over terrible
>>> em performance can solve it.
>>>
>>>
>>> Best,
>>>
>>>
>>> FF
>>>
>>>
>>> On Thu, Nov 13, 2014 at 1:52 PM, FF <fusionfoto at gmail.com> wrote:
>>>
>>>
>>>> What knob do I need to turn to address this?
>>>>
>>>> This em0 is in an LACP bundle with an igb0 that isn't showing this
>>>> problem.
>>>>
>>>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.3.8
>>>> dev.em.0.%driver: em
>>>> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.GLAN
>>>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x153b subvendor=0x15d9 subdevice=0x153b class=0x020000
>>>> dev.em.0.%parent: pci0
>>>> dev.em.0.nvm: -1
>>>> dev.em.0.debug: -1
>>>> dev.em.0.fc: 3
>>>> dev.em.0.rx_int_delay: 0
>>>> dev.em.0.tx_int_delay: 66
>>>> dev.em.0.rx_abs_int_delay: 66
>>>> dev.em.0.tx_abs_int_delay: 66
>>>> dev.em.0.itr: 488
>>>> dev.em.0.rx_processing_limit: 100
>>>> dev.em.0.eee_control: 1
>>>> dev.em.0.link_irq: 0
>>>> dev.em.0.mbuf_alloc_fail: 52
>>>> dev.em.0.cluster_alloc_fail: 0
>>>> dev.em.0.dropped: 0
>>>> **
>>>> dev.em.0.tx_dma_fail: 1834648
>>>> dev.em.0.rx_overruns: 3109
>>>> **
>>>> dev.em.0.watchdog_timeouts: 0
>>>> dev.em.0.device_control: 1209532992
>>>> dev.em.0.rx_control: 67141634
>>>> dev.em.0.fc_high_water: 23584
>>>> dev.em.0.fc_low_water: 20552
>>>> dev.em.0.queue0.txd_head: 577
>>>> dev.em.0.queue0.txd_tail: 577
>>>> dev.em.0.queue0.tx_irq: 0
>>>> dev.em.0.queue0.no_desc_avail: 0
>>>> dev.em.0.queue0.rxd_head: 967
>>>> dev.em.0.queue0.rxd_tail: 966
>>>> dev.em.0.queue0.rx_irq: 0
>>>> dev.em.0.mac_stats.excess_coll: 0
>>>> dev.em.0.mac_stats.single_coll: 0
>>>> dev.em.0.mac_stats.multiple_coll: 0
>>>> dev.em.0.mac_stats.late_coll: 0
>>>> dev.em.0.mac_stats.collision_count: 0
>>>> dev.em.0.mac_stats.symbol_errors: 0
>>>> dev.em.0.mac_stats.sequence_errors: 0
>>>> dev.em.0.mac_stats.defer_count: 0
>>>> dev.em.0.mac_stats.missed_packets: 61094
>>>> dev.em.0.mac_stats.recv_no_buff: 60008
>>>> dev.em.0.mac_stats.recv_undersize: 0
>>>> dev.em.0.mac_stats.recv_fragmented: 0
>>>> dev.em.0.mac_stats.recv_oversize: 0
>>>> dev.em.0.mac_stats.recv_jabber: 0
>>>> dev.em.0.mac_stats.recv_errs: 0
>>>> dev.em.0.mac_stats.crc_errs: 0
>>>> dev.em.0.mac_stats.alignment_errs: 0
>>>> dev.em.0.mac_stats.coll_ext_errs: 0
>>>> dev.em.0.mac_stats.xon_recvd: 40226659
>>>> dev.em.0.mac_stats.xon_txd: 2132
>>>> dev.em.0.mac_stats.xoff_recvd: 40241216
>>>> dev.em.0.mac_stats.xoff_txd: 2073563
>>>> dev.em.0.mac_stats.total_pkts_recvd: 3219537541
>>>> dev.em.0.mac_stats.good_pkts_recvd: 3139008594
>>>> dev.em.0.mac_stats.bcast_pkts_recvd: 3953817
>>>> dev.em.0.mac_stats.mcast_pkts_recvd: 607157
>>>> dev.em.0.mac_stats.rx_frames_64: 0
>>>> dev.em.0.mac_stats.rx_frames_65_127: 0
>>>> dev.em.0.mac_stats.rx_frames_128_255: 0
>>>> dev.em.0.mac_stats.rx_frames_256_511: 0
>>>> dev.em.0.mac_stats.rx_frames_512_1023: 0
>>>> dev.em.0.mac_stats.rx_frames_1024_1522: 0
>>>> dev.em.0.mac_stats.good_octets_recvd: 3527296369841
>>>> dev.em.0.mac_stats.good_octets_txd: 14348531993101
>>>> dev.em.0.mac_stats.total_pkts_txd: 10735190291
>>>> dev.em.0.mac_stats.good_pkts_txd: 10733114595
>>>> dev.em.0.mac_stats.bcast_pkts_txd: 14
>>>> dev.em.0.mac_stats.mcast_pkts_txd: 54334
>>>> dev.em.0.mac_stats.tx_frames_64: 0
>>>> dev.em.0.mac_stats.tx_frames_65_127: 0
>>>> dev.em.0.mac_stats.tx_frames_128_255: 0
>>>> dev.em.0.mac_stats.tx_frames_256_511: 0
>>>> dev.em.0.mac_stats.tx_frames_512_1023: 0
>>>> dev.em.0.mac_stats.tx_frames_1024_1522: 0
>>>> dev.em.0.mac_stats.tso_txd: 902605586
>>>> dev.em.0.mac_stats.tso_ctx_fail: 0
>>>> dev.em.0.interrupts.asserts: 1392541431
>>>> dev.em.0.interrupts.rx_pkt_timer: 0
>>>> dev.em.0.interrupts.rx_abs_timer: 0
>>>> dev.em.0.interrupts.tx_pkt_timer: 0
>>>> dev.em.0.interrupts.tx_abs_timer: 0
>>>> dev.em.0.interrupts.tx_queue_empty: 0
>>>> dev.em.0.interrupts.tx_queue_min_thresh: 0
>>>> dev.em.0.interrupts.rx_desc_min_thresh: 0
>>>> dev.em.0.interrupts.rx_overrun: 0
>>>> dev.em.0.wake: 0
>>>>
>>>> dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.3.10
>>>> dev.igb.0.%driver: igb
>>>> dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.RP04.PXSX
>>>> dev.igb.0.%pnpinfo: vendor=0x8086 device=0x1533 subvendor=0x15d9 subdevice=0x1533 class=0x020000
>>>> dev.igb.0.%parent: pci5
>>>> dev.igb.0.nvm: -1
>>>> dev.igb.0.enable_aim: 1
>>>> dev.igb.0.fc: 3
>>>> dev.igb.0.rx_processing_limit: 100
>>>> dev.igb.0.dmac: 0
>>>> dev.igb.0.eee_disabled: 0
>>>> dev.igb.0.link_irq: 33
>>>> dev.igb.0.dropped: 0
>>>> dev.igb.0.tx_dma_fail: 0
>>>> dev.igb.0.rx_overruns: 0
>>>> dev.igb.0.watchdog_timeouts: 0
>>>> dev.igb.0.device_control: 1209795137
>>>> dev.igb.0.rx_control: 71335938
>>>> dev.igb.0.interrupt_mask: 4
>>>> dev.igb.0.extended_int_mask: 2147483679
>>>> dev.igb.0.tx_buf_alloc: 0
>>>> dev.igb.0.rx_buf_alloc: 0
>>>> dev.igb.0.fc_high_water: 31328
>>>> dev.igb.0.fc_low_water: 31312
>>>> dev.igb.0.queue0.no_desc_avail: 0
>>>> dev.igb.0.queue0.tx_packets: 62464141
>>>> dev.igb.0.queue0.rx_packets: 73012939
>>>> dev.igb.0.queue0.rx_bytes: 22529663814
>>>> dev.igb.0.queue0.lro_queued: 0
>>>> dev.igb.0.queue0.lro_flushed: 0
>>>> dev.igb.0.queue1.no_desc_avail: 0
>>>> dev.igb.0.queue1.tx_packets: 404298046
>>>> dev.igb.0.queue1.rx_packets: 307675818
>>>> dev.igb.0.queue1.rx_bytes: 185919902229
>>>> dev.igb.0.queue1.lro_queued: 0
>>>> dev.igb.0.queue1.lro_flushed: 0
>>>> dev.igb.0.queue2.no_desc_avail: 0
>>>> dev.igb.0.queue2.tx_packets: 3441053015
>>>> dev.igb.0.queue2.rx_packets: 5511826751
>>>> dev.igb.0.queue2.rx_bytes: 3054219311510
>>>> dev.igb.0.queue2.lro_queued: 0
>>>> dev.igb.0.queue2.lro_flushed: 0
>>>> dev.igb.0.queue3.no_desc_avail: 0
>>>> dev.igb.0.queue3.tx_packets: 1047838830
>>>> dev.igb.0.queue3.rx_packets: 1987495318
>>>> dev.igb.0.queue3.rx_bytes: 2696179247028
>>>> dev.igb.0.queue3.lro_queued: 0
>>>> dev.igb.0.queue3.lro_flushed: 0
>>>> dev.igb.0.mac_stats.excess_coll: 0
>>>> dev.igb.0.mac_stats.single_coll: 0
>>>> dev.igb.0.mac_stats.multiple_coll: 0
>>>> dev.igb.0.mac_stats.late_coll: 0
>>>> dev.igb.0.mac_stats.collision_count: 0
>>>> dev.igb.0.mac_stats.symbol_errors: 0
>>>> dev.igb.0.mac_stats.sequence_errors: 0
>>>> dev.igb.0.mac_stats.defer_count: 283811
>>>> dev.igb.0.mac_stats.missed_packets: 9449
>>>> dev.igb.0.mac_stats.recv_no_buff: 340
>>>> dev.igb.0.mac_stats.recv_undersize: 0
>>>> dev.igb.0.mac_stats.recv_fragmented: 0
>>>> dev.igb.0.mac_stats.recv_oversize: 0
>>>> dev.igb.0.mac_stats.recv_jabber: 0
>>>> dev.igb.0.mac_stats.recv_errs: 0
>>>> dev.igb.0.mac_stats.crc_errs: 0
>>>> dev.igb.0.mac_stats.alignment_errs: 0
>>>> dev.igb.0.mac_stats.coll_ext_errs: 0
>>>> dev.igb.0.mac_stats.xon_recvd: 46255557
>>>> dev.igb.0.mac_stats.xon_txd: 261
>>>> dev.igb.0.mac_stats.xoff_recvd: 46255994
>>>> dev.igb.0.mac_stats.xoff_txd: 7027
>>>> dev.igb.0.mac_stats.total_pkts_recvd: 7975033582
>>>> dev.igb.0.mac_stats.good_pkts_recvd: 7880001465
>>>> dev.igb.0.mac_stats.bcast_pkts_recvd: 5783868
>>>> dev.igb.0.mac_stats.mcast_pkts_recvd: 563315
>>>> dev.igb.0.mac_stats.rx_frames_64: 28412906
>>>> dev.igb.0.mac_stats.rx_frames_65_127: 3310187919
>>>> dev.igb.0.mac_stats.rx_frames_128_255: 784920450
>>>> dev.igb.0.mac_stats.rx_frames_256_511: 17225962
>>>> dev.igb.0.mac_stats.rx_frames_512_1023: 73415350
>>>> dev.igb.0.mac_stats.rx_frames_1024_1522: 3665838878
>>>> dev.igb.0.mac_stats.good_octets_recvd: 5990356613544
>>>> dev.igb.0.mac_stats.good_octets_txd: 46326753008181
>>>> dev.igb.0.mac_stats.total_pkts_txd: 33016014138
>>>> dev.igb.0.mac_stats.good_pkts_txd: 33016006850
>>>> dev.igb.0.mac_stats.bcast_pkts_txd: 834
>>>> dev.igb.0.mac_stats.mcast_pkts_txd: 54331
>>>> dev.igb.0.mac_stats.tx_frames_64: 30741691
>>>> dev.igb.0.mac_stats.tx_frames_65_127: 2174824217
>>>> dev.igb.0.mac_stats.tx_frames_128_255: 139804927
>>>> dev.igb.0.mac_stats.tx_frames_256_511: 59190261
>>>> dev.igb.0.mac_stats.tx_frames_512_1023: 386886648
>>>> dev.igb.0.mac_stats.tx_frames_1024_1522: 30224559106
>>>> dev.igb.0.mac_stats.tso_txd: 2384636909
>>>> dev.igb.0.mac_stats.tso_ctx_fail: 0
>>>> dev.igb.0.interrupts.asserts: 4556119857
>>>> dev.igb.0.interrupts.rx_pkt_timer: 7879778770
>>>> dev.igb.0.interrupts.rx_abs_timer: 0
>>>> dev.igb.0.interrupts.tx_pkt_timer: 0
>>>> dev.igb.0.interrupts.tx_abs_timer: 0
>>>> dev.igb.0.interrupts.tx_queue_empty: 33015268817
>>>> dev.igb.0.interrupts.tx_queue_min_thresh: 7880001470
>>>> dev.igb.0.interrupts.rx_desc_min_thresh: 0
>>>> dev.igb.0.interrupts.rx_overrun: 0
>>>> dev.igb.0.host.breaker_tx_pkt: 0
>>>> dev.igb.0.host.host_tx_pkt_discard: 0
>>>> dev.igb.0.host.rx_pkt: 222702
>>>> dev.igb.0.host.breaker_rx_pkts: 0
>>>> dev.igb.0.host.breaker_rx_pkt_drop: 0
>>>> dev.igb.0.host.tx_good_pkt: 738033
>>>> dev.igb.0.host.breaker_tx_pkt_drop: 0
>>>> dev.igb.0.host.rx_good_bytes: 5990357073320
>>>> dev.igb.0.host.tx_good_bytes: 46326753008181
>>>> dev.igb.0.host.length_errors: 0
>>>> dev.igb.0.host.serdes_violation_pkt: 0
>>>> dev.igb.0.host.header_redir_missed: 0
>>>> dev.igb.0.wake: 0
>>>>
>>>>
>>>> hw.em.eee_setting: 1
>>>> hw.em.rx_process_limit: 100
>>>> hw.em.enable_msix: 1
>>>> hw.em.sbp: 0
>>>> hw.em.smart_pwr_down: 0
>>>> hw.em.txd: 1024
>>>> hw.em.rxd: 1024
>>>> hw.em.rx_abs_int_delay: 66
>>>> hw.em.tx_abs_int_delay: 66
>>>> hw.em.rx_int_delay: 0
>>>> hw.em.tx_int_delay: 66
>>>>
>>>> hw.igb.rx_process_limit: 100
>>>> hw.igb.num_queues: 0
>>>> hw.igb.header_split: 0
>>>> hw.igb.buf_ring_size: 4096
>>>> hw.igb.max_interrupt_rate: 8000
>>>> hw.igb.enable_msix: 1
>>>> hw.igb.enable_aim: 1
>>>> hw.igb.txd: 1024
>>>> hw.igb.rxd: 1024
>>>>
>>>> FreeBSD systemname.com 9.2-RELEASE-p10 FreeBSD 9.2-RELEASE-p10 #0 r270148M: Mon Aug 18 23:14:36 EDT 2014     root at peta108:/usr/obj/usr/src/sys/CUSTOM10 amd64
>>>>
>>>> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>          options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>>          ether 00:25:90:f2:2d:24
>>>>          inet6 fe80::225:90ff:fef2:2d24%em0 prefixlen 64 scopeid 0x2
>>>>          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>          media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>          status: active
>>>> igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>          options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>>          ether 00:25:90:f2:2d:24
>>>>          inet6 fe80::225:90ff:fef2:2d25%igb0 prefixlen 64 scopeid 0x4
>>>>          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>          media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>          status: active
>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>>          options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
>>>>          inet6 ::1 prefixlen 128
>>>>          inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
>>>>          inet 127.0.0.1 netmask 0xff000000
>>>>          nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>          options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>>          ether 00:25:90:f2:2d:24
>>>>          inet 192.168.0.108 netmask 0xffffff00 broadcast 192.168.0.255
>>>>          inet6 fe80::225:90ff:fef2:2d24%lagg0 prefixlen 64 scopeid 0x8
>>>>          nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>>>          media: Ethernet autoselect
>>>>          status: active
>>>>          laggproto lacp lagghash l2,l3,l4
>>>>          laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>>>>          laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
>>>>
>>>> Thanks in advance!
>>>>
>>>> --
>>>> FF
>>>>
>>>>
>>>
>>>
>>> --
>>> FF
>>> _______________________________________________
>>> freebsd-questions at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
>>> To unsubscribe, send any mail to "freebsd-questions-
>>> unsubscribe at freebsd.org"
>>>
>>
>>
>>
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike at sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada   http://www.tancsa.com/
>



-- 
FF

