Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled

Hooman Fazaeli hoomanfazaeli at gmail.com
Tue Nov 8 17:21:31 UTC 2011


On 11/7/2011 9:24 PM, Jason Wolfe wrote:
> On Mon, Oct 31, 2011 at 1:13 AM, Hooman Fazaeli <hoomanfazaeli at gmail.com> wrote:
>
>
>     Attached is a patch for if_em.c. It flushes the interface queue when it
>     is full and the link is not active. Please note that when this happens,
>     drops still increase on the interface, and this will trigger your scripts
>     as before. You will need to adjust the scripts slightly, as follows:
>
>                   check interface TX status
>                   if (interface TX seems hung) {
>                             sleep 5
>                             check interface TX status
>                             if (interface TX seems hung) {
>                                  reset the interface.
>                             }
>                     }
>
>     For MULTIQUEUE, it just disables the link status check (which is not
>     ideal), so please test in non-MULTIQUEUE mode.
>
>     The patch also contains some minor fixups to compile on 7, plus
>     a fix from r1.69 which addressed the RX hang problem (the fix was
>     later removed in r1.70). I included it for Emil to give it a
>     try.
>
>     Please let me know if you have any problems with the patch.
>
>
> Hooman,
>
> Unfortunately, one of the servers just had a wedge event a couple of hours ago with this patch.  To confirm that your changes should cause a recovery within the time I'm allowing, here is the current format:
>
> check interface TX status
> if (interface TX seems hung) {
>     sleep 3
>     check packets out
>     sleep 2
>     check packets out
>     if (packets not incrementing) {
>         reset the interface
>     }
> }
>
> I bounced em0 because dropped packets incremented from 1749543 to 1749708 and the interface was not incrementing packets out.
>
> 4:10AM up 6 days, 15:23, 0 users, load averages: 0.02, 0.12, 0.14
>
> em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
> ether X
> inet6 X%em0 prefixlen 64 scopeid 0x1
> nd6 options=1<PERFORMNUD>
> media: Ethernet autoselect (1000baseT <full-duplex>)
> status: active
>
> em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
> ether X
> inet6 X%em1 prefixlen 64 scopeid 0x2
> nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
> media: Ethernet autoselect (1000baseT <full-duplex>)
> status: active
>
> ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
>
> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
> options=3<RXCSUM,TXCSUM>
> inet 127.0.0.1 netmask 0xff000000
> inet6 ::1 prefixlen 128
> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
> inet X.X.X.X netmask 0xffffffff
> inet X.X.X.X netmask 0xffffffff
> inet X.X.X.X netmask 0xffffffff
> nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>
> lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
> ether X
> inet X.X.X.X netmask 0xffffff00 broadcast X.X.X.X
> inet6 X%lagg0 prefixlen 64 scopeid 0x5
> inet6 X prefixlen 64 autoconf
> nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
> media: Ethernet autoselect
> status: active
> laggproto loadbalance
> laggport: em0 flags=4<ACTIVE>
> laggport: em1 flags=4<ACTIVE>
>
> interrupt total rate
> irq3: uart1 3810 0
> cpu0: timer 1147568087 2000
> irq256: em0:rx 0 59779710 104
> irq257: em0:tx 0 2771888652 4831
> irq258: em0:link 1 0
> irq259: em1:rx 0 3736828886 6512
> irq260: em1:tx 0 2790566376 4863
> irq261: em1:link 27286 0
> irq262: mps0 395687386 689
> cpu1: timer 1147559894 2000
> cpu2: timer 1147559901 2000
> cpu3: timer 1147559902 2000
> Total 14345029891 25001
>
> 13466/4144/17610 mbufs in use (current/cache/total)
> 2567/2635/5202/5853720 mbuf clusters in use (current/cache/total/max)
> 2567/633 mbuf+clusters out of packet secondary zone in use (current/cache)
> 6798/554/7352/2926859 4k (page size) jumbo clusters in use (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 35692K/8522K/44214K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/0/0 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
>
> Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop
> em0 1500 <Link#1> 00:25:90:2b:e5:75 60747643 0 0 11246408092 0 0 1750763
> em0 1500 fe80:1::225:9 fe80:1::225:90ff: 0 - - 4 - - -
> em1 1500 <Link#2> 00:25:90:2b:e5:75 11237195776 123950 0 11344722383 0 0 545682
> em1 1500 fe80:2::225:9 fe80:2::225:90ff: 0 - - 1 - - -
> lagg0 1500 <Link#5> 00:25:90:2b:e5:75 11297850142 0 0 22588666102 2296445 0 0
> lagg0 1500 69.164.38.0/2 69.164.38.83 10189108030 - - 22592881776 - - -
> lagg0 1500 fe80:5::225:9 fe80:5::225:90ff: 24 - - 28 - - -
> lagg0 1500 2607:f4e8:310 2607:f4e8:310:12: 19578 - - 19591 - - -
>
> kern.msgbuf:
>
> Nov 7 04:10:06 cds1033 kernel: Interface is RUNNING and INACTIVE
> Nov 7 04:10:06 cds1033 kernel: em0: hw tdh = 558, hw tdt = 558
> Nov 7 04:10:06 cds1033 kernel: em0: hw rdh = 889, hw rdt = 888
> Nov 7 04:10:06 cds1033 kernel: em0: Tx Queue Status = 0
> Nov 7 04:10:06 cds1033 kernel: em0: TX descriptors avail = 1024
> Nov 7 04:10:06 cds1033 kernel: em0: Tx Descriptors avail failure = 0
> Nov 7 04:10:06 cds1033 kernel: em0: RX discarded packets = 0
> Nov 7 04:10:06 cds1033 kernel: em0: RX Next to Check = 889
> Nov 7 04:10:06 cds1033 kernel: em0: RX Next to Refresh = 888
> Nov 7 04:10:06 cds1033 kernel: em0: Link state: active
> Nov 7 04:10:06 cds1033 kernel: Interface is RUNNING and INACTIVE
> Nov 7 04:10:06 cds1033 kernel: em1: hw tdh = 837, hw tdt = 837
> Nov 7 04:10:06 cds1033 kernel: em1: hw rdh = 198, hw rdt = 189
> Nov 7 04:10:06 cds1033 kernel: em1: Tx Queue Status = 0
> Nov 7 04:10:06 cds1033 kernel: em1: TX descriptors avail = 1021
> Nov 7 04:10:06 cds1033 kernel: em1: Tx Descriptors avail failure = 0
> Nov 7 04:10:06 cds1033 kernel: em1: RX discarded packets = 0
> Nov 7 04:10:06 cds1033 kernel: em1: RX Next to Check = 303
> Nov 7 04:10:06 cds1033 kernel: em1: RX Next to Refresh = 302
> Nov 7 04:10:06 cds1033 kernel: em1: Link state: active
>
> net.inet.ip.intr_queue_maxlen: 512
> net.inet.ip.intr_queue_drops: 0
> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
> dev.em.0.%driver: em
> dev.em.0.%location: slot=0 function=0
> dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000
> dev.em.0.%parent: pci1
> dev.em.0.nvm: -1
> dev.em.0.debug: -1
> dev.em.0.rx_int_delay: 0
> dev.em.0.tx_int_delay: 66
> dev.em.0.rx_abs_int_delay: 66
> dev.em.0.tx_abs_int_delay: 66
> dev.em.0.rx_processing_limit: 100
> dev.em.0.flow_control: 0
> dev.em.0.eee_control: 0
> dev.em.0.link_irq: 1
> dev.em.0.mbuf_alloc_fail: 0
> dev.em.0.cluster_alloc_fail: 0
> dev.em.0.dropped: 0
> dev.em.0.tx_dma_fail: 0
> dev.em.0.rx_overruns: 0
> dev.em.0.watchdog_timeouts: 0
> dev.em.0.link_toggles: 3
> dev.em.0.device_control: 1074790984
> dev.em.0.rx_control: 67141634
> dev.em.0.fc_high_water: 18432
> dev.em.0.fc_low_water: 16932
> dev.em.0.queue0.txd_head: 558
> dev.em.0.queue0.txd_tail: 558
> dev.em.0.queue0.tx_irq: 2771888613
> dev.em.0.queue0.no_desc_avail: 0
> dev.em.0.queue0.rxd_head: 102
> dev.em.0.queue0.rxd_tail: 101
> dev.em.0.queue0.rx_irq: 59779941
> dev.em.0.mac_stats.excess_coll: 0
> dev.em.0.mac_stats.single_coll: 0
> dev.em.0.mac_stats.multiple_coll: 0
> dev.em.0.mac_stats.late_coll: 0
> dev.em.0.mac_stats.collision_count: 0
> dev.em.0.mac_stats.symbol_errors: 0
> dev.em.0.mac_stats.sequence_errors: 0
> dev.em.0.mac_stats.defer_count: 0
> dev.em.0.mac_stats.missed_packets: 0
> dev.em.0.mac_stats.recv_no_buff: 0
> dev.em.0.mac_stats.recv_undersize: 0
> dev.em.0.mac_stats.recv_fragmented: 0
> dev.em.0.mac_stats.recv_oversize: 0
> dev.em.0.mac_stats.recv_jabber: 0
> dev.em.0.mac_stats.recv_errs: 0
> dev.em.0.mac_stats.crc_errs: 0
> dev.em.0.mac_stats.alignment_errs: 0
> dev.em.0.mac_stats.coll_ext_errs: 0
> dev.em.0.mac_stats.xon_recvd: 0
> dev.em.0.mac_stats.xon_txd: 0
> dev.em.0.mac_stats.xoff_recvd: 0
> dev.em.0.mac_stats.xoff_txd: 0
> dev.em.0.mac_stats.total_pkts_recvd: 60768005
> dev.em.0.mac_stats.good_pkts_recvd: 60768005
> dev.em.0.mac_stats.bcast_pkts_recvd: 60744833
> dev.em.0.mac_stats.mcast_pkts_recvd: 2948
> dev.em.0.mac_stats.rx_frames_64: 60744822
> dev.em.0.mac_stats.rx_frames_65_127: 3491
> dev.em.0.mac_stats.rx_frames_128_255: 19293
> dev.em.0.mac_stats.rx_frames_256_511: 364
> dev.em.0.mac_stats.rx_frames_512_1023: 35
> dev.em.0.mac_stats.rx_frames_1024_1522: 0
> dev.em.0.mac_stats.good_octets_recvd: 3890774076
> dev.em.0.mac_stats.good_octets_txd: 15295538422911
> dev.em.0.mac_stats.total_pkts_txd: 11246435226
> dev.em.0.mac_stats.good_pkts_txd: 11246435226
> dev.em.0.mac_stats.bcast_pkts_txd: 4
> dev.em.0.mac_stats.mcast_pkts_txd: 3830
> dev.em.0.mac_stats.tx_frames_64: 60338532
> dev.em.0.mac_stats.tx_frames_65_127: 897135811
> dev.em.0.mac_stats.tx_frames_128_255: 13933787
> dev.em.0.mac_stats.tx_frames_256_511: 22107540
> dev.em.0.mac_stats.tx_frames_512_1023: 116374045
> dev.em.0.mac_stats.tx_frames_1024_1522: 10136545511
> dev.em.0.mac_stats.tso_txd: 0
> dev.em.0.mac_stats.tso_ctx_fail: 0
> dev.em.0.interrupts.asserts: 3
> dev.em.0.interrupts.rx_pkt_timer: 0
> dev.em.0.interrupts.rx_abs_timer: 0
> dev.em.0.interrupts.tx_pkt_timer: 0
> dev.em.0.interrupts.tx_abs_timer: 0
> dev.em.0.interrupts.tx_queue_empty: 0
> dev.em.0.interrupts.tx_queue_min_thresh: 0
> dev.em.0.interrupts.rx_desc_min_thresh: 0
> dev.em.0.interrupts.rx_overrun: 0
> dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
> dev.em.1.%driver: em
> dev.em.1.%location: slot=0 function=0
> dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000
> dev.em.1.%parent: pci2
> dev.em.1.nvm: -1
> dev.em.1.debug: -1
> dev.em.1.rx_int_delay: 0
> dev.em.1.tx_int_delay: 66
> dev.em.1.rx_abs_int_delay: 66
> dev.em.1.tx_abs_int_delay: 66
> dev.em.1.rx_processing_limit: 100
> dev.em.1.flow_control: 0
> dev.em.1.eee_control: 0
> dev.em.1.link_irq: 27207
> dev.em.1.mbuf_alloc_fail: 0
> dev.em.1.cluster_alloc_fail: 0
> dev.em.1.dropped: 0
> dev.em.1.tx_dma_fail: 0
> dev.em.1.rx_overruns: 0
> dev.em.1.watchdog_timeouts: 0
> dev.em.1.link_toggles: 3
> dev.em.1.device_control: 1074790984
> dev.em.1.rx_control: 67141634
> dev.em.1.fc_high_water: 18432
> dev.em.1.fc_low_water: 16932
> dev.em.1.queue0.txd_head: 804
> dev.em.1.queue0.txd_tail: 804
> dev.em.1.queue0.tx_irq: 2790571888
> dev.em.1.queue0.no_desc_avail: 0
> dev.em.1.queue0.rxd_head: 686
> dev.em.1.queue0.rxd_tail: 686
> dev.em.1.queue0.rx_irq: 3729597000
> dev.em.1.mac_stats.excess_coll: 0
> dev.em.1.mac_stats.single_coll: 0
> dev.em.1.mac_stats.multiple_coll: 0
> dev.em.1.mac_stats.late_coll: 0
> dev.em.1.mac_stats.collision_count: 0
> dev.em.1.mac_stats.symbol_errors: 0
> dev.em.1.mac_stats.sequence_errors: 0
> dev.em.1.mac_stats.defer_count: 0
> dev.em.1.mac_stats.missed_packets: 123950
> dev.em.1.mac_stats.recv_no_buff: 7063
> dev.em.1.mac_stats.recv_undersize: 0
> dev.em.1.mac_stats.recv_fragmented: 0
> dev.em.1.mac_stats.recv_oversize: 0
> dev.em.1.mac_stats.recv_jabber: 0
> dev.em.1.mac_stats.recv_errs: 0
> dev.em.1.mac_stats.crc_errs: 0
> dev.em.1.mac_stats.alignment_errs: 0
> dev.em.1.mac_stats.coll_ext_errs: 0
> dev.em.1.mac_stats.xon_recvd: 0
> dev.em.1.mac_stats.xon_txd: 0
> dev.em.1.mac_stats.xoff_recvd: 0
> dev.em.1.mac_stats.xoff_txd: 0
> dev.em.1.mac_stats.total_pkts_recvd: 11237327404
> dev.em.1.mac_stats.good_pkts_recvd: 11237203454
> dev.em.1.mac_stats.bcast_pkts_recvd: 60741559
> dev.em.1.mac_stats.mcast_pkts_recvd: 2935
> dev.em.1.mac_stats.rx_frames_64: 4479481614
> dev.em.1.mac_stats.rx_frames_65_127: 5124633522
> dev.em.1.mac_stats.rx_frames_128_255: 13228214
> dev.em.1.mac_stats.rx_frames_256_511: 23321029
> dev.em.1.mac_stats.rx_frames_512_1023: 74960116
> dev.em.1.mac_stats.rx_frames_1024_1522: 1521578959
> dev.em.1.mac_stats.good_octets_recvd: 3024997755633
> dev.em.1.mac_stats.good_octets_txd: 15349574964083
> dev.em.1.mac_stats.total_pkts_txd: 11344737635
> dev.em.1.mac_stats.good_pkts_txd: 11344737635
> dev.em.1.mac_stats.bcast_pkts_txd: 2501
> dev.em.1.mac_stats.mcast_pkts_txd: 8
> dev.em.1.mac_stats.tx_frames_64: 62839963
> dev.em.1.mac_stats.tx_frames_65_127: 962219224
> dev.em.1.mac_stats.tx_frames_128_255: 14620849
> dev.em.1.mac_stats.tx_frames_256_511: 20410906
> dev.em.1.mac_stats.tx_frames_512_1023: 117272600
> dev.em.1.mac_stats.tx_frames_1024_1522: 10167374093
> dev.em.1.mac_stats.tso_txd: 0
> dev.em.1.mac_stats.tso_ctx_fail: 0
> dev.em.1.interrupts.asserts: 19749
> dev.em.1.interrupts.rx_pkt_timer: 1
> dev.em.1.interrupts.rx_abs_timer: 0
> dev.em.1.interrupts.tx_pkt_timer: 0
> dev.em.1.interrupts.tx_abs_timer: 0
> dev.em.1.interrupts.tx_queue_empty: 0
> dev.em.1.interrupts.tx_queue_min_thresh: 0
> dev.em.1.interrupts.rx_desc_min_thresh: 0
> dev.em.1.interrupts.rx_overrun: 18
> hw.em.rx_hang_fixup: 0
> hw.em.eee_setting: 0
> hw.em.fc_setting: 0
> hw.em.rx_process_limit: 100
> hw.em.enable_msix: 1
> hw.em.sbp: 0
> hw.em.smart_pwr_down: 0
> hw.em.txd: 1024
> hw.em.rxd: 1024
> hw.em.rx_abs_int_delay: 66
> hw.em.tx_abs_int_delay: 66
> hw.em.rx_int_delay: 0
> hw.em.tx_int_delay: 66
>
> Jason

I have spent more time on the problem and I think I can now explain what
is happening.

With MSIX disabled, the driver uses a fast interrupt handler (em_irq_fast)
which schedules the rx/tx task and then checks for a link status change. This
means the rx/tx task is executed on every link state change. That is not
efficient, since starting transmission while the link is down is a waste of time.
However, it has the side effect that after a temporary link loss (active -> inactive -> active),
_start is executed and transmission resumes normally. The value of link_toggles (3)
strongly suggests that you had such a transition when the problem occurred.

With MSIX enabled, the link task (em_handle_link) does _not_ trigger
_start when the link changes state from inactive to active (which it should).
If if_snd fills up during a temporary link loss, transmission is
stopped forever and the driver never recovers from that state.

The last patch should have reduced the frequency of the problem,
but it assumes that every IFQ_ENQUEUE is followed by an if_start,
which is not always the case.

If you are willing to test, I can prepare another patch that fixes
the issue in a different and more reliable way.
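
For reference, the two-stage TX-hang check Jason describes above could be
sketched as a shell script. This is only an illustration of the logic: the
interface name, the awk field position for the Opkts counter, and resetting
via ifconfig down/up are my assumptions, not part of either patch.

```shell
#!/bin/sh
# Two-stage TX-hang watchdog sketch (assumptions noted above).

IFACE="${1:-em0}"

# Pure decision helper: two equal readings of the output-packet counter
# mean nothing left the interface in between.
tx_looks_hung() {
    [ "$1" -eq "$2" ]
}

# Read the Opkts counter for $IFACE. A real script should verify which
# awk field holds Opkts in `netstat -I` output on its FreeBSD version.
opackets() {
    netstat -I "$IFACE" -n 2>/dev/null | awk 'NR == 2 { print $8 }'
}

check_and_reset() {
    a=$(opackets)
    sleep 3
    b=$(opackets)
    if tx_looks_hung "$a" "$b"; then
        sleep 2
        c=$(opackets)
        if tx_looks_hung "$b" "$c"; then
            # Still no TX progress after two windows: bounce the interface.
            ifconfig "$IFACE" down && ifconfig "$IFACE" up
        fi
    fi
}
```

The double check avoids resetting an interface that is merely idle for one
sampling window rather than actually wedged.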

More information about the freebsd-net mailing list