[Bug 199174] em tx and rx hang
bugzilla-noreply at freebsd.org
Sun Apr 5 13:28:24 UTC 2015
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199174
Bug ID: 199174
Summary: em tx and rx hang
Product: Base System
Version: 10.1-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: david.keller at litchis.fr
Hi,
While sending moderate NFS traffic (< 2 MB/s), the interface suddenly stopped
transmitting and receiving.
However, the interface looked fine:
$ ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
ether 00:25:90:34:5d:44
inet YYYY netmask 0xffffff00 broadcast YYY.255
inet6 fe80::225:90ff:fe34:5d44%em0 prefixlen 64 scopeid 0x1
inet6 XXXX prefixlen 64 autoconf
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
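In the ifconfig output, the hex number before each bracketed list is a bitmask and the names are its set bits. Below is a minimal Python sketch that decodes `flags` and `options`. The tables are partial: they include only the bits that appear above. Their values are taken from FreeBSD's `<net/if.h>` `IFF_*` and `IFCAP_*` constants, and they do add up to the printed 0x8843 and 0x4219b:

```python
# Partial tables: only the bits that appear in the ifconfig output above.
# Values from FreeBSD <net/if.h> (IFF_* and IFCAP_* constants).
IFF_BITS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0040: "RUNNING",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}

IFCAP_BITS = {
    0x00001: "RXCSUM",
    0x00002: "TXCSUM",
    0x00008: "VLAN_MTU",
    0x00010: "VLAN_HWTAGGING",
    0x00080: "VLAN_HWCSUM",
    0x00100: "TSO4",
    0x02000: "WOL_MAGIC",
    0x40000: "VLAN_HWTSO",
}


def decode(mask: int, table: dict[int, str]) -> list[str]:
    """Return the names of the bits set in mask, lowest bit first.

    Raises ValueError if mask contains bits the (partial) table does not know.
    """
    known = 0
    for bit in table:
        known |= bit
    unknown = mask & ~known
    if unknown:
        raise ValueError(f"unknown bits: {unknown:#x}")
    return [name for bit, name in sorted(table.items()) if mask & bit]


if __name__ == "__main__":
    print(decode(0x8843, IFF_BITS))
    # ['UP', 'BROADCAST', 'RUNNING', 'SIMPLEX', 'MULTICAST']
    print(decode(0x4219b, IFCAP_BITS))
    # ['RXCSUM', 'TXCSUM', 'VLAN_MTU', 'VLAN_HWTAGGING', 'VLAN_HWCSUM',
    #  'TSO4', 'WOL_MAGIC', 'VLAN_HWTSO']
```

Nothing in these flags hints at the hang: the stack sees the interface as UP and RUNNING.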
Pinging the gateway did not work:
$ ping ZZZZ
PING ZZZZ (ZZZZ): 56 data bytes
ping: sendto: Host is down
ping: sendto: Host is down
Yet the driver seemed happy with the card, as it printed nothing unusual.
# tcpdump -ni em0
-> No RX traffic, only TX.
Dumping the em driver's internal state was more interesting:
$ sysctl dev.em.0.debug=1
Interface is RUNNING and ACTIVE
em0: hw tdh = 325, hw tdt = 166
em0: hw rdh = 688, hw rdt = 687
em0: Tx Queue Status = 1
em0: TX descriptors avail = 150
em0: Tx Descriptors avail failure = 0
em0: RX discarded packets = 0
em0: RX Next to Check = 688
em0: RX Next to Refresh = 687
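The head/tail pairs are easier to read as ring occupancy. On these Intel controllers, the hardware owns the descriptors from head up to, but not including, tail. On TX, these are packets queued but not yet sent. On RX, they are empty buffers the NIC may fill. Below is a minimal sketch, assuming both rings hold 1024 descriptors. The TX size is suggested by "TX descriptors avail = 1024" after the reset; the RX size is a plain assumption:

```python
RING_SIZE = 1024  # assumption: em TX and RX rings of 1024 descriptors


def hw_owned(head: int, tail: int, size: int = RING_SIZE) -> int:
    """Descriptors owned by the NIC: from head up to, not including, tail."""
    return (tail - head) % size


# Register values from the sysctl dev.em.0.debug=1 dump above.
tx_pending = hw_owned(head=325, tail=166)  # TX: queued, not yet sent by the NIC
rx_ready = hw_owned(head=688, tail=687)    # RX: empty buffers the NIC may fill

print(f"TX descriptors queued for the NIC: {tx_pending}")  # 865
print(f"RX buffers available to the NIC: {rx_ready}")      # 1023
```

Read this way, the RX ring is fully stocked, and RX Next to Check equals hw rdh, so software has consumed everything the NIC wrote: the NIC is simply not receiving. On TX, 865 descriptors are waiting for the NIC.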
Each ping request incremented hw tdt, as expected. Wondering what would
happen once it reached the tdh value, I ping-flooded the gateway.
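With the head frozen, every queued packet moves the tail forward. The ring is full once advancing the tail would make it equal to the head, because head == tail is reserved to mean "empty". Below is a minimal simulation of that process. It assumes a 1024-descriptor ring and, hypothetically, one descriptor per ping; the real per-packet descriptor count depends on the driver and offloads:

```python
RING_SIZE = 1024  # assumption: 1024-descriptor TX ring


def enqueue_until_full(head: int, tail: int, per_packet: int = 1,
                       size: int = RING_SIZE) -> tuple[int, int]:
    """Advance tail by per_packet descriptors per packet while head stays frozen.

    Returns (packets_queued, final_tail). A packet is refused when it needs
    more descriptors than remain free; one slot is always left empty so that
    head == tail still means "ring empty".
    """
    queued = 0
    while True:
        used = (tail - head) % size
        free = size - 1 - used
        if free < per_packet:
            return queued, tail
        tail = (tail + per_packet) % size
        queued += 1


# Start from the stuck snapshot (tdh = 325, tdt = 166).
packets, tail = enqueue_until_full(head=325, tail=166)
print(packets, tail)  # 158 324
```

The driver's own count (150 descriptors available) is a little lower than the 158 free slots computed here. This sketch does not model its bookkeeping.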
The driver noticed that something had gone wrong and reset the card:
# ping -f ZZZZ
em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 325, hw tdt = 285
em0: TX(0) desc avail = 31,Next TX to Clean = 316
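Comparing the watchdog dump with the earlier debug dump makes the stall explicit: the tail moved while the head did not. Below is a minimal sketch of that comparison, showing the kind of no-progress test a TX watchdog performs. It is not the em driver's actual watchdog logic, and the 1024-descriptor ring size is again an assumption:

```python
RING_SIZE = 1024  # assumption: 1024-descriptor TX ring


def progress(old: int, new: int, size: int = RING_SIZE) -> int:
    """Forward distance a ring index travelled between two samples (mod size)."""
    return (new - old) % size


def looks_stalled(old_head: int, new_head: int, new_tail: int,
                  size: int = RING_SIZE) -> bool:
    """True if descriptors are still pending but the head did not move.

    Caveat: with only two samples, a head that moved exactly one full lap
    is indistinguishable from one that did not move at all.
    """
    pending = (new_tail - new_head) % size
    return pending > 0 and progress(old_head, new_head, size) == 0


# tdh/tdt from the debug dump (325/166) and the watchdog message (325/285).
print(progress(325, 325))            # head advanced 0 descriptors
print(progress(166, 285))            # tail advanced 119 descriptors
print(looks_stalled(325, 325, 285))  # True
```

Between the two samples, the tail advanced by 119 descriptors while the head stayed at 325, leaving 984 descriptors outstanding.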
em0: link state changed to DOWN
em0: link state changed to UP
Interface is RUNNING and ACTIVE
em0: hw tdh = 113, hw tdt = 113
em0: hw rdh = 36, hw rdt = 35
em0: Tx Queue Status = 0
em0: TX descriptors avail = 1024
em0: Tx Descriptors avail failure = 0
em0: RX discarded packets = 0
em0: RX Next to Check = 36
em0: RX Next to Refresh = 35
From then on, the interface worked as usual.
$ ping ZZZZ
PING ZZZZ (ZZZZ): 56 data bytes
64 bytes from ZZZZ: icmp_seq=0 ttl=255 time=0.241 ms
$ dmesg
FreeBSD 10.1-RELEASE-p6 #0: Tue Feb 24 19:00:21 UTC 2015
[...]
em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0xdc00-0xdc1f mem 0xfe9e0000-0xfe9fffff,0xfe9dc000-0xfe9dffff irq 16 at device 0.0 on pci2
em0: Using MSIX interrupts with 3 vectors
em0: Ethernet address: 00:25:90:34:5d:44
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.5 on pci0
pci3: <ACPI PCI bus> on pcib3
em1: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0xec00-0xec1f mem 0xfeae0000-0xfeafffff,0xfeadc000-0xfeadffff irq 17 at device 0.0 on pci3
em1: Using MSIX interrupts with 3 vectors
em1: Ethernet address: 00:25:90:34:5d:45
$ pciconf -elv
[...]
em0@pci0:2:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class = network
subclass = ethernet
PCI-e errors = Correctable Error Detected
Unsupported Request Detected
Corrected = Receiver Error
Bad TLP
Bad DLLP
REPLAY_NUM Rollover
Replay Timer Timeout
Advisory Non-Fatal Error
em1@pci0:3:0:0: class=0x020000 card=0x060a15d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class = network
subclass = ethernet
PCI-e errors = Correctable Error Detected
Unsupported Request Detected
Corrected = Receiver Error
Bad TLP
Bad DLLP
Replay Timer Timeout
Advisory Non-Fatal Error
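The two lists are nearly identical. The comparison becomes mechanical when each name is mapped to its bit in the PCIe Device Status register and the AER Correctable Error Status register. The bit positions come from the PCI Express Base Specification, and only the errors listed above are included. The masks below are reconstructed from the names, not read from the devices:

```python
# Bit positions per the PCI Express Base Specification (partial: only the
# errors reported by pciconf -e above).
DEVICE_STATUS = {
    "Correctable Error Detected": 0,
    "Unsupported Request Detected": 3,
}
AER_CORRECTABLE = {
    "Receiver Error": 0,
    "Bad TLP": 6,
    "Bad DLLP": 7,
    "REPLAY_NUM Rollover": 8,
    "Replay Timer Timeout": 12,
    "Advisory Non-Fatal Error": 13,
}


def to_mask(names: list[str], table: dict[str, int]) -> int:
    """Encode a list of reported error names back into a register bitmask."""
    mask = 0
    for name in names:
        mask |= 1 << table[name]
    return mask


status = ["Correctable Error Detected", "Unsupported Request Detected"]
em0_corrected = ["Receiver Error", "Bad TLP", "Bad DLLP",
                 "REPLAY_NUM Rollover", "Replay Timer Timeout",
                 "Advisory Non-Fatal Error"]
em1_corrected = ["Receiver Error", "Bad TLP", "Bad DLLP",
                 "Replay Timer Timeout", "Advisory Non-Fatal Error"]

print(hex(to_mask(status, DEVICE_STATUS)))             # 0x9 (both ports)
print(hex(to_mask(em0_corrected, AER_CORRECTABLE)))    # 0x31c1
print(hex(to_mask(em1_corrected, AER_CORRECTABLE)))    # 0x30c1
print(set(em0_corrected) - set(em1_corrected))         # {'REPLAY_NUM Rollover'}
```

All of these are correctable errors, which the PCIe data link layer recovers from by retransmission. The only difference between the ports is that em0 also reports REPLAY_NUM Rollover.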
The port is connected to a GS108 switch. The link stayed up the whole time, and no
transmit errors were detected.
The motherboard is a Supermicro X7SPA-HF running the latest BIOS.
On this board, a BMC shares the em0 port. The BMC was not responding
either.
Hence my guess is that this may be a fault of the card rather than of the
driver, since the BMC was affected too.
The same problem also occurs with em0 on OpenBSD on the same motherboard (though
connected to a different switch).
More information about the freebsd-bugs
mailing list