[Bug 200221] em0 watchdog timeout under load

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Fri May 15 10:46:59 UTC 2015


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200221

            Bug ID: 200221
           Summary: em0 watchdog timeout under load
           Product: Base System
           Version: 10.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: anthony at ury.org.uk

Created attachment 156796
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=156796&action=edit
dmesg.boot and pciconf -lv outputs for machine1 (Core 2 Quad)

I have two machines that frequently hit watchdog timeouts on their Ethernet
interfaces under load (machine 1 during a nightly backup over NFS, machine 2
while periodically transferring large files, also over NFS). Both use the
em(4) driver; the machine doing nightly backups hits a timeout 4-5 nights a
week, typically resetting the interface several times over a 30-minute period
while the backup runs.
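To put numbers on how often the resets happen, a per-day tally of watchdog resets can be pulled from the system log. This is only a sketch. It assumes the standard syslog line layout seen in the excerpts in this report ("Mon DD HH:MM:SS host ...") and the default `/var/log/messages` path. The function name `count_resets` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: count em0 watchdog resets per calendar day.
# ASSUMPTION: each syslog line starts with "Mon DD HH:MM:SS host ...",
# and entries are chronological, so same-day lines are adjacent for uniq.
# Argument: a log file, or "-" for stdin (default: /var/log/messages).
count_resets() {
    # Select the reset lines, keep only month and day,
    # then collapse consecutive identical days into "COUNT Mon DD".
    grep 'em0: Watchdog timeout -- resetting' "${1:-/var/log/messages}" |
        awk '{ print $1, $2 }' |
        uniq -c
}
```

Rotated logs are compressed by newsyslog on a default FreeBSD install. Decompress them first and pass `-` so grep reads stdin, for example `bzcat /var/log/messages.0.bz2 | count_resets -`.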

Machine 1 is a consumer motherboard with an Intel Core 2 Quad Q6600 CPU.
Machine 2 is a server motherboard with 2 x Intel Xeon E5335 CPUs. It has two
NICs, but only em0 is connected.


Full dmesg and pciconf outputs attached.

===dmesg for machine 1===

grep em0 /var/run/dmesg.boot 
em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x10c0-0x10df mem 0xe0100000-0xe011ffff,0xe0124000-0xe0124fff irq 20 at device 25.0 on pci0
em0: attempting to allocate 1 MSI vectors (1 supported)
em0: using IRQ 256 for MSI
em0: Using an MSI interrupt
em0: bpf attached
em0: Ethernet address: 00:1c:c0:08:c7:99
em0: Link is up 1000 Mbps Full Duplex

===pciconf -lv for machine 1===

em0 at pci0:0:25:0:        class=0x020000 card=0x00018086 chip=0x104a8086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82566DM Gigabit Network Connection'
    class      = network
    subclass   = ethernet

===Snip of messages for machine 1===
<snip> - large number of 'newnfs server X not responding'
May 15 07:51:27 urybsod kernel: em0: Watchdog timeout -- resetting
May 15 07:51:27 urybsod kernel: em0: Queue(0) tdh = 574, hw tdt = 538
May 15 07:51:27 urybsod kernel: em0: TX(0) desc avail = 31,Next TX to Clean = 569
May 15 07:51:27 urybsod kernel: em0: Link is Down
May 15 07:51:27 urybsod kernel: em0: link state changed to DOWN
May 15 07:51:30 urybsod kernel: em0: Link is up 1000 Mbps Full Duplex
May 15 07:51:30 urybsod kernel: em0: link state changed to UP
May 15 07:51:30 urybsod devd: Executing '/etc/rc.d/dhclient quietstart em0'
May 15 07:51:30 urybsod sshd[65879]: fatal: Read from socket failed: Connection reset by peer [preauth]
May 15 07:51:38 urybsod kernel: newnfs server X is alive again
May 15 07:51:39 urybsod last message repeated 19 times

uname -a
FreeBSD urybsod 10.1-RELEASE-p9 FreeBSD 10.1-RELEASE-p9 #0: Tue Apr  7 01:09:46 UTC 2015     root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

===dmesg for machine 2===

em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x4020-0x403f mem 0xb8820000-0xb883ffff,0xb8400000-0xb87fffff irq 18 at device 0.0 on pci5
em0: Using an MSI interrupt
em0: Ethernet address: 00:04:23:dd:37:cc
em1: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x4000-0x401f mem 0xb8800000-0xb881ffff,0xb8000000-0xb83fffff irq 19 at device 0.1 on pci5
em1: Using an MSI interrupt
em1: Ethernet address: 00:04:23:dd:37:cd


===pciconf -lv for machine 2===

em0 at pci0:5:0:0: class=0x020000 card=0x346c8086 chip=0x10968086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '80003ES2LAN Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
em1 at pci0:5:0:1: class=0x020000 card=0x346c8086 chip=0x10968086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '80003ES2LAN Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet


===messages for machine 2===

<snip> - large number of 'newnfs server Y not responding'
May  4 02:45:40 urystv kernel: newnfs server Y: not responding
May  4 02:48:43 urystv kernel: em0: Watchdog timeout -- resetting
May  4 02:48:43 urystv kernel: em0: Queue(0) tdh = 294, hw tdt = 898
May  4 02:48:43 urystv kernel: em0: TX(0) desc avail = 411,Next TX to Clean = 285
May  4 02:48:43 urystv kernel: em0: link state changed to DOWN
May  4 02:48:46 urystv kernel: em0: link state changed to UP
May  4 02:48:46 urystv devd: Executing '/etc/rc.d/dhclient quietstart em0'
May  4 02:49:08 urystv kernel: newnfs server Y: is alive again
May  4 02:49:09 urystv last message repeated 19 times

uname -a
FreeBSD urystv 10.1-RELEASE-p9 FreeBSD 10.1-RELEASE-p9 #0: Tue Apr  7 01:09:46 UTC 2015     root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
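The `tdh`, `hw tdt` and `Next TX to Clean` values in the two watchdog messages can be cross-checked with modular ring arithmetic. This is a sketch under stated assumptions, not a description of the driver's code. It assumes a 1024-descriptor TX ring, which is believed to be the em(4) `hw.em.txd` default for these controllers but was not confirmed on these machines. It counts descriptors from `tdh` up to `hw tdt` as still queued for the NIC. It counts those from `Next TX to Clean` up to `tdh` as completed but not yet reclaimed. `ring_state` and `RING` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: decode an em(4) watchdog TX ring snapshot.
# ASSUMPTION: the TX ring has 1024 descriptors (override with RING=...).
RING=${RING:-1024}

# Usage: ring_state TDH TDT NEXT_TO_CLEAN
ring_state() {
    tdh=$1 tdt=$2 ntc=$3
    # Descriptors handed to the NIC (head..tail) but not yet processed.
    pending=$(( (tdt - tdh + RING) % RING ))
    # Descriptors the NIC has processed but the driver has not reclaimed.
    uncleaned=$(( (tdh - ntc + RING) % RING ))
    # The remainder should equal the driver's "desc avail" figure.
    avail=$(( RING - pending - uncleaned ))
    echo "pending=$pending uncleaned=$uncleaned avail=$avail"
}
```

With the values logged above, `ring_state 574 538 569` (machine 1) prints `pending=988 uncleaned=5 avail=31`. For machine 2, `ring_state 294 898 285` prints `pending=604 uncleaned=9 avail=411`. In both cases the computed `avail` matches the logged `desc avail`, which suggests the 1024-descriptor assumption holds here. It also suggests that a large share of the ring was still queued for the controller when the watchdog fired.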
