em0 watchdog timeout
Willem Jan Withagen
wjw at digiware.nl
Thu Nov 10 11:52:23 UTC 2011
On 10-11-2011 10:50, Jeremy Chadwick wrote:
> On Thu, Nov 10, 2011 at 10:22:39AM +0100, Willem Jan Withagen wrote:
>> Still running this file server on ZFS, and every now and then em0
>> goes down, and is not revivable.... Nothing goes in or out the
>> box...
>>
>> Any suggestions as how to (help) fix this?
>
> CC'ing Jack Vogel of Intel.
>
> We need "pciconf -lvbc" output (-lv by itself isn't sufficient in this
> regard).
em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9 chip=0x10bd8086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xdf900000, size 131072, enabled
    bar [14] = type Memory, range 32, base 0xdf924000, size 4096, enabled
    bar [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP
dmidecode gives:
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Supermicro
Product Name: C2SBX
Version: 0123456789
Serial Number: 0123456789
UUID: 53D1A494-D663-A0E7-890B-003048DE97CD
Wake-up Type: Power Switch
SKU Number: Not Specified
Family: Not Specified
> Also, please do "sysctl dev.em.0.debug=1", which will show nothing
> useful in the output, however "dmesg" shortly after should have a bunch
> of driver-level debugging information that should help (output starts
> with "Interface is ..."). Please provide that too.
The system has been rebooted, so currently nothing is seriously wrong.
But trying to switch it on does not seem to work?
# sysctl dev.em.0.debug=1
dev.em.0.debug: -1 -> -1
# sysctl -a | grep debug | grep em
dev.em.0.debug: -1
Or is it just supposed to dump this:
Nov 10 12:44:27 zfs kernel: Interface is RUNNING and INACTIVE
Nov 10 12:44:27 zfs kernel: em0: hw tdh = 965, hw tdt = 965
Nov 10 12:44:27 zfs kernel: em0: hw rdh = 586, hw rdt = 585
Nov 10 12:44:27 zfs kernel: em0: Tx Queue Status = 0
Nov 10 12:44:27 zfs kernel: em0: TX descriptors avail = 1024
Nov 10 12:44:27 zfs kernel: em0: Tx Descriptors avail failure = 0
Nov 10 12:44:27 zfs kernel: em0: RX discarded packets = 0
Nov 10 12:44:27 zfs kernel: em0: RX Next to Check = 586
Nov 10 12:44:27 zfs kernel: em0: RX Next to Refresh = 585
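For anyone reading along, here is a rough Python sketch of how the head/tail pointers in that dump can be read. It is only an illustration: the helper names are made up for this note, and the ring size of 1024 is an assumption taken from the "TX descriptors avail = 1024" line (the RX ring is assumed to be the same size, which the dump does not confirm).

```python
import re

# Assumption: both rings hold 1024 descriptors. The TX figure comes from
# "TX descriptors avail = 1024" in the dump; the RX figure is a guess.
RING_SIZE = 1024

DUMP = """\
em0: hw tdh = 965, hw tdt = 965
em0: hw rdh = 586, hw rdt = 585
em0: Tx Queue Status = 0
em0: TX descriptors avail = 1024
"""


def parse_hw_pointers(dmesg_text):
    """Collect every 'hw <name> = <number>' pair into a dict of ints."""
    pairs = re.findall(r"hw (\w+) = (\d+)", dmesg_text)
    return {name: int(value) for name, value in pairs}


def summarize(ptrs, ring_size=RING_SIZE):
    """Count descriptors lying between head and tail on each ring."""
    return {
        # TX: descriptors handed to the NIC that it has not yet fetched.
        "tx_pending": (ptrs["tdt"] - ptrs["tdh"]) % ring_size,
        # RX: empty buffers posted to the NIC, waiting for packets.
        "rx_posted": (ptrs["rdt"] - ptrs["rdh"]) % ring_size,
    }


ptrs = parse_hw_pointers(DUMP)
summary = summarize(ptrs)
print(ptrs)
print(summary)
```

With tdh equal to tdt nothing is waiting to be transmitted, and rdt sitting one slot behind rdh means almost the whole RX ring is posted to the hardware, so under these assumptions the dump describes an idle interface.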
I always tell everybody that they should go for Intel ethernet
devices, because "they just work", and I'm still very much convinced of
this. So I'll be more than happy to do any debugging and/or testing
required. The only thing I cannot afford at the moment is to leave this
box in a disconnected state.
And note that this problem rears its nasty head only every few weeks...
--WjW
>
>> Nov 10 09:07:41 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189
>> Nov 10 09:07:41 zfs kernel: em0: TX(0) desc avail = 1022,Next TX to Clean = 187
>> Nov 10 09:11:32 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:11:32 zfs kernel: em0: Queue(0) tdh = 139, hw tdt = 151
>> Nov 10 09:11:32 zfs kernel: em0: TX(0) desc avail = 1012,Next TX to Clean = 139
>> Nov 10 09:16:05 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:16:05 zfs kernel: em0: Queue(0) tdh = 152, hw tdt = 163
>> Nov 10 09:16:05 zfs kernel: em0: TX(0) desc avail = 1013,Next TX to Clean = 152
>> Nov 10 09:33:10 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:33:10 zfs kernel: em0: Queue(0) tdh = 161, hw tdt = 176
>> Nov 10 09:33:10 zfs kernel: em0: TX(0) desc avail = 1008,Next TX to Clean = 160
>> Nov 10 09:53:18 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:53:18 zfs kernel: em0: Queue(0) tdh = 157, hw tdt = 172
>> Nov 10 09:53:18 zfs kernel: em0: TX(0) desc avail = 1009,Next TX to Clean = 157
>>
>> Device is:
>> Nov 10 10:07:27 zfs kernel: em0: <Intel(R) PRO/1000 Network Connection 7.2.3> port 0x1820-0x183f mem 0xdf900000-0xdf91ffff,0xdf924000-0xdf924fff irq 16 at device 25.0 on pci0
>> Nov 10 10:07:27 zfs kernel: em0: Using an MSI interrupt
>> Nov 10 10:07:27 zfs kernel: em0: [FILTER]
>>
>> pciconf -lv:
>> em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9 chip=0x10bd8086 rev=0x02 hdr=0x00
>> vendor = 'Intel Corporation'
>> device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
>> class = network
>> subclass = ethernet
>>
>> uname:
>> 8.2-STABLE FreeBSD 8.2-STABLE #12: Sun Oct 2 13:36:55 CEST 2011 amd64
>>
>> sysctl -a | grep em.0:
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
>> dev.em.0.%driver: em
>> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.LAN_
>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x10bd subvendor=0x15d9 subdevice=0x10bd class=0x020000
>> dev.em.0.%parent: pci0
>> dev.em.0.nvm: -1
>> dev.em.0.debug: -1
>> dev.em.0.rx_int_delay: 0
>> dev.em.0.tx_int_delay: 66
>> dev.em.0.rx_abs_int_delay: 66
>> dev.em.0.tx_abs_int_delay: 66
>> dev.em.0.rx_processing_limit: 100
>> dev.em.0.flow_control: 3
>> dev.em.0.eee_control: 0
>> dev.em.0.link_irq: 0
>> dev.em.0.mbuf_alloc_fail: 0
>> dev.em.0.cluster_alloc_fail: 0
>> dev.em.0.dropped: 0
>> dev.em.0.tx_dma_fail: 0
>> dev.em.0.rx_overruns: 6
>> dev.em.0.watchdog_timeouts: 5
>> dev.em.0.device_control: 1074790976
>> dev.em.0.rx_control: 67141634
>> dev.em.0.fc_high_water: 8192
>> dev.em.0.fc_low_water: 6692
>> dev.em.0.queue0.txd_head: 78
>> dev.em.0.queue0.txd_tail: 78
>> dev.em.0.queue0.tx_irq: 0
>> dev.em.0.queue0.no_desc_avail: 0
>> dev.em.0.queue0.rxd_head: 376
>> dev.em.0.queue0.rxd_tail: 375
>> dev.em.0.queue0.rx_irq: 0
>> dev.em.0.mac_stats.excess_coll: 0
>> dev.em.0.mac_stats.single_coll: 0
>> dev.em.0.mac_stats.multiple_coll: 0
>> dev.em.0.mac_stats.late_coll: 0
>> dev.em.0.mac_stats.collision_count: 0
>> dev.em.0.mac_stats.symbol_errors: 0
>> dev.em.0.mac_stats.sequence_errors: 0
>> dev.em.0.mac_stats.defer_count: 0
>> dev.em.0.mac_stats.missed_packets: 9
>> dev.em.0.mac_stats.recv_no_buff: 0
>> dev.em.0.mac_stats.recv_undersize: 0
>> dev.em.0.mac_stats.recv_fragmented: 0
>> dev.em.0.mac_stats.recv_oversize: 0
>> dev.em.0.mac_stats.recv_jabber: 0
>> dev.em.0.mac_stats.recv_errs: 1
>> dev.em.0.mac_stats.crc_errs: 1
>> dev.em.0.mac_stats.alignment_errs: 0
>> dev.em.0.mac_stats.coll_ext_errs: 0
>> dev.em.0.mac_stats.xon_recvd: 0
>> dev.em.0.mac_stats.xon_txd: 0
>> dev.em.0.mac_stats.xoff_recvd: 0
>> dev.em.0.mac_stats.xoff_txd: 0
>> dev.em.0.mac_stats.total_pkts_recvd: 160062850
>> dev.em.0.mac_stats.good_pkts_recvd: 160062840
>> dev.em.0.mac_stats.bcast_pkts_recvd: 79648
>> dev.em.0.mac_stats.mcast_pkts_recvd: 10220
>> dev.em.0.mac_stats.rx_frames_64: 0
>> dev.em.0.mac_stats.rx_frames_65_127: 0
>> dev.em.0.mac_stats.rx_frames_128_255: 0
>> dev.em.0.mac_stats.rx_frames_256_511: 0
>> dev.em.0.mac_stats.rx_frames_512_1023: 0
>> dev.em.0.mac_stats.rx_frames_1024_1522: 0
>> dev.em.0.mac_stats.good_octets_recvd: 107143604749
>> dev.em.0.mac_stats.good_octets_txd: 129876768158
>> dev.em.0.mac_stats.total_pkts_txd: 179010567
>> dev.em.0.mac_stats.good_pkts_txd: 179010567
>> dev.em.0.mac_stats.bcast_pkts_txd: 14608
>> dev.em.0.mac_stats.mcast_pkts_txd: 206
>> dev.em.0.mac_stats.tx_frames_64: 0
>> dev.em.0.mac_stats.tx_frames_65_127: 0
>> dev.em.0.mac_stats.tx_frames_128_255: 0
>> dev.em.0.mac_stats.tx_frames_256_511: 0
>> dev.em.0.mac_stats.tx_frames_512_1023: 0
>> dev.em.0.mac_stats.tx_frames_1024_1522: 0
>> dev.em.0.mac_stats.tso_txd: 3691806
>> dev.em.0.mac_stats.tso_ctx_fail: 0
>> dev.em.0.interrupts.asserts: 130023913
>> dev.em.0.interrupts.rx_pkt_timer: 0
>> dev.em.0.interrupts.rx_abs_timer: 0
>> dev.em.0.interrupts.tx_pkt_timer: 0
>> dev.em.0.interrupts.tx_abs_timer: 0
>> dev.em.0.interrupts.tx_queue_empty: 0
>> dev.em.0.interrupts.tx_queue_min_thresh: 0
>> dev.em.0.interrupts.rx_desc_min_thresh: 0
>> dev.em.0.interrupts.rx_overrun: 0
>> dev.em.0.wake: 0
>
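As a cross-check on the watchdog lines quoted above, the following Python sketch pairs each "Queue(0)" line with its "TX(0)" line and works out how many TX descriptors were still outstanding at each reset. It is a reading aid, not driver code: the function names are invented here, and the ring size of 1024 is an assumption based on the idle debug dump, which reports "TX descriptors avail = 1024".

```python
import re

RING_SIZE = 1024  # assumption: taken from the idle dump, not from the driver source

WATCHDOG_LOG = """\
em0: Queue(0) tdh = 187, hw tdt = 189
em0: TX(0) desc avail = 1022,Next TX to Clean = 187
em0: Queue(0) tdh = 139, hw tdt = 151
em0: TX(0) desc avail = 1012,Next TX to Clean = 139
em0: Queue(0) tdh = 152, hw tdt = 163
em0: TX(0) desc avail = 1013,Next TX to Clean = 152
em0: Queue(0) tdh = 161, hw tdt = 176
em0: TX(0) desc avail = 1008,Next TX to Clean = 160
em0: Queue(0) tdh = 157, hw tdt = 172
em0: TX(0) desc avail = 1009,Next TX to Clean = 157
"""

QUEUE_RE = re.compile(r"Queue\(\d+\) tdh = (\d+), hw tdt = (\d+)")
TX_RE = re.compile(r"TX\(\d+\) desc avail = (\d+),\s*Next TX to Clean = (\d+)")


def watchdog_events(text):
    """Pair each Queue(...) line with the TX(...) line that follows it."""
    events, current = [], None
    for line in text.splitlines():
        m = QUEUE_RE.search(line)
        if m:
            current = {"tdh": int(m.group(1)), "tdt": int(m.group(2))}
            continue
        m = TX_RE.search(line)
        if m and current is not None:
            current["avail"] = int(m.group(1))
            current["next_to_clean"] = int(m.group(2))
            events.append(current)
            current = None
    return events


def in_flight(event, ring_size=RING_SIZE):
    """Descriptors queued by the driver but not yet reclaimed."""
    return (event["tdt"] - event["next_to_clean"]) % ring_size


events = watchdog_events(WATCHDOG_LOG)
for ev in events:
    pending = in_flight(ev)
    consistent = ev["avail"] == RING_SIZE - pending
    print(ev["tdh"], ev["tdt"], pending, consistent)
```

In all five events tdh is behind tdt, so the hardware still had descriptors queued when the watchdog fired, whereas the debug dump taken after the reboot shows tdh equal to tdt. The "desc avail" figures also match 1024 minus the outstanding count in every case, which at least suggests the driver's own bookkeeping was consistent at the time.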