6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial
jfvogel at gmail.com
Tue Jan 16 18:53:05 UTC 2007
On 1/16/07, Mike Andrews <mandrews at bit0.com> wrote:
> I have a strange issue with em0 watchdog timeouts that I think is not the
> same as the ones everyone was having during the 6.2 beta cycle...
> I have six systems, each with two Intel GigE ports onboard:
> Systems A and B: Supermicro PDSMi+
> Systems C and D: Supermicro PDSMi (without the plus)
> System E: Tyan S2730U3GN
> System F: Supermicro X5DPA-GG
> On each system:
> em0 is connected to a Cisco Catalyst 2960G layer 2 gigabit ethernet switch.
> em1 is connected to a Foundry ServerIron XL layer 4-7 fast ethernet switch.
> All six run FreeBSD 6.2-RELEASE i386, even though the first four are
> capable of running amd64. They all have 2 GB of memory, except E which
> has 4 GB. The kernel configs are all identical, and are not that far from
> GENERIC + SMP.
> Several times a day, em0 will go down, give a watchdog timeout error on
> the console, then come right back up on its own a few seconds later. But
> here's the weird twist: it ONLY happens on systems A and B, and ONLY when
> running at gigabit speed. If I knock the two switch ports down to 100
> meg, the problem goes away.
> The other four systems C thru F never have watchdog timeout issues; they
> always work perfectly even at gigabit speed.
> So I'm trying to figure out if there are any other obvious hardware
> differences between the plus and non-plus version of the PDSMi that would
> be causing issues on the plus version. Fortunately, at the moment we are
> not (yet) pushing anywhere near even 100 meg worth of traffic through
> these ports, so it's a tolerable workaround... just kinda annoying. :)
> The chipset is a bit different: the PDSMi is the Intel E7230 chipset for
> Pentium D servers, where the PDSMi+ is the E3000 that adds Core 2 Duo
> support. But apparently the NIC chips are identical: 82573V for em0 and
> 82573L for em1. The BIOS is identical too, so the chipsets must be pretty
> similar. Nothing shares an IRQ with the NICs. (USB is disabled in the
> BIOS.) They do have different disk systems; A and B are SATA gmirror
> setups, while C and D use LSI Megaraid SCSI cards for their mirrors.
> I have tried the obvious: switching the cables out. No difference at all.
> I have NOT yet tried a different gigabit switch.
> Hopefully that's enough detail to start; I can get into more specifics as
> needed. (Kernel configs, dmesg output, IRQ details, disk details, IPMI,
> running apps, serial console access if needed...)
There are some management-related issues with this NIC. First, if you
have not already done so, make a DOS-bootable device and run the app I
am enclosing; it fixes the PROM setting that is wrong on some devices.
It will do no harm, and it may solve things.
Please let me know whether it fixes the problem.
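To judge whether the fix helps, it may be useful to count the watchdog
timeouts per interface before and after applying it. The following is
only a rough sketch in Python. It assumes the driver logs lines
containing text like "em0: watchdog timeout"; that message wording and
the function name count_watchdog_timeouts are assumptions for
illustration, so adjust the pattern to what your console actually shows.

```python
import re
from collections import Counter

# Assumed message format; the exact wording on a given system may differ.
WATCHDOG_RE = re.compile(r"\b(em\d+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to the number of
    log lines that report a watchdog timeout for that interface."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feed it the lines of whatever file holds your console messages (for
example, an open file object) and compare the per-interface counts over
similar time windows at gigabit speed.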
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 158727 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20070116/50a088dc/dcgdis-0001.obj