6.2-RELEASE em0 watchdog timeouts -- sometimes (w/ partial workaround)

Jack Vogel jfvogel at gmail.com
Tue Jan 16 18:53:05 UTC 2007


On 1/16/07, Mike Andrews <mandrews at bit0.com> wrote:
> I have a strange issue with em0 watchdog timeouts that I think is not the
> same as the ones everyone was having during the 6.2 beta cycle...
>
> I have six systems, each with two Intel GigE ports onboard:
>
> Systems A and B: Supermicro PDSMi+
> Systems C and D: Supermicro PDSMi (without the plus)
> System E: Tyan S2730U3GN
> System F: Supermicro X5DPA-GG
>
> On each system:
> em0 is connected to a Cisco Catalyst 2960G layer 2 gigabit ethernet switch.
> em1 is connected to a Foundry Serveriron XL layer 4-7 fast ethernet switch.
>
> All six run FreeBSD 6.2-RELEASE i386, even though the first four are
> capable of running amd64.  They all have 2 GB of memory, except E which
> has 4 GB.  The kernel configs are all identical, and are not that far from
> GENERIC + SMP.
>
> Several times a day, em0 will go down, give a watchdog timeout error on
> the console, then come right back up on its own a few seconds later.  But
> here's the weird twist: it ONLY happens on systems A and B, and ONLY when
> running at gigabit speed.  If I knock the two switch ports down to 100
> meg, the problem goes away.
>
> The other four systems C thru F never have watchdog timeout issues; they
> always work perfectly even at gigabit speed.
>
> So I'm trying to figure out if there are any other obvious hardware
> differences between the plus and non-plus version of the PDSMi that would
> be causing issues on the plus version.  Fortunately, at the moment we are
> not (yet) pushing anywhere near even 100 meg worth of traffic through
> these ports, so it's a tolerable workaround...  just kinda annoying. :)
>
> The chipset is a bit different: the PDSMi is the Intel E7230 chipset for
> Pentium D servers, where the PDSMi+ is the E3000 that adds Core 2 Duo
> support.  But apparently the NIC chips are identical: 82573V for em0 and
> 82573L for em1.  The BIOS is identical too, so the chipsets must be pretty
> similar.  Nothing shares an IRQ with the NICs.  (USB is disabled in the
> BIOS.)  They do have different disk systems; A and B are SATA gmirror
> setups, while C and D use LSI Megaraid SCSI cards for their mirrors.
>
> I have tried the obvious: switching the cables out.  No difference at all.
>
> I have NOT yet tried a different gigabit switch.
>
> Hopefully that's enough detail to start; I can get into more specifics as
> needed.  (Kernel configs, dmesg output, IRQ details, disk details, IPMI,
> running apps, serial console access if needed...)

There are some management-related issues with this NIC. First, if you
have not already done so, make a DOS-bootable device and run the app I
am enclosing; it fixes the PROM setting that is wrong on some devices.
It will do no harm, and it may solve things.

Please let me know whether it fixes the problem.
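
To judge whether the change helps, it may be useful to count the
watchdog messages per day before and after running the tool. The
following is only a minimal sketch in Python, not part of the enclosed
app. It assumes syslog-style lines of the form
"<date> <time> <host> kernel: em0: watchdog timeout ..."; the exact
message text, the hostname "hostA", and the sample lines are
illustrative assumptions and should be adjusted to what your own log
actually contains.

```python
import re
from collections import Counter

# Assumed syslog layout: "Mon DD HH:MM:SS host kernel: emN: watchdog timeout ..."
# The exact driver wording may differ; adjust the pattern to match your logs.
WATCHDOG = re.compile(
    r"^(\w{3}\s+\d+) \d\d:\d\d:\d\d \S+ kernel: (em\d+): watchdog timeout"
)

def count_watchdog_timeouts(lines):
    """Return a Counter keyed by (day, interface) of watchdog timeout lines."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG.match(line)
        if m:
            day, nic = m.groups()
            # Collapse syslog's padding for single-digit days ("Jan  5" -> "Jan 5").
            counts[(" ".join(day.split()), nic)] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    "Jan 15 03:12:44 hostA kernel: em0: watchdog timeout -- resetting",
    "Jan 15 03:12:47 hostA kernel: em0: link state changed to UP",
    "Jan 15 14:01:09 hostA kernel: em0: watchdog timeout -- resetting",
    "Jan 16 09:30:00 hostA kernel: em0: watchdog timeout -- resetting",
]

for (day, nic), n in sorted(count_watchdog_timeouts(sample).items()):
    print(day, nic, n)
```

For a real check, pass the lines of your system log file to
count_watchdog_timeouts() instead of the sample list, and compare the
daily counts from before and after the change.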

Jack
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dcgdis.ThisIsZip
Type: application/octet-stream
Size: 158727 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20070116/50a088dc/dcgdis-0001.obj

