fxp0: device timed out problem
Robert Watson
rwatson at freebsd.org
Mon Jan 24 01:54:11 PST 2005
On Mon, 24 Jan 2005, Ganbold wrote:
> > > I set debug.mpsafenet to 0 and the problem seems to go away.
> ><snip>
> > > Is this problem related to the network stack? Or is it related to the
> > > fxp driver?
> >
> >It's most likely a problem with the device driver or interrupt
> >configuration on your system. There are a couple of other variables you
> >might try frobbing:
> >
> >- Use of ACPI to configure the hardware
> >- Use of "device apic" if the system is non-SMP
>
> I see. One of the servers is an SMP system, and the device apic option is
> used in its kernel config file. I didn't try device apic on a non-SMP
> machine.
Any luck with disabling ACPI? In particular, are the interrupt
assignments substantially different between booting with ACPI and without?
You can probably just diff -u the old dmesg.boot and the new one...
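
A rough sketch of that comparison, narrowed to the lines that mention an
IRQ so the interrupt assignments stand out. The file names dmesg.acpi and
dmesg.noacpi are hypothetical (copies of /var/run/dmesg.boot saved after
each boot), and the helper names are made up for illustration:

```sh
# irq_lines FILE: print only the lines that mention an IRQ, sorted.
irq_lines() {
    grep -E 'irq [0-9]+' "$1" | sort
}

# compare_irqs OLD NEW: unified diff of the IRQ lines from two saved
# dmesg files. Returns diff's status: 0 if identical, 1 if they differ.
compare_irqs() {
    old_tmp=$(mktemp "${TMPDIR:-/tmp}/irq.XXXXXX") || return 2
    new_tmp=$(mktemp "${TMPDIR:-/tmp}/irq.XXXXXX") || return 2
    irq_lines "$1" > "$old_tmp"
    irq_lines "$2" > "$new_tmp"
    status=0
    diff -u "$old_tmp" "$new_tmp" || status=$?
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}
```

Something like "compare_irqs dmesg.acpi dmesg.noacpi" would then show only
the devices whose IRQ line changed between the two boots. A plain diff -u
of the full files is still worth a look, since it also catches devices
that attach differently.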
> >Usually a device timed out error is related to interrupts from the device
> >not being delivered, being delivered improperly, etc. Does your dmesg
> >contain any references to interrupt storms? Once the above message has
> >printed, do you see any further interrupts on the fxp interrupt source
> >when checking intermittently with "systat -vmstat 1" or "vmstat -i"?
>
> I couldn't check the system by issuing those commands. The following is
> the dmesg output with debug.mpsafenet disabled:
Couldn't, as in it wasn't possible for administrative reasons, because you
couldn't log in once the failure occurred and so couldn't get the output,
or because the commands don't work, or...? I just want to make sure I
understand whether this is an administrative issue or a symptom of the
problem.
> I didn't do much investigation on those servers at that time. However,
> with debug.mpsafenet disabled, the servers have been working fine for
> more than 3 weeks.
That is certainly suggestive -- I wonder if we're looking at a locking bug
in fxp0 involving serialization with the hardware. However, it's not
conclusive, I think -- when running MPSAFE, the timing is quite different
on UP as well as SMP hardware, which could trigger other existing bugs.
The big open question, I think, is whether an interrupt delivery problem
is involved.
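
One way to check that after the next timeout, sketched under some
assumptions: that "vmstat -i" prints one line per interrupt source ending
in a total and a rate column, and that the device shows up as fxp0 (a
trailing "+" on a shared source is stripped). irq_total is a hypothetical
helper name, not an existing tool:

```sh
# irq_total DEVICE: read `vmstat -i` style output on stdin and print the
# "total" column for the first line that names DEVICE. Prints nothing if
# the device is not listed.
irq_total() {
    awk -v dev="$1" '{
        # The last two fields are total and rate; the rest name the source.
        for (i = 1; i <= NF - 2; i++) {
            name = $i
            sub(/\+$/, "", name)
            if (name == dev) { print $(NF - 1); exit }
        }
    }'
}
```

Taking two samples a few seconds apart, for example
"vmstat -i | irq_total fxp0", then "sleep 5", then the same command again,
and comparing the numbers would show whether the count is still advancing.
If it has stopped while traffic is being sent, that points toward
interrupts not being delivered.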
Robert N M Watson