dc0: watchdog timeout and nve0: device timeout
M. Warner Losh
imp at bsdimp.com
Tue Jan 31 06:41:20 PST 2006
In message: <20060131112447.GA1173 at straylight.m.ringlet.net>
Peter Pentchev <roam at ringlet.net> writes:
: On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
: > On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
: > A> After updating to STABLE today I'm getting the following message with
: > A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
: > A> Thursday was fine.
: > A>
: > A> dc0: watchdog timeout
: > A> nve0: device timeout (4)
: >
: > Can you try to back out the code in sys/dev/pci to Thursday? If this
: > doesn't help, you probably need to do a binary search in this small
: > timeframe.
:
: I think I found the problem - the merge was not quite correct, and
: the PCI interrupt rerouting was disabled for some reason.
:
: Warner, is there a reason for hiding the "Try to re-route interrupts"
: code behind an apparently "ifdef 0" case? Well, okay, most probably
: there is a reason, since you've done it, but... it breaks my re0 card
: and it also seems to break Anish's hardware :)
I'm pretty sure that's the problem. I thought I'd specifically
checked to make sure that I didn't merge this :-(
: BTW, the commit message was not quite correct - rev. 1.302 was not
: really merged, it's included in my patch here. Also, rev. 1.305 of
: pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
: there are a couple of offset fixes that I also included in the patch
: while trying to come as close to the -CURRENT code as possible; could
: you check if they actually apply to -STABLE?
They do.
: Anyway, here's a patch that fixes it for me, although most probably
: the __PCI_REROUTE_INTERRUPT chunk should be sufficient. Warner, if
: you want more details, I could help with debugging this - on my
: system, the re0 card definitely needs this rerouting. I've posted
: some verbose boot output with explanations at
: http://people.FreeBSD.org/~roam/pcirouting/
: The patch itself is also there in case it gets munged by the mail
: servers along the way.
:
: Index: src/sys/dev/pci/pci.c
: ===================================================================
: RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
: retrieving revision 1.292.2.6
: diff -u -r1.292.2.6 pci.c
: --- src/sys/dev/pci/pci.c 30 Jan 2006 18:42:10 -0000 1.292.2.6
: +++ src/sys/dev/pci/pci.c 31 Jan 2006 10:57:32 -0000
: @@ -428,7 +428,7 @@
: ptrptr = PCIR_CAP_PTR;
: break;
: case 2:
: - ptrptr = 0x14;
: + ptrptr = PCIR_CAP_PTR_2;
: break;
: default:
: return; /* no extended capabilities support */
: @@ -447,10 +447,10 @@
: }
: /* Find the next entry */
: ptr = nextptr;
: - nextptr = REG(ptr + 1, 1);
: + nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
:
: /* Process this entry */
: - switch (REG(ptr, 1)) {
: + switch (REG(ptr + PCICAP_ID, 1)) {
: case PCIY_PMG: /* PCI power management */
: if (cfg->pp.pp_cap == 0) {
: cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
: @@ -1040,7 +1040,8 @@
: }
:
: if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
: -#ifdef __PCI_REROUTE_INTERRUPT
: +#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
: + defined(__arm__) || defined(__alpha__)
: /*
: * Try to re-route interrupts. Sometimes the BIOS or
: * firmware may leave bogus values in these registers.
:
: Hope this helps!
I'm pretty sure that the REROUTE thing is the only problem. That
shouldn't have been committed, and I thought I'd checked it
specifically before the commit, but I just checked what I committed
and it slipped by. This fits with the symptoms that I saw on my server
last night (the only difference between a stable boot and an older
stable boot was the IRQs).
The last part of this patch seems to fix things for me.
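
To illustrate why the merged guard behaves like the "ifdef 0" Peter
describes, here is a small standalone sketch. It assumes that nothing in
the build defines __PCI_REROUTE_INTERRUPT, which is what the thread
suggests for -STABLE. The two function names are made up for the example
and do not exist in pci.c.

```c
/*
 * Guard as merged: if no header defines __PCI_REROUTE_INTERRUPT (the
 * assumption here), the re-routing block is compiled out everywhere.
 */
static int
reroute_compiled_in_merged(void)
{
#ifdef __PCI_REROUTE_INTERRUPT
	return (1);
#else
	return (0);
#endif
}

/*
 * Guard as in the patch: the block is compiled in on the listed
 * architectures, using the compiler's predefined macros.
 */
static int
reroute_compiled_in_patched(void)
{
#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
    defined(__arm__) || defined(__alpha__)
	return (1);
#else
	return (0);
#endif
}
```

Under that assumption the first function returns 0 on every platform,
while the second returns 1 when built for any of the listed architectures.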
Warner