dc0: watchdog timeout and nve0: device timeout

M. Warner Losh imp at bsdimp.com
Tue Jan 31 06:41:20 PST 2006


In message: <20060131112447.GA1173 at straylight.m.ringlet.net>
            Peter Pentchev <roam at ringlet.net> writes:
: On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
: > On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
: > A> After updating to STABLE today I'm getting the following message with 
: > A> my dc and nve NICs every few seconds.  UP, AMD64.  A kernel from last 
: > A> Thursday was fine.
: > A> 
: > A> dc0: watchdog timeout
: > A> nve0: device timeout (4)
: > 
: > Can you try to back out the code in sys/dev/pci to Thursday? If this
: > doesn't help, you probably need to do a binary search in this small
: > timeframe.
: 
: I think I found the problem - the merge was not quite correct, and
: the PCI interrupt rerouting was disabled for some reason.
: 
: Warner, is there a reason for hiding the "Try to re-route interrupts"
: code behind an apparently "ifdef 0" case?  Well, okay, most probably
: there is a reason, since you've done it, but... it breaks my re0 card
: and it also seems to break Anish's hardware :)

I'm pretty sure that's the problem.  I thought I'd specifically
checked to make sure that I didn't merge this :-(

: BTW, the commit message was not quite correct - rev. 1.302 was not
: really merged, it's included in my patch here.  Also, rev. 1.305 of
: pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
: there are a couple of offset fixes that I also included in the patch
: while trying to come as close to the -CURRENT code as possible; could
: you check if they actually apply to -STABLE?

They do.

: Anyway, here's a patch that fixes it for me, although most probably
: the __PCI_REROUTE_INTERRUPT chunk should be sufficient.  Warner, if
: you want more details, I could help with debugging this - on my
: system, the re0 card definitely needs this rerouting.  I've posted
: some verbose boot output with explanations at
: http://people.FreeBSD.org/~roam/pcirouting/
: The patch itself is also there in case it gets munged by the mail
: servers along the way.
: 
: Index: src/sys/dev/pci/pci.c
: ===================================================================
: RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
: retrieving revision 1.292.2.6
: diff -u -r1.292.2.6 pci.c
: --- src/sys/dev/pci/pci.c	30 Jan 2006 18:42:10 -0000	1.292.2.6
: +++ src/sys/dev/pci/pci.c	31 Jan 2006 10:57:32 -0000
: @@ -428,7 +428,7 @@
:  		ptrptr = PCIR_CAP_PTR;
:  		break;
:  	case 2:
: -		ptrptr = 0x14;
: +		ptrptr = PCIR_CAP_PTR_2;
:  		break;
:  	default:
:  		return;		/* no extended capabilities support */
: @@ -447,10 +447,10 @@
:  		}
:  		/* Find the next entry */
:  		ptr = nextptr;
: -		nextptr = REG(ptr + 1, 1);
: +		nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
:  
:  		/* Process this entry */
: -		switch (REG(ptr, 1)) {
: +		switch (REG(ptr + PCICAP_ID, 1)) {
:  		case PCIY_PMG:		/* PCI power management */
:  			if (cfg->pp.pp_cap == 0) {
:  				cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
: @@ -1040,7 +1040,8 @@
:  	}
:  
:  	if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
: -#ifdef __PCI_REROUTE_INTERRUPT
: +#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
: +		defined(__arm__) || defined(__alpha__)
:  		/*
:  		 * Try to re-route interrupts. Sometimes the BIOS or
:  		 * firmware may leave bogus values in these registers.
: 
: Hope this helps!
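
For readers following the first two hunks: they replace magic numbers with named offsets but leave the capability-list walk unchanged. Below is a minimal standalone sketch of that walk over a fake 256-byte config space. The helper names `cfg_read8` and `find_cap` are hypothetical stand-ins (the driver uses its own REG() macro). The constant values are the standard PCI ones and are assumed to match pcireg.h. The loop bound is an added safeguard, not taken from pci.c, and the status-register "capabilities present" check is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI offsets; assumed to match FreeBSD's pcireg.h. */
#define PCIR_CAP_PTR    0x34    /* capabilities pointer, header types 0 and 1 */
#define PCIR_CAP_PTR_2  0x14    /* capabilities pointer, header type 2 (CardBus) */
#define PCICAP_ID       0x0     /* capability ID byte within an entry */
#define PCICAP_NEXTPTR  0x1     /* next-entry pointer byte within an entry */
#define PCIY_PMG        0x01    /* PCI power management capability ID */

/* Hypothetical stand-in for REG(off, 1): read one byte of config space. */
static uint8_t
cfg_read8(const uint8_t *cfg, int off)
{
	return (cfg[off & 0xff]);
}

/* Return the offset of capability `id`, or 0 if it is not in the list. */
static int
find_cap(const uint8_t *cfg, int hdrtype, uint8_t id)
{
	int guard, ptr, ptrptr;

	assert(cfg != NULL);
	switch (hdrtype) {
	case 0:
	case 1:
		ptrptr = PCIR_CAP_PTR;
		break;
	case 2:
		ptrptr = PCIR_CAP_PTR_2;
		break;
	default:
		return (0);	/* no extended capabilities support */
	}

	ptr = cfg_read8(cfg, ptrptr);
	/* Bound the walk so a corrupt, looping list cannot hang us. */
	for (guard = 0; ptr != 0 && guard < 48; guard++) {
		if (cfg_read8(cfg, ptr + PCICAP_ID) == id)
			return (ptr);
		ptr = cfg_read8(cfg, ptr + PCICAP_NEXTPTR);
	}
	return (0);
}
```

Since PCICAP_ID is 0 and PCICAP_NEXTPTR is 1, the named offsets read the same bytes as the old `REG(ptr, 1)` and `REG(ptr + 1, 1)`, so these hunks are cosmetic for header types 0 through 2.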

I'm pretty sure that the REROUTE thing is the only part that matters.  That
shouldn't have been committed, and I thought I'd checked it
specifically before the commit, but I just checked what I committed
and it slipped by.  This fits with the symptoms that I saw on my server
last night (the only difference between a stable boot and an older
stable boot was the IRQs).

The last part of this patch seems to fix things for me.
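
To illustrate why that last hunk matters, here is a small standalone sketch, not the pci.c code itself. Nothing defines `__PCI_REROUTE_INTERRUPT`, so the preprocessor drops the guarded block and the value the BIOS left in the interrupt line register is used as-is. The functions `pick_irq_merged` and `pick_irq_patched` and the `routed_irq` parameter are hypothetical. The sketch assumes an invalid IRQ is represented as 255.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_INVALID_IRQ		255	/* assumed "no IRQ" marker */
#define PCI_INTERRUPT_VALID(x)	((x) != PCI_INVALID_IRQ)

/*
 * bios_irq:   value firmware left in the interrupt line register.
 * routed_irq: hypothetical result of asking the parent bus to route
 *             the interrupt (PCI_INVALID_IRQ if it could not).
 */

/* Guard as merged: the macro is never defined, so rerouting is skipped. */
static uint8_t
pick_irq_merged(uint8_t bios_irq, uint8_t routed_irq)
{
	assert(PCI_INTERRUPT_VALID(bios_irq));
#ifdef __PCI_REROUTE_INTERRUPT
	if (PCI_INTERRUPT_VALID(routed_irq))
		return (routed_irq);
#else
	(void)routed_irq;
#endif
	return (bios_irq);
}

/* Guard as in the patch: rerouting is compiled in on the listed platforms. */
static uint8_t
pick_irq_patched(uint8_t bios_irq, uint8_t routed_irq)
{
	assert(PCI_INTERRUPT_VALID(bios_irq));
#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
    defined(__arm__) || defined(__alpha__)
	if (PCI_INTERRUPT_VALID(routed_irq))
		return (routed_irq);
#else
	(void)routed_irq;
#endif
	return (bios_irq);
}
```

With the merged guard, a bogus BIOS value survives even when a valid routed IRQ is available. That would be consistent with the watchdog and device timeouts reported above.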

Warner

