nve timeout (and down) regression?

David G. Lawrence dg at dglawrence.com
Sat Mar 25 10:54:58 UTC 2006


> This happens w/o any "real" activity on that interface (which goes into
> an Allied Telesyn switch):
> .......
> Mar 24 19:39:54 worf kernel: nve0: device timeout (1)
> Mar 24 19:39:54 worf kernel: nve0: link state changed to DOWN
> Mar 24 19:39:55 worf kernel: nve0: link state changed to UP
> Mar 24 19:40:14 worf kernel: nve0: device timeout (1)

   The problem is the watchdog timeout itself. I've attached an email that
I sent a few months ago which describes the problem, along with a simple
patch which disables the watchdog timer.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.

Date: Wed, 4 Jan 2006 16:21:03 -0800
Subject: Re: nve(4) patch - please test!

> Since I sent the mail below I had to discover that the new driver
> has a problem when no cable is plugged in, at least on my Asus board.
> 
> It doesn't only run into timeouts; during some of these timeouts the
> machine, or at least the keyboard, hangs for about a minute.
> 
> Is there anything I can do to help debug this?

   I ran into this problem recently as well and spent some time diagnosing
it. It's not that the cable isn't plugged in - rather it happens whenever
the traffic levels are low.
   The problem is that the nvidia-supplied portion of the driver is deferring
the release of completed transmit buffers, and this occasionally
results in if_timer expiring, causing the driver watchdog routine to be
called ("device timeout"). The watchdog routine resets the card and the
nvidia-supplied code sits in a high-priority loop waiting for the card
to reset. This can take many seconds and your system will be hung until
it completes.
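
   To make the mechanism concrete, here is a minimal userspace sketch of
the countdown, assuming the usual behavior of the per-second interface
timer: a non-zero if_timer is decremented once a second, the watchdog is
called when it reaches zero, and a value of zero means the timer is not
armed. Every name below (fake_ifnet, fake_slowtimo, fake_start,
fake_tx_done, run_seconds) is hypothetical and is not taken from
if_nve.c or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of interest in struct ifnet. */
struct fake_ifnet {
    int if_timer;        /* seconds until the watchdog fires; 0 = disarmed */
    int watchdog_calls;  /* number of times the watchdog has run */
};

/* Stand-in for the driver watchdog. The real routine logs
 * "device timeout" and resets the card; here we only count calls. */
static void fake_watchdog(struct fake_ifnet *ifp)
{
    ifp->watchdog_calls++;
}

/* Assumed per-second tick: skip a disarmed timer, otherwise
 * decrement it and call the watchdog when it reaches zero. */
static void fake_slowtimo(struct fake_ifnet *ifp)
{
    assert(ifp != NULL);
    if (ifp->if_timer == 0 || --ifp->if_timer != 0)
        return;
    fake_watchdog(ifp);
}

/* Transmit path: arm the timer (the unpatched driver uses 8). */
static void fake_start(struct fake_ifnet *ifp, int timeout)
{
    ifp->if_timer = timeout;
}

/* Assumption: a timely transmit completion disarms the timer. */
static void fake_tx_done(struct fake_ifnet *ifp)
{
    ifp->if_timer = 0;
}

/* Advance simulated time and return the total watchdog call count. */
static int run_seconds(struct fake_ifnet *ifp, int seconds)
{
    for (int i = 0; i < seconds; i++)
        fake_slowtimo(ifp);
    return ifp->watchdog_calls;
}
```

   In this model, arming the timer with 8 and never calling
fake_tx_done() fires the watchdog on the eighth tick, which corresponds
to the deferred-completion case described above. Arming it with 0 never
fires it.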
   I have a work-around patch for the problem that I've attached to this
email. It simply disables the watchdog. A real fix would involve accounting
for the outstanding transmit buffers differently (or perhaps not at all -
e.g. always attempt to call the nvidia-supplied code and if a queue-full
error occurs, then wait for an interrupt before trying to queue more
transmit packets).
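
   As a rough illustration of that second idea (not a tested driver
change), the sketch below always tries to hand packets to the hardware
and, on a queue-full error, stops until the next transmit interrupt. It
is a userspace model only: HW_SLOTS, fake_nic, hw_submit, nic_start and
nic_tx_intr are hypothetical names, and the real nvidia-supplied
interface may look quite different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HW_SLOTS 4  /* hypothetical hardware transmit queue depth */

struct fake_nic {
    int  in_flight;  /* packets handed to the hardware, not yet completed */
    int  pending;    /* packets waiting in the software send queue */
    bool stalled;    /* set on queue-full, cleared by the interrupt */
};

/* Stand-in for the vendor submit call: 0 on success, -1 if queue full. */
static int hw_submit(struct fake_nic *nic)
{
    if (nic->in_flight >= HW_SLOTS)
        return -1;
    nic->in_flight++;
    return 0;
}

/* Start routine: keep submitting until the send queue is empty or the
 * hardware reports queue-full. No watchdog timer is armed. */
static void nic_start(struct fake_nic *nic)
{
    assert(nic != NULL);
    if (nic->stalled)
        return;  /* wait for an interrupt before trying again */
    while (nic->pending > 0) {
        if (hw_submit(nic) != 0) {
            nic->stalled = true;
            return;
        }
        nic->pending--;
    }
}

/* Transmit interrupt: the hardware reports `completed` packets done,
 * so release them and resume queueing. */
static void nic_tx_intr(struct fake_nic *nic, int completed)
{
    if (completed > nic->in_flight)
        completed = nic->in_flight;
    nic->in_flight -= completed;
    nic->stalled = false;
    nic_start(nic);
}
```

   With this shape the driver no longer has to count outstanding
buffers itself, so a late completion only delays further transmits
instead of triggering a reset.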

-DG


Index: if_nve.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/nve/if_nve.c,v
retrieving revision 1.7.2.8
diff -c -r1.7.2.8 if_nve.c
*** if_nve.c	25 Dec 2005 21:57:03 -0000	1.7.2.8
--- if_nve.c	5 Jan 2006 00:12:45 -0000
***************
*** 943,949 ****
  			return;
  		}
  		/* Set watchdog timer. */
! 		ifp->if_timer = 8;
  
  		/* Copy packet to BPF tap */
  		BPF_MTAP(ifp, m0);
--- 943,949 ----
  			return;
  		}
  		/* Set watchdog timer. */
! 		ifp->if_timer = 0;
  
  		/* Copy packet to BPF tap */
  		BPF_MTAP(ifp, m0);


More information about the freebsd-stable mailing list