cvs commit: src/sys/dev/bge if_bge.c

Sun Dec 24 02:11:25 PST 2006

On Sun, 24 Dec 2006, Scott Long wrote:

> Bruce Evans wrote:
>> On Sun, 24 Dec 2006, Oleg Bulyzhin wrote:

>>> it's quite unusual) and it is not lock related:
>>> 1) bge_start_locked() & bge_encap fills tx ring.
>>> 2) during next 5 seconds we do not have packets for transmit (i.e. no
>>>   bge_start_locked() calls --> no bge_timer refreshing)
>>> 3) for any reason (don't ask me how can this happen), chip was unable to
>>>   send whole tx ring (only part of it).
>>> 4) here we have false watchdog - chip is not wedged but bge_watchdog would
>>>   reset it.
>> 
>> Then it is a true watchdog IMO.  Something is very wrong if you can't send
>> 512 packets in 5 seconds (or even 1 packet in 5/512 seconds).
>
> No it's not wrong.  You can be under heavy load and be constantly preempted. 
> Or you could be getting fed a steady stream of traffic
> and have a driver that is smart enough to clean the TX-complete ring
> in if_start if it runs out of TX slots.  These effects have been
> observed in at least the if_em driver.

Come on, we want to handle 100's of kpps.  Something is very wrong if
we cannot handle 100 pps on one interface in one direction.  Other
interfaces and directions shouldn't be allowed to dominate so much
that anything gets starved.

I would agree that a 5 second timeout is too short for 1 Mbps ethernet,
iff 1 Mbps NICs had rx rings with 512 entries :-), since 1 Mbps can
only handle 82 pps with 1518-byte packets.  The timeout was 2 seconds
for 10 Mbps ethernet in most drivers in FreeBSD-1 in 1994.
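
The arithmetic can be checked with a small sketch.  The helper names
max_pps() and ring_drain_ms() are made up for illustration, and the
calculation ignores preamble and inter-frame gap, which is the same
simplification that gives the 82 pps figure:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Full-size frames per second that a link can carry, ignoring preamble
 * and inter-frame gap.  Hypothetical helper, not from any driver.
 */
static uint64_t
max_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	assert(link_bps > 0 && frame_bytes > 0);
	return (link_bps / ((uint64_t)frame_bytes * 8));
}

/*
 * Milliseconds needed to put ring_entries full-size frames on the wire
 * at link speed, rounded down.
 */
static uint64_t
ring_drain_ms(uint64_t link_bps, uint32_t frame_bytes, uint32_t ring_entries)
{
	uint64_t bits;

	assert(link_bps > 0);
	bits = (uint64_t)ring_entries * frame_bytes * 8;
	return (bits * 1000 / link_bps);
}
```

With 1518-byte frames and a 512-entry ring this gives about 6.2 seconds
at 1 Mbps, which is longer than the 5 second timeout, but only about
6 ms at 1 Gbps.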

Drivers could easily have a bug like cleaning the tx ring without
adjusting the watchdog timer.
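
To make that kind of bug concrete, here is a toy model of the watchdog
bookkeeping.  It is only a sketch: struct toy_softc and the toy_*()
functions are invented for illustration and are not the bge or em code,
and the one-second tick is an assumption.  The point is the marked
assignment in toy_txeof(); a driver that reclaims descriptors without
it can report a timeout even though the chip is making progress.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_RING_CNT	512	/* ring size discussed in this thread */
#define WATCHDOG_TICKS	5	/* timeout in one-second ticks */

/* Hypothetical driver state; not the real bge softc. */
struct toy_softc {
	int	tx_queued;	/* descriptors given to the chip, not yet done */
	int	timer;		/* watchdog countdown; 0 means disarmed */
};

/* Queue one packet and arm the watchdog.  Returns false if the ring is full. */
static bool
toy_encap(struct toy_softc *sc)
{
	if (sc->tx_queued >= TX_RING_CNT)
		return (false);
	sc->tx_queued++;
	sc->timer = WATCHDOG_TICKS;
	return (true);
}

/* Reclaim 'done' descriptors that the chip has finished sending. */
static void
toy_txeof(struct toy_softc *sc, int done)
{
	assert(done >= 0 && done <= sc->tx_queued);
	if (done == 0)
		return;
	sc->tx_queued -= done;
	/*
	 * The step a buggy driver omits: progress re-arms the watchdog,
	 * and an empty ring disarms it.
	 */
	sc->timer = (sc->tx_queued == 0) ? 0 : WATCHDOG_TICKS;
}

/* Called once per tick.  Returns true when the watchdog fires. */
static bool
toy_tick(struct toy_softc *sc)
{
	if (sc->timer == 0 || --sc->timer > 0)
		return (false);
	return (true);
}
```

In this model a partially sent ring keeps the timer alive only as long
as toy_txeof() keeps seeing completions, so a timeout means no tx
progress at all for WATCHDOG_TICKS ticks.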

Did you see the effects for em under UP?  Under SMP, the race decrementing
the timer made it hard to tell what caused watchdog timeouts.

Bruce