cvs commit: src/sys/pci if_xl.c if_xlreg.h

Bruce Evans bde at zeta.org.au
Tue Dec 5 23:01:53 PST 2006


On Wed, 6 Dec 2006, Marius Strobl wrote:

> marius      2006-12-06 02:18:41 UTC
>
>  FreeBSD src repository
>
>  Modified files:
>    sys/pci              if_xl.c if_xlreg.h
>  Log:
>  - Use the xl_stats_update() callout instead of if_slowtimo() for
>    driving xl_watchdog() in order to avoid races accessing if_timer.

It's a shame to force all NIC drivers to manage the timeout for this.
Most have a timeout for other purposes so I couldn't see how to save
much code using a callback, but a callback would be cleaner.  (To avoid
the race, just move the decrement of the count to drivers.)

>    While at it relax the watchdog a bit by reloading it in xl_txeof()/
>    xl_txeof_90xB() if there are still packets enqueued.

Er, xl_txeof_90xB() was one of the 20-30% of txeof() routines that
handled this correctly.  The timeout shouldn't be touched in xx_txeof()
except to clear it when all descriptors have been handled.

Reloading it without even checking that at least one descriptor has
been handled is almost as bad as clearing it without checking, since
xx_txeof() may be called when there is nothing for it to do.  This is
most broken in the DEVICE_POLLING case.  Then xx_txeof() will be called
every few mS, much more often than the counter is decrememented, so
unconditionally reloading it ensures that it never works.

Reloading it only if some progress has been made is mainly a small
pessimization.  If the initial timeout is not long enough for all the
descriptors, then something is wrong and it is probably better to let
the watchdog try to fix it than to risk packets trickling out at a
rate of 1 every (<timeout> - epsilon) seconds, especially since it
takes more code to implement the trickling.  However, the trickling
case is very unlikely to occur and most or all xx_start() routines
don't worry about it -- they just reload the full timeout and don't
waste time calculating a timeout based on the number of new and old
descriptors.

xl_txeof_90xB() now reloads the timeout without checking anything
except that there are still some unhandled descriptors.  ISTR that
this bug has been fixed in a few drivers, mainly ones that support
DEVICE_POLLING.  Grepping for "0 : 5" shows the bug in the following
drivers: dc, pcn, sis, xl.  All of these except pcn support
DEVICE_POLLING.  The bug may be spelled in other ways.  Non-bugs
handling the timer are spelled too differently too.

xl_txeof() spells this too differently:
- attached to it too closely (but applying to both xl txeof() routines)
   there is a comment that "A frame was downloaded".  This may have been
   correct once upon a time, but now both routines are called without
   checking that a frame has been downloaded for polling and also from
   xl_start_locked().
- it has a banal comment "Clear the timeout (sic) timer." that is now
   further from echoing the code.

The spelling differences are so large that it is unclear if it has the
bug:
- it clears the timeout correctly (only clear if no more descriptors).
- reloading of the timeout is dependent on hardware stuff that I don't
   really understand.  Apparently the device has a hardare register
   which we can poll to see whether the device is stalled.  I think the
   hardware shouldn't be trusted if it has stalled, and anyway, the
   hardware stuff shouldn't be done here since it usually just wastes
   time (especially in the DEVICE_POLLING case).

Bruce


More information about the cvs-src mailing list