call for bge(4) testers
glebius at FreeBSD.org
Wed Aug 23 10:04:36 UTC 2006
On Wed, Aug 23, 2006 at 06:55:04PM +0900, Pyun YongHyeon wrote:
P> On Wed, Aug 23, 2006 at 01:37:41PM +0400, Gleb Smirnoff wrote:
P> > On Tue, Aug 22, 2006 at 01:20:23PM +0900, Pyun YongHyeon wrote:
P> > P> After fixing the em(4) watchdog bug, I looked over bge(4) and I think
P> > P> bge(4) may suffer from the same issue. So if you have seen occasional
P> > P> watchdog timeout errors on bge(4) please give the attached patch a try.
P> > P> The patch fixes only false watchdog timeout errors.
P> > P> Typical phenomena for false watchdog timeout errors are
P> > P> o polling(4) fixes the issue
P> > P> o random watchdog errors
P> > P>
P> > P> If my patch fixes the issue, you should see one of the following messages:
P> > P> "missing Tx completion interrupt!" or "link lost -- resetting"
P> > I still think that this fix is incorrect. It is just a more gentle
P> > recovery from a fake watchdog timeout.
P> Its sole purpose is to reinitialize hardware for real watchdog
P> timeouts. It's not a fix for general watchdog timeouts. As I said in other
P> mails, the fake watchdog timeout (losing Tx interrupts) on hardware
P> with Tx interrupt moderation capability could be a normal thing. So I
P> just want to know whether bge(4) also has the same feature (bug).
According to several emails about em(4) fake watchdog timeouts, the
problem can be fixed by setting debug.mpsafenet=0. This makes me think
that the problem isn't caused by TX interrupt moderation, but by some race
in the kernel. Indeed, if_slowtimo() doesn't acquire the driver lock before
checking and modifying the if_timer field.
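To illustrate the kind of race I mean, here is a minimal userland sketch,
not kernel code. The struct, the function names and the timer value of 5
are all made up for the example, and it assumes the decrement is a plain
load followed by a store. It writes out one possible interleaving by hand
instead of using real threads:

```c
#include <assert.h>

/* Userland model of the two fields involved; not the real struct ifnet. */
struct ifnet_model {
	int if_timer;		/* seconds until the watchdog fires; 0 = disarmed */
	int watchdog_calls;	/* how many times the watchdog ran */
};

/* Driver side, normally run with the driver lock held (hypothetical names). */
static void driver_start(struct ifnet_model *ifp) { ifp->if_timer = 5; }
static void driver_txeof(struct ifnet_model *ifp) { ifp->if_timer = 0; }

/*
 * The slow-timer decrement, split into its load and store halves so that
 * an interleaving with the driver can be written out explicitly.
 */
static int slowtimo_load(const struct ifnet_model *ifp) { return ifp->if_timer; }

static void slowtimo_store(struct ifnet_model *ifp, int loaded)
{
	assert(loaded >= 0);
	if (loaded == 0)
		return;			/* timer is disarmed */
	ifp->if_timer = loaded - 1;
	if (ifp->if_timer == 0)
		ifp->watchdog_calls++;	/* stands in for the watchdog call */
}

/* Slow timer runs without the driver lock: the driver slips in between. */
int racy_interleaving(void)
{
	struct ifnet_model ifp = { 1, 0 };	/* old packet, timer about to expire */
	int seen = slowtimo_load(&ifp);		/* slow timer reads 1 */

	driver_txeof(&ifp);			/* old packet completes, timer = 0 */
	driver_start(&ifp);			/* new packet queued, timer = 5 */
	slowtimo_store(&ifp, seen);		/* stale value: writes 0, fires */
	return ifp.watchdog_calls;
}

/* Same events, but the load and store cannot be split by the driver. */
int serialized_interleaving(void)
{
	struct ifnet_model ifp = { 1, 0 };

	driver_txeof(&ifp);
	driver_start(&ifp);
	slowtimo_store(&ifp, slowtimo_load(&ifp));	/* 5 -> 4, no watchdog */
	return ifp.watchdog_calls;
}
```

In the first case the watchdog fires although a packet was queued only a
moment ago; in the second it does not. Nothing in this model involves
interrupt moderation.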
AFAIK, NIC drivers that can do interrupt moderation should set the timer
to a sane value, based on the interrupt moderation settings, so that the
watchdog is never called spuriously.
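As a rough sketch of what "a sane value" could mean: derive the timeout
from the worst-case Tx completion delay instead of hardcoding it. The
function name, the margin and the floor below are assumptions made for
illustration and are not taken from bge(4) or em(4):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tunables, chosen only for this example. */
#define WATCHDOG_MARGIN_SECS	2	/* slack; the first tick may come right after arming */
#define WATCHDOG_MIN_SECS	5	/* never arm the timer shorter than this */

/*
 * Return a watchdog timeout in seconds for a device whose Tx completion
 * interrupt may be delayed by up to tx_coal_usecs microseconds.
 */
int watchdog_secs(uint32_t tx_coal_usecs)
{
	/* Round the worst-case delay up to whole seconds; 64-bit avoids overflow. */
	uint64_t delay_secs = ((uint64_t)tx_coal_usecs + 999999u) / 1000000u;
	uint64_t secs = delay_secs + WATCHDOG_MARGIN_SECS;

	assert(secs <= INT32_MAX);
	return (secs < WATCHDOG_MIN_SECS) ? WATCHDOG_MIN_SECS : (int)secs;
}
```

With these made-up constants, a moderation delay of 150 usec still gives
the floor of 5 seconds, while a delay of 10 seconds gives 12.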
P> > The more I think about it, the more I doubt that we really need the
P> > watchdog infrastructure that comes from the old days.
P> Could you suggest another way to recover from a stuck Tx condition
P> without using the watchdog?
Maybe drivers should take care of that themselves, why not? At least
the callout routine will have access to the driver mutex, contrary to
Totus tuus, Glebius.
More information about the freebsd-current mailing list