call for bge(4) testers
Pyun YongHyeon
pyunyh at gmail.com
Wed Aug 23 10:54:47 UTC 2006
On Wed, Aug 23, 2006 at 02:04:20PM +0400, Gleb Smirnoff wrote:
> On Wed, Aug 23, 2006 at 06:55:04PM +0900, Pyun YongHyeon wrote:
> P> On Wed, Aug 23, 2006 at 01:37:41PM +0400, Gleb Smirnoff wrote:
> P> > On Tue, Aug 22, 2006 at 01:20:23PM +0900, Pyun YongHyeon wrote:
> P> > P> After fixing the em(4) watchdog bug, I looked over bge(4) and I think
> P> > P> bge(4) may suffer from the same issue. So if you have seen occasional
> P> > P> watchdog timeout errors on bge(4) please give the attached patch a try.
> P> > P> The patch fixes false watchdog timeout errors only.
> P> > P> Typical phenomena for false watchdog timeout errors are
> P> > P> o polling(4) fixes the issue
> P> > P> o random watchdog error
> P> > P>
> P> > P> If my patch fixes the issue you could see the following messages.
> P> > P> "missing Tx completion interrupt!" or "link lost -- resetting"
> P> >
> P> > I still think that this fix is incorrect. It is just a more gentle
> P> > recovery from a fake watchdog timeout.
> P>
> P> Its sole purpose is to reinitialize hardware for real watchdog
> P> timeouts. It's not a fix for general watchdog timeouts. As I said in
> P> other mails, the fake watchdog timeout (losing Tx interrupts) on
> P> hardware with Tx interrupt moderation capability could be a normal
> P> thing. So I just want to know whether bge(4) also has the same
> P> feature (bug).
>
> According to several emails about em(4) fake watchdog timeouts, the
> problem can be fixed by setting debug.mpsafenet=0. This makes me think
> that the problem isn't caused by TX interrupt moderation, but by some
> race in the kernel. Really, if_slowtimo() doesn't acquire the driver
> lock before checking and modifying the if_timer field.
>
Hmm... I didn't say the problem was caused by TX interrupt moderation.
I'm not sure, but I'm under the impression there are *two* different
issues. If you think the fake watchdog timeout fix is not an adequate
one, please let me know. I'll back out the change if you want.
> Afaik, NIC drivers that can do interrupt moderation should set the timer
> to a sane value, based on interrupt moderation settings, so that the
> watchdog won't ever be called falsely.
>
Yes. Normally it should. But I saw the issues on Marvell Yukon too.
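
To illustrate the point about deriving the timer from the moderation
settings, here is a minimal userland sketch in C. The function name
watchdog_secs(), its parameters and the extra second allowed for the
asynchronous if_slowtimo() tick are assumptions made for illustration
only; none of this is taken from bge(4) or any other driver.

```c
#include <assert.h>

/*
 * Hypothetical helper: choose a watchdog value (in seconds) that cannot
 * expire before the latest possible moderated Tx completion interrupt.
 *
 * tx_coal_usecs: assumed worst-case Tx interrupt moderation delay
 * margin_secs:   extra slack the driver wants on top of that
 */
static int
watchdog_secs(unsigned int tx_coal_usecs, unsigned int margin_secs)
{
	unsigned int delay_secs;

	/* Keep the sketch free of arithmetic overflow. */
	assert(tx_coal_usecs <= 60U * 1000000U);

	/* Round the moderation delay up to whole seconds. */
	delay_secs = (tx_coal_usecs + 999999U) / 1000000U;

	/*
	 * Assumption: the once-a-second tick is not synchronized with the
	 * moment the timer is armed, so the first decrement may happen
	 * almost immediately. Allow one extra second for that.
	 */
	return ((int)(delay_secs + 1U + margin_secs));
}
```

With a hypothetical 150 us Tx coalescing delay and a 4 second margin,
watchdog_secs(150, 4) yields 6.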
> P> > The more I think, the more I doubt that we really need the
> P> > watchdog infrastructure that comes from the old days.
> P>
> P> Would you suggest another way to recover from a stuck Tx condition
> P> without using the watchdog?
>
> Maybe drivers should take care of that themselves, why not? At least
> the callout routine will have access to the driver mutex, contrary to
> if_slowtimo().
>
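
A driver-owned watchdog of the kind suggested above could look roughly
like the following userland model in C. Everything named sketch_* is
hypothetical, and the "lock" is only a flag standing in for the driver
mutex; in a real driver the tick would presumably be a callout(9)
handler that runs with the driver mutex held. The only point is that
the timer is read and modified under the same lock as the Tx path.

```c
#include <assert.h>

/* Stand-in for a driver softc; all names here are hypothetical. */
struct sketch_softc {
	int locked;	/* models the driver mutex: 1 while held */
	int wd_timer;	/* seconds until Tx is declared stuck, 0 = disarmed */
	int tx_pending;	/* frames handed to the chip, not yet reclaimed */
	int resets;	/* number of times the chip was reinitialized */
};

static void
sketch_lock(struct sketch_softc *sc)
{
	assert(!sc->locked);
	sc->locked = 1;
}

static void
sketch_unlock(struct sketch_softc *sc)
{
	assert(sc->locked);
	sc->locked = 0;
}

/* Tx path: queue frames and arm the watchdog, lock held by the caller. */
static void
sketch_start(struct sketch_softc *sc, int nframes, int timeout_secs)
{
	assert(sc->locked);
	sc->tx_pending += nframes;
	sc->wd_timer = timeout_secs;
}

/* Tx completion path: disarm once nothing is pending, lock held. */
static void
sketch_txeof(struct sketch_softc *sc, int ncompleted)
{
	assert(sc->locked);
	sc->tx_pending -= ncompleted;
	if (sc->tx_pending <= 0) {
		sc->tx_pending = 0;
		sc->wd_timer = 0;
	}
}

/* Once-a-second tick owned by the driver; takes the lock itself. */
static void
sketch_tick(struct sketch_softc *sc)
{
	sketch_lock(sc);
	if (sc->wd_timer > 0 && --sc->wd_timer == 0) {
		/* Tx really is stuck: model a chip reinitialization. */
		sc->resets++;
		sc->tx_pending = 0;
	}
	sketch_unlock(sc);
}
```

Because sketch_tick() and the Tx paths all touch wd_timer with the lock
held, the check-and-decrement cannot interleave with a re-arm the way
an unlocked if_timer update can.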
--
Regards,
Pyun YongHyeon