nve locking fixes round 2
Matthew Dillon
dillon at apollo.backplane.com
Thu Nov 24 23:29:41 GMT 2005
:Ok, now that the first set of locking overhaul is in the tree, can folks with
:working nve(4) adapters test the patch referenced below and make sure there
:are no regressions. Having the IFF_UP fiddling turned off may or may not
:help folks getting the TX timeouts as well, btw, so if people are feeling
:brave they can try this patch as well. Note it is only applicable to recent
:current.
:
:http://www.FreeBSD.org/~jhb/patches/nve_locking.patch
:
:--
:John Baldwin <jhb at FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
:"Power Users Use the Power to Serve" = http://www.FreeBSD.org
The reason I set sc->pending_txs to 0 in DFly after the reinit is that
when a watchdog timeout occurs and you reset the device, *ALL* mbufs
still sitting in the transmit ring are lost. They will never be
acknowledged, so pending_txs will never drop back to 0 on its own.
That is what led to continuous watchdog timeout reports when, in fact,
only one timeout actually occurred.
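The counter problem can be illustrated with a small user-space model in C. This is a sketch under stated assumptions, not driver code: struct txq_model and the txq_*() functions are hypothetical, the "reset" simply discards everything in flight the way a reinit loses the transmit ring, and only pending_txs corresponds to a field named above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a transmit queue; only pending_txs mirrors the
 * driver field discussed above. */
struct txq_model {
    int pending_txs;      /* transmits handed to the hardware, not yet acknowledged */
    int inflight;         /* descriptors the simulated hardware still holds */
    bool clear_on_reset;  /* true: DFly behaviour, zero pending_txs after reinit */
};

/* Queue one packet to the simulated hardware. */
static void txq_submit(struct txq_model *q)
{
    q->pending_txs++;
    q->inflight++;
}

/* Simulated completion: only descriptors the hardware still holds can complete. */
static bool txq_complete(struct txq_model *q)
{
    if (q->inflight == 0)
        return false;     /* nothing left in the ring to acknowledge */
    q->inflight--;
    q->pending_txs--;
    return true;
}

/* Watchdog reinit: the simulated hardware forgets everything in the ring. */
static void txq_watchdog_reset(struct txq_model *q)
{
    q->inflight = 0;
    if (q->clear_on_reset)
        q->pending_txs = 0;
}

/* Simplified watchdog check, assumed to run when the timer expires without
 * progress: it fires if any transmit is still counted as outstanding. */
static bool txq_watchdog_should_fire(const struct txq_model *q)
{
    return q->pending_txs > 0;
}
```

In this model a single reset with clear_on_reset left false keeps the watchdog condition true indefinitely, which matches the repeated timeout reports; zeroing pending_txs after the reset ends them.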
The FreeBSD code does set pending_txs to 0 in nve_stop(). I'm not
sure that is correct, however, unless the pfnStop() ABI call cleans
out the pending mbufs in the transmit ring (which seems unlikely).
Otherwise, when those stale mbufs are later reported as transmitted,
each one decrements the count and it winds up going negative.
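A similar hypothetical model shows how the count could go negative. It assumes, as the paragraph above suspects, that the stop call leaves descriptors in the ring and that they are later completed through the normal completion path; struct txq_stop_model and the model_*() functions are illustrative names only.

```c
#include <assert.h>

/* Hypothetical model: the stop routine zeroes the counter, but (by assumption)
 * the vendor stop call leaves descriptors in the ring, and they can still be
 * reported as completed later. */
struct txq_stop_model {
    int pending_txs;  /* driver's count of outstanding transmits */
    int ring_used;    /* descriptors still sitting in the transmit ring */
};

static void model_submit(struct txq_stop_model *q)
{
    q->pending_txs++;
    q->ring_used++;
}

/* Mirrors zeroing pending_txs in a stop routine without draining the ring. */
static void model_stop_zero_only(struct txq_stop_model *q)
{
    q->pending_txs = 0;
}

/* Completion of one descriptor that was still in the ring at stop time. */
static void model_tx_done(struct txq_stop_model *q)
{
    assert(q->ring_used > 0);
    q->ring_used--;
    q->pending_txs--;  /* underflows if the counter was already zeroed */
}
```

Submitting three packets, stopping, and then completing those three descriptors leaves pending_txs at -3 in this model.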
Another problem that neither of us has dealt with yet is recovery of
dead transmit mbufs. Right now that only occurs in nve_ospackettx(),
but nve_ospackettx() is only called by the Nvidia code during normal
operation. ABI calls that, e.g., reset the Nvidia device will *NOT*
clean out the transmit ring or call nve_ospackettx(), so we lose track
of all the mbufs that were sitting in there at the time of a reinit.
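One way to recover the dead mbufs is for the driver to track ownership of each ring slot itself and reclaim anything still attached when it reinitializes the device, instead of relying on the vendor code to report completions. The following is a hypothetical user-space sketch: struct fake_mbuf stands in for an mbuf (the kernel would use m_freem() rather than free()), and TX_RING_SIZE, struct tx_ring and the tx_ring_*() functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TX_RING_SIZE 8  /* hypothetical ring size for the sketch */

/* Stand-in for an mbuf; a kernel driver would free a real one with m_freem(). */
struct fake_mbuf {
    int len;
};

struct tx_slot {
    struct fake_mbuf *m;  /* non-NULL while the slot owns a packet */
};

struct tx_ring {
    struct tx_slot slot[TX_RING_SIZE];
    int pending_txs;
};

/* Record a packet in a free slot; returns the slot index or -1 if the ring is full. */
static int tx_ring_enqueue(struct tx_ring *r, struct fake_mbuf *m)
{
    for (int i = 0; i < TX_RING_SIZE; i++) {
        if (r->slot[i].m == NULL) {
            r->slot[i].m = m;
            r->pending_txs++;
            return i;
        }
    }
    return -1;
}

/* Normal completion path (the role nve_ospackettx() plays in the driver). */
static void tx_ring_complete(struct tx_ring *r, int idx)
{
    assert(idx >= 0 && idx < TX_RING_SIZE && r->slot[idx].m != NULL);
    free(r->slot[idx].m);
    r->slot[idx].m = NULL;
    r->pending_txs--;
}

/* Called by the driver itself around a device reinit, because the vendor
 * reset will never report these packets as completed.
 * Returns the number of packets reclaimed. */
static int tx_ring_reclaim_all(struct tx_ring *r)
{
    int freed = 0;
    for (int i = 0; i < TX_RING_SIZE; i++) {
        if (r->slot[i].m != NULL) {
            free(r->slot[i].m);
            r->slot[i].m = NULL;
            freed++;
        }
    }
    r->pending_txs = 0;  /* consistent: the ring is now empty */
    return freed;
}
```

Because pending_txs is reset only after every owned slot has been freed, the counter and the ring contents stay consistent in this sketch, which avoids both the stuck-watchdog and the negative-count cases.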
But, of course, the biggest problem is simply that the Nvidia ABI
library seems to be rather broken. On my nForce4-based boxes the DFly
driver can recover from numerous watchdog timeouts (and they occur
quite often, even when the network load is virtually nil), but after
an hour or two of testing at GigE speeds the hardware itself stops
working entirely, to the point where I have to physically unplug and
replug the machine's power cord for the hardware to start working again.
-Matt
Matthew Dillon
<dillon at backplane.com>