ste(4) NIC's RX ring head may get ahead of the driver [PATCH]
ambrisko at ambrisko.com
Tue Mar 30 08:23:05 PST 2004
Ruslan Ermilov writes:
| To make the long story short, under a heavy RX load, the ste(4) NIC's
| RX ring head may get ahead of what the driver thinks, bringing all sorts
| of havoc like stuck traffic, disordered packets, etc. The NIC never
| gets out of this state, and the only workaround is to reset the chip,
| and so we did for some time (by adding the IFF_LINK2 handler to call
| the driver's watchdog function).
We never experienced this in our testing or in production. We were
using the 4 port card. Maybe this bug is on another variant of the
| We've adopted the approach used by dc(4) and xl(4), but instead of
| seeing if we need to re-synchronize the head _after_ receiving (like
| dc(4) and xl(4) drivers do), we do it at the beginning of ste_rxeof().
| As the statistics show, the number of resyncs needed is smaller by a
| factor of 3 or more in this case, because often the RxDMAComplete
| interrupt is generated when the RX ring is completely empty(!), and as
| the NIC continues to do DMA and fill the RX ring while we're still
| servicing the RxDMAComplete interrupt, we did more resyncs than were
| actually necessary.
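
For readers unfamiliar with the resync idea described above, here is a
minimal userspace sketch in C. It is not the attached patch: the ring
size, the names (rx_ring, rx_resync, rx_eof), and the convention that a
zero status word means "not yet filled by the NIC" are all assumptions
made for illustration. The only point is the ordering: resynchronize
the head first, then drain the ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 8          /* hypothetical ring size for this sketch */

struct rx_desc {
    uint32_t status;            /* assumed: 0 = not yet filled by the NIC */
};

struct rx_ring {
    struct rx_desc desc[RX_RING_SIZE];
    size_t head;                /* where the driver thinks the next packet is */
    unsigned resyncs;           /* how many times the head had to be moved */
};

/*
 * If the descriptor at the driver's head is still empty but a later one
 * has been filled, the NIC has run ahead: move the head to the first
 * filled descriptor. Returns 1 if the head moved, 0 otherwise.
 */
static int
rx_resync(struct rx_ring *r)
{
    size_t i, pos;

    if (r->desc[r->head].status != 0)
        return 0;               /* head already points at a filled slot */

    for (i = 1; i < RX_RING_SIZE; i++) {
        pos = (r->head + i) % RX_RING_SIZE;
        if (r->desc[pos].status != 0) {
            r->head = pos;
            r->resyncs++;
            return 1;
        }
    }
    return 0;                   /* ring is completely empty: nothing to do */
}

/* Resync at the beginning, then consume filled descriptors in order. */
static unsigned
rx_eof(struct rx_ring *r)
{
    unsigned n = 0;

    rx_resync(r);
    while (r->desc[r->head].status != 0) {
        r->desc[r->head].status = 0;    /* hand the slot back to the NIC */
        r->head = (r->head + 1) % RX_RING_SIZE;
        assert(r->head < RX_RING_SIZE);
        n++;
    }
    return n;
}
```

In this sketch an interrupt that arrives while the ring is still empty
costs one scan and no resync, which matches the observation in the
quoted text that checking before receiving avoids unnecessary resyncs.
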
| Also, we were able to further reduce the number of resyncs by setting
| the RxDMAPollPeriod to a higher value. 320ns looked like overkill
| here, and I'm not sure why you have chosen it in the first place,
| when adding polling support for RX in the driver. Also, we believe
| that this setting may be responsible for what you referred to as:
I'm not sure.
| > This card still has seemingly unfixable issues under heavy RX load in
| > which the card takes over the PCI bus.
| in the commit log for revision 1.33 of if_ste.c.
| Attached is the patch (for RELENG_4) we're currently using, and are
| quite happy with. If anyone is using ste(4) NICs and is experiencing
| similar problems, I'd be glad to hear the reports about this patch.
Sounds good. However, it won't fix the core problem that I reported.
D-Link's solution was to EOL the 4-port card because of this problem.
You can see it in their Linux and Windows drivers. The easiest
way to see it is to send traffic into all 4 ports of the 4-port card.
You will see only one port have activity, then it switches to another.
It will not be multiplexing traffic. Another thing I found that would
lead to a panic was that if you reset the chip while traffic is being
sent into the card, the reset will return but the card still takes
RX packets and DMAs them into memory. Since we have released
the memory for the card, it would then splat bits over something else.
It was a while before I figured out this cause of panics :-(
I don't see how your change will fix that.
I no longer have access to the HW or the test environment I used
since I've changed jobs. I have no objection to your change and it
sounds good. I don't think it will solve the problem I saw.
While you are in this driver, can you convert it to Mike Silby's generic
de-frager? To test it, do something like:
dd if=/kernel bs=1 | ssh <something> "cat > /tmp/kernel"
I originally "stole" the code to do this from fxp(4), which was before
Mike did the generic de-frager.
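
As a rough illustration of what a de-frager does, here is a userspace
sketch in C that collapses a chain of small fragments into one
contiguous buffer. It is not the kernel code and not what fxp(4) does:
struct frag and chain_coalesce() are hypothetical names, and the
assumption behind it is that a test like the dd above tends to hand
the driver long chains of tiny fragments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one link of a packet buffer chain. */
struct frag {
    struct frag *next;
    size_t len;                 /* bytes used in data[] */
    unsigned char data[64];
};

/*
 * Copy every fragment of the chain into a single malloc'd buffer.
 * Stores the total length in *total and returns the buffer, or NULL
 * if the allocation fails. The caller frees the result.
 */
static unsigned char *
chain_coalesce(const struct frag *head, size_t *total)
{
    const struct frag *f;
    size_t len = 0, off = 0;
    unsigned char *buf;

    for (f = head; f != NULL; f = f->next)
        len += f->len;

    buf = malloc(len != 0 ? len : 1);
    if (buf == NULL)
        return NULL;

    for (f = head; f != NULL; f = f->next) {
        memcpy(buf + off, f->data, f->len);
        off += f->len;
    }
    assert(off == len);

    *total = len;
    return buf;
}
```
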