Possible arge driver bug - interrupt storm on int2

Gergely Kiss mail.gery at gmail.com
Tue Jul 31 23:21:18 UTC 2018


On 31 July 2018 at 15:45, Gergely Kiss <mail.gery at gmail.com> wrote:
>
> Hi,
>
> I'm working on improving the freebsd-wifi-build project to provide an out-of-the-box gateway solution built on FreeBSD for SOHO routers. For more information, please see the link below:
>
> https://github.com/kissg1988/freebsd-wifi-build
>
> I've created a build script that should simplify creating a build environment from scratch:
>
> https://github.com/kissg1988/freebsd-wifi-build/blob/buildscript/scripts/build.sh
>
> The goal is to make the distribution easy to build and use so it can be flashed as easily as OpenWrt or DD-WRT.
>
> The device I'm working with is a TP-Link TL-WR1043ND v1.8, which has a Realtek RTL8366RB switch and is built around the AR9132 SoC. The board currently runs 11.2-RELEASE.
>
> The Ethernet controller works perfectly with a static or DHCP-provided IP address. However, something seems to be wrong with the way the driver handles the TX queue when 802.1Q tagging is in use and the system tries to send PPP frames over the tagged interface (e.g. vlan2).
>
> Once ppp starts establishing the connection, outgoing packets are queued but never transmitted by the controller, and an interrupt storm is generated with only the TX_UNDERRUN flag set (without the TX_PKT_SENT flag).
>
> This might be related to an issue seen a few years back:
>
> https://lists.freebsd.org/pipermail/freebsd-mips/2015-October/004137.html
>
> As PPPoE has an 8-byte overhead and VLAN tagging needs an additional 4-byte field, I have lowered the MTU on arge0 and vlan2 to 1488 bytes, but it didn't make any difference. The interrupt storm still happens and is always reproducible. Once it starts, the only way to stop the storm is to bring down arge0.
>
> If I destroy the vlan interface and put the WAN port in an access VLAN with no tagging, the PPPoE connection works fine.
>
> I'll make some more tests later today with -HEAD to see if it makes any difference.
>
> Any help or ideas would be much appreciated in the meantime.
>
> Thanks,
> Gergely
>

I have run some tests with -CURRENT and could reproduce the issue
exactly the same way as with 11.2-RELEASE.

The version used for my tests is:

FreeBSD 12.0-CURRENT #0 409735e6b1a(master)-dirty: Tue Jul 31 22:12:37 CEST 2018

After setting sysctl dev.arge.0.debug=0x16 (ARGE_DBG_INTR |
ARGE_DBG_TX | ARGE_DBG_ERR), this is what I see on the console a few
seconds after starting the PPPoE connection:

interrupt storm detected on "int2"; throttling interrupt source
arge0: int mask(filter) = db<RX_BUS_ERROR,RX_OVERFLOW,RX_PKT_RCVD,TX_BUS_ERROR,TX_UNDERRUN,TX_PKT_SENT>
arge0: status(filter) = 2<TX_UNDERRUN>
arge0: int status(intr) = 2<TX_UNDERRUN>
arge0: arge_intr: TX underrun; tx_cnt=18

tx_cnt keeps increasing every few seconds and never resets while the
console is flooded with the lines above. The only way to stop the
storm is to bring down arge0 (or to power-cycle the device, of course).

I have noticed that setting the CPU port untagged for VLAN2 (i.e. the
WAN interface) avoids the issue, so a workaround is possible where
VLAN1 is tagged on the CPU port (this works fine with "standard"
Ethernet frames) and VLAN2 is untagged.

However, I don't think this is an elegant way to handle the issue,
so I consider it a last resort only.

Could someone please assist me with this? I'm not a driver developer,
just an enthusiastic guy trying to improve what you guys (especially
Adrian) have created so far.

Thanks,
Gergely

