misc/167113: [ath] AR5210: "stuck" TX seems to be occurring, without watchdog timeout firing
adrian at FreeBSD.org
Fri Apr 20 00:30:11 UTC 2012
>Synopsis: [ath] AR5210: "stuck" TX seems to be occurring, without watchdog timeout firing
>Arrival-Date: Fri Apr 20 00:30:10 UTC 2012
>Originator: Adrian Chad
>Release: 9.0-RELEASE i386, with -HEAD net80211/ath
When using an AR5210 NIC with background scanning disabled (-bgscan), I've noticed that TX occasionally hangs.
A 'scan' (which resets the NIC) will make things work again.
A watchdog timeout isn't occurring, so the watchdog is being tickled somehow. However, the data TXQ shows 2-3 frames still sitting in the hardware queue, as well as a number of frames buffered in the software queue.
The relevant dmesg output:
HW TXQ 0: axq_depth=2, axq_aggr_depth=0
HW TXQ 1: axq_depth=0, axq_aggr_depth=0
Total TX buffers: 77; Total TX buffers busy: 0
(here, ifconfig wlan1 scan)
wlan1: [00:24:6c:04:ed:39] sta power save mode on
ar5210: dma receive failed to stop in 10ms
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: addbaw=0, dobaw=0, seqno_assign=0, seqno_required=0, seqno=-1, retry=0
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: tid txq_depth=51 hwq_depth=0
ath1: ath_tx_tid_drain: node 0xc78c6000: bf=0xc787b570: tid 16: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=0, baw_tail=0 txa_start=-1, ni_txseqs=45773
TODS 00:30:ab:17:81:47->00:1f:6c:9a:3f:1b(00:24:6c:04:ed:39) data 0M
0801 0000 0024 6c04 ed39 0030 ab17 8147 001f 6c9a 3f1b a029 aaaa 0300 0000 0800 4510 0034 fe95 4000 4006 a3ea c0a8 643c cb38 a816 28bf 0016 eae0 2e41 38e2 060a 8010 0401 383a 0000 0101 080a 158d 61be 067f a3a0
To reproduce:
* Bring the AR5210 'up'
* Disable bgscan (ifconfig wlanX -bgscan)
* Generate a small amount of traffic (e.g. web, ssh) and watch for the occasional hang
* Run 'sysctl dev.ath.X.txagg=1' and check the resulting output
I'm not sure what the fix is yet. I don't know why frames are going into the software queue here: no aggregation has been negotiated, so in theory everything _should_ be going directly to the hardware queue.
It turns out that ath_tx_swq() incorrectly checks the hardware queue depth against sc_hwq_limit for non-aggregate traffic as well, so that traffic is being software queued.
So I -think- that in this case non-aggregate traffic is still being software queued _and_ at most two frames are queued to the hardware at any one time. That's likely very sub-optimal, but it's what lets this particular bug rear its ugly head.
What I need to check:
* Are we somehow missing TX interrupts? (e.g. RAC style bugs)
* There are frames in the hardware TXQ; have they actually completed? I should turn on reset debugging (sysctl dev.ath.1.debug=0x20) and see what the descriptor dump looks like. If they have completed, a TX interrupt should have occurred.
* Am I also getting TXEOL from the AR5210? That's how the TX interrupt mitigation technique is supposed to work.