misc/163689: [ath] TX timeouts when sending probe/mgmt frames during scanning

Thu Dec 29 01:10:12 UTC 2011

>Number:         163689
>Category:       misc
>Synopsis:       [ath] TX timeouts when sending probe/mgmt frames during scanning
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Dec 29 01:10:12 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Adrian Chadd
>Release:        10.0-CURRENT
>Organization:
>Environment:
10.0-CURRENT, i386, etc.
>Description:
When aggregation is enabled, frames that are queued to the software queue require a call to ath_txq_sched() in order to schedule them to the hardware.

This currently happens implicitly. Frames are only queued to the software queue when the hardware queue is busy or paused; otherwise they are dispatched directly to the hardware. Any frames left in the software queue are then scheduled when the TXQ completion code in ath_tx_processq() calls ath_txq_sched().
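
The scheduling behaviour described above can be illustrated with a small, hypothetical C model. It is not the if_ath code: struct model_txq, its counters, MODEL_HWQ_LIMIT (an arbitrary depth at which the hardware queue counts as busy) and the model_* functions are made up for illustration, with model_txq_sched() and model_tx_processq() standing in for ath_txq_sched() and ath_tx_processq(). What it shows is that the scheduler is only ever reached from the completion path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hardware queue depth at which the queue counts as "busy". */
#define MODEL_HWQ_LIMIT 2

/* Simplified, hypothetical model of one TX queue; not the if_ath structures. */
struct model_txq {
	int sw_depth;  /* frames waiting in the software queue */
	int hw_depth;  /* frames handed to the hardware queue */
	bool paused;   /* hardware queue paused */
};

static bool model_hwq_busy(const struct model_txq *q)
{
	return q->hw_depth >= MODEL_HWQ_LIMIT;
}

/* Stand-in for ath_txq_sched(): move software-queued frames to the hardware. */
static void model_txq_sched(struct model_txq *q)
{
	while (q->sw_depth > 0 && !q->paused && !model_hwq_busy(q)) {
		q->sw_depth--;
		q->hw_depth++;
	}
}

/*
 * Transmit path: software-queue only if the hardware queue is busy or
 * paused, otherwise dispatch directly. The scheduler is NOT called here.
 */
static void model_tx_enqueue(struct model_txq *q)
{
	if (q->paused || model_hwq_busy(q))
		q->sw_depth++;
	else
		q->hw_depth++;
}

/*
 * Stand-in for ath_tx_processq(): reap one completed frame, then run the
 * scheduler. This is the only caller of model_txq_sched().
 */
static void model_tx_processq(struct model_txq *q)
{
	assert(q->hw_depth > 0);  /* a completion implies a frame was in hardware */
	q->hw_depth--;
	model_txq_sched(q);
}
```

For example, with the limit at 2, three back-to-back calls to model_tx_enqueue() leave two frames in the hardware queue and one in the software queue; the third frame only moves once model_tx_processq() reaps a completion.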

Unfortunately, during channel scanning some frames end up in the software queue while the hardware queue is empty. Since no hardware TX completion follows, ath_tx_processq() is never called, and so ath_txq_sched() never runs either.
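
The stranded state can be expressed as a predicate in a hypothetical C model; struct model_txq_state and the model_* functions are illustrative only and do not exist in the driver. The sequence that reaches the state assumes the frame was software-queued because the queue was paused around the scan's channel change; the report itself only says the hardware queue was busy or paused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one TX queue; not the if_ath structures. */
struct model_txq_state {
	int sw_depth;  /* frames waiting in the software queue */
	int hw_depth;  /* frames in the hardware queue */
	bool paused;   /* hardware queue paused */
};

/*
 * Frames are stranded when they sit in the software queue of an unpaused
 * queue whose hardware queue is empty: no TX completion will arrive, so a
 * scheduler that only runs from the completion path never moves them.
 */
static bool model_txq_stranded(const struct model_txq_state *s)
{
	return s->sw_depth > 0 && s->hw_depth == 0 && !s->paused;
}

/*
 * Assumed scan sequence: the queue is paused while the hardware queue is
 * empty, a frame arrives and is software-queued, and the queue is then
 * unpaused without anything calling the scheduler.
 */
static struct model_txq_state model_scan_sequence(void)
{
	struct model_txq_state s = { 0 };

	s.paused = true;
	s.sw_depth++;     /* paused, so the frame goes to the software queue */
	s.paused = false; /* resumed, but hw_depth is still 0: no completion */
	return s;
}
```

A queue with a frame still in hardware is not stranded by this definition, because its completion will eventually run the scheduler.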

This results in TX timeouts, along with log output such as the following:

TODS 00:03:7f:0b:62:88->00:19:e0:66:66:68(00:19:e0:66:66:68) data QoS [TID 0] 0M
 c819 3a01 0019 e066 6668 0003 7f0b 6288 0019 e066 6668 1009 0000 adde
ath1: ath_tx_tid_drain: node 0xc77a2000: tid 0: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=77, baw_tail=77 txa_start=1740, ni_txseqs=1740
TODS 00:03:7f:0b:62:88->00:19:e0:66:66:68(00:19:e0:66:66:68) data QoS [TID 0] 0M
 c819 3a01 0019 e066 6668 0003 7f0b 6288 0019 e066 6668 c00a 0000 adde
ar5416PerCalibrationN: NF calibration didn't finish; delaying CCA
ath1: ath_tx_tid_drain: node 0xc77a2000: tid 0: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=100, baw_tail=100 txa_start=1763, ni_txseqs=1763
TODS 00:03:7f:0b:62:88->00:19:e0:66:66:68(00:19:e0:66:66:68) data QoS [TID 0] 0M
 c819 3a01 0019 e066 6668 0003 7f0b 6288 0019 e066 6668 600c 0000 adde
ath1: ath_tx_tid_drain: node 0xc77a2000: tid 0: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=85, baw_tail=85 txa_start=1876, ni_txseqs=1876
TODS 00:03:7f:0b:62:88->00:19:e0:66:66:68(00:19:e0:66:66:68) data QoS [TID 0] 0M
 c819 3a01 0019 e066 6668 0003 7f0b 6288 0019 e066 6668 c017 0000 adde
ath1: ath_tx_tid_drain: node 0xc77a2000: tid 0: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=85, baw_tail=85 txa_start=1876, ni_txseqs=1876
TODS 00:03:7f:0b:62:88->00:19:e0:66:66:68(00:19:e0:66:66:68) data QoS [TID 0] 0M
 c819 3a01 0019 e066 6668 0003 7f0b 6288 0019 e066 6668 e01a 0000 adde
ath1: ath_tx_tid_drain: node 0xc77a2000: tid 0: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=85, baw_tail=85 txa_start=1876, ni_txseqs=1876
TODS 00:03:7f:0b:62:88->00:19:e0:66:66:68(00:19:e0:66:66:68) data QoS [TID 0] 0M
 c819 3a01 0019 e066 6668 0003 7f0b 6288 0019 e066 6668 001e 0000 adde
ath1: ath_tx_tid_drain: node 0xc77a2000: tid 0: txq_depth=0, txq_aggr_depth=0, sched=1, paused=0, hwq_depth=0, incomp=0, baw_head=7, baw_tail=7 txa_start=1926, ni_txseqs=1926
TODS 00:03:7f:0b:62:88->00:19:e0:66:66:68(00:19:e0:66:66:68) data QoS [TID 0] 0M
 c819 3a01 0019 e066 6668 0003 7f0b 6288 0019 e066 6668 4021 0000 adde

This only occurs reliably once aggregation has been established.
>How-To-Repeat:
* Associate with an 11n-enabled access point.
* Pass some TX traffic so that aggregation is established (run "wlandebug +11n" first so that you are told when this happens).
* Then start a ping on the station whilst running "ifconfig wlanX scan".
* Observe the errors shown above being logged.

>Fix:
The fix is a little more complicated than it should be.

The output above shows only the data frames from ping, but the problem also affects probe frames and any other low-traffic frames.

I have also seen it with probe frames when the air is extremely busy or crowded; ping plus scan is simply the easiest way to trigger it reliably.

Calling ath_txq_sched() at the appropriate point fixes it, but there is currently no clean, non-hackish way to do so. ath_txq_sched() needs the relevant txq, which is not currently available in ath_start() or ath_raw_xmit(). Furthermore, it is currently only called from the taskqueue, once per ath_tx_processq() call, so we don't have to worry about it running in parallel with itself. A fix that also calls it from the transmit path may not be able to rely on that assumption.
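
One possible direction, sketched here as a hypothetical userland model rather than driver code, is to have the transmit path kick the scheduler itself and to serialise every scheduler entry point with a per-queue lock, since the scheduler could then run from the transmit path and the completion taskqueue at the same time. A pthread mutex and two threads stand in for the driver's locking and execution contexts. struct model_txq, model_tx_start() (in the role of ath_start()/ath_raw_xmit()), model_tx_complete() (in the role of ath_tx_processq()) and model_txq_sched_locked() (in the role of ath_txq_sched()) are all assumptions, and the model sidesteps the txq-lookup problem by passing the queue in directly.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-queue state; not the if_ath structures. */
struct model_txq {
	pthread_mutex_t lock;  /* serialises every scheduler entry point */
	int sw_depth;          /* frames waiting in the software queue */
	int hw_depth;          /* frames handed to the hardware queue */
	bool paused;           /* hardware queue paused */
};

/* Stand-in for ath_txq_sched(); the caller must hold q->lock. */
static void model_txq_sched_locked(struct model_txq *q)
{
	if (q->paused)
		return;
	q->hw_depth += q->sw_depth;
	q->sw_depth = 0;
}

/* Transmit path: queue the frame, then kick the scheduler explicitly. */
static void model_tx_start(struct model_txq *q)
{
	pthread_mutex_lock(&q->lock);
	q->sw_depth++;
	model_txq_sched_locked(q);
	pthread_mutex_unlock(&q->lock);
}

/* Completion path: reap one frame if any, then reschedule. */
static bool model_tx_complete(struct model_txq *q)
{
	bool reaped = false;

	pthread_mutex_lock(&q->lock);
	if (q->hw_depth > 0) {
		q->hw_depth--;
		reaped = true;
	}
	model_txq_sched_locked(q);
	pthread_mutex_unlock(&q->lock);
	return reaped;
}

struct model_ctx {
	struct model_txq q;
	int frames;     /* frames to send, and completion attempts to make */
	int completed;  /* written only by the completion thread */
};

struct model_result {
	int sw_depth;
	int hw_depth;
	int completed;
};

static void *model_tx_thread(void *arg)
{
	struct model_ctx *c = arg;

	for (int i = 0; i < c->frames; i++)
		model_tx_start(&c->q);
	return NULL;
}

static void *model_completion_thread(void *arg)
{
	struct model_ctx *c = arg;

	for (int i = 0; i < c->frames; i++)
		if (model_tx_complete(&c->q))
			c->completed++;
	return NULL;
}

/* Run the transmit and completion paths concurrently on one queue. */
static struct model_result model_run(int frames)
{
	struct model_ctx c = { .frames = frames };
	pthread_t tx, comp;

	pthread_mutex_init(&c.q.lock, NULL);
	pthread_create(&tx, NULL, model_tx_thread, &c);
	pthread_create(&comp, NULL, model_completion_thread, &c);
	pthread_join(tx, NULL);
	pthread_join(comp, NULL);
	pthread_mutex_destroy(&c.q.lock);

	return (struct model_result){ c.q.sw_depth, c.q.hw_depth, c.completed };
}
```

Because both paths take the lock before touching the counters, the final state is consistent regardless of interleaving: with the queue unpaused, the software queue ends up empty and every frame is either still in the hardware queue or counted as completed. Build with -pthread.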

>Release-Note:
>Audit-Trail:
>Unformatted: