kern/188576: [ath] traffic hangs in station mode when downgrading from AMPDU TX or reassociating
Adrian Chadd
adrian at freebsd.org
Sun Apr 13 23:30:00 UTC 2014
>Number: 188576
>Category: kern
>Synopsis: [ath] traffic hangs in station mode when downgrading from AMPDU TX or reassociating
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Apr 13 23:30:00 UTC 2014
>Closed-Date:
>Last-Modified:
>Originator: Adrian Chadd
>Release: HEAD
>Organization:
>Environment:
FreeBSD lucy-11i386 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r263418M: Tue Apr 1 11:33:21 PDT 2014 adrian at lucy-11i386:/usr/home/adrian/work/freebsd/head/obj/usr/home/adrian/work/freebsd/head/src/sys/LUCY_11_i386 i386
>Description:
When an ath(4) 11n station reassociates or downgrades from aggregation to no aggregation, there's a chance that it'll hang and refuse to queue more frames.
The session needs to be fully torn down (e.g. ifconfig wlanX down) for things to go back to normal.
>How-To-Repeat:
>Fix:
I have already debugged this a little.
The problem seems to be that there's more than one entry point into ath_tx_tid_cleanup(). It's likely either a couple of calls from the reassociation path, or one from reassociation and one from aggregation teardown. I'll go figure that bit out soon.
What it leads to is this:
* the caller causes ath_tx_tid_pause();
* ath_tx_tid_cleanup() is called;
* the first time this happens it sees there are 1 or more frames to clean up, so it sets tid->cleanup_inprogress;
* the caller then checks if that's set to 1 - if so, it assumes that it should wait until the cleanup is finished;
* otherwise it calls ath_tx_tid_resume().
If tid->cleanup_inprogress is set to 1 then the normal TX completion path will eventually call ath_tx_comp_cleanup_unaggr() or ath_tx_comp_cleanup_aggr() which will clear the flag and resume the TID.
If a second path through ath_tx_tid_cleanup() occurs, then:
* the caller pauses;
* ath_tx_tid_cleanup() is called;
* tid->cleanup_inprogress is already set to 1 from the first call, but there's no code to check whether this call actually set it or not, so the caller doesn't call ath_tx_tid_resume().
So once the frames complete and ath_tx_tid_resume() is called (once), there's still a pending pause reference from the second caller, and thus traffic never continues flowing.
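To make the reference-count mismatch concrete, here is a minimal standalone sketch of the sequence described above. It is not the driver code: struct toy_tid, its fields other than cleanup_inprogress, and all toy_* functions are hypothetical stand-ins, and the assumption that pause/resume behave like a simple counter is taken only from the "pending paused reference" wording in this report.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-TID state; not the real driver struct. */
struct toy_tid {
    int paused;              /* assumed: pause reference count */
    int cleanup_inprogress;  /* 1 while cleanup waits for in-flight frames */
    int inflight;            /* assumed: frames still pending completion */
};

/* Models ath_tx_tid_pause(): take one pause reference. */
static void toy_pause(struct toy_tid *t) { t->paused++; }

/* Models ath_tx_tid_resume(): drop one pause reference. */
static void toy_resume(struct toy_tid *t)
{
    assert(t->paused > 0);
    t->paused--;
}

/* Models ath_tx_tid_cleanup(): flag the TID if frames are outstanding. */
static void toy_cleanup(struct toy_tid *t)
{
    if (t->inflight > 0)
        t->cleanup_inprogress = 1;
}

/* Models one caller: pause, clean up, resume only if no cleanup is pending. */
static void toy_caller(struct toy_tid *t)
{
    toy_pause(t);
    toy_cleanup(t);
    if (!t->cleanup_inprogress)
        toy_resume(t);
    /* Otherwise the caller leaves the resume to the TX completion path,
     * without knowing whether this call or an earlier one set the flag. */
}

/* Models the TX completion path: the last frame clears the flag and
 * resumes the TID exactly once. */
static void toy_tx_complete(struct toy_tid *t)
{
    if (t->inflight > 0)
        t->inflight--;
    if (t->cleanup_inprogress && t->inflight == 0) {
        t->cleanup_inprogress = 0;
        toy_resume(t);
    }
}

/* Run ncallers cleanup callers against a TID with nframes in flight,
 * complete every frame, and return the leftover pause count. */
int toy_run(int ncallers, int nframes)
{
    struct toy_tid t = { 0, 0, nframes };
    int i;

    for (i = 0; i < ncallers; i++)
        toy_caller(&t);
    while (t.inflight > 0)
        toy_tx_complete(&t);
    return t.paused;
}
```

In this model toy_run(1, 2) returns 0 (the single pause is balanced by the completion path), while toy_run(2, 2) returns 1: the second caller's pause is never matched by a resume, which corresponds to the hang described above.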
>Release-Note:
>Audit-Trail:
>Unformatted: