misc/165475: [ath] operational mode change doesn't poke the underlying rate control module hard enough
adrian at FreeBSD.org
Sat Feb 25 19:50:12 UTC 2012
>Synopsis: [ath] operational mode change doesn't poke the underlying rate control module hard enough
>Arrival-Date: Sat Feb 25 19:50:11 UTC 2012
>Originator: Adrian Chadd
>Release: 9.0-RELEASE, w/ -HEAD net80211/ath
This reared its ugly head when testing with an AR5211 (11b/11a, no 11g).
* the operational mode change has occurred (sc->sc_currates is pointing to the 11b table);
* ath_sample_node->ratemask is 0xff for some reason, likely indicating it was assembled from the 11a rate table (which in ath_hal/ar5211/ar5211_phy.c has eight 11a rates in it);
* so ath_rate_findrate() thinks best_rix is fine and that the current rate table mapping is fine.
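The mismatch in the list above can be shown with a small bounds check: a ratemask of 0xff enables rix 0 to 7, which is valid for the 8-entry 11a table but not for a smaller 11b table. This is a minimal sketch, not driver code; `struct rate_table`, `ratemask_highest_rix()` and `ratemask_fits_table()` are hypothetical stand-ins, and the 11b table is assumed to have 4 entries (the four 802.11b rates: 1, 2, 5.5 and 11 Mb/s).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in; not the real HAL rate table or
 * ath_rate_sample node structures. */
struct rate_table {
    int rate_count;             /* number of valid rate indices (rix) */
};

/* Highest rix enabled in a ratemask, or -1 if the mask is empty. */
static int
ratemask_highest_rix(uint32_t mask)
{
    int rix = -1;

    for (int i = 0; i < 32; i++)
        if (mask & (1u << i))
            rix = i;
    return rix;
}

/* Nonzero if every rix enabled in 'mask' exists in 'rt'. A mask that
 * was built from a different, larger rate table fails this check. */
static int
ratemask_fits_table(uint32_t mask, const struct rate_table *rt)
{
    return ratemask_highest_rix(mask) < rt->rate_count;
}
```

With the values from this report, 0xff passes against an 8-entry 11a table and fails against a 4-entry 11b table; a check along these lines (for example as a KASSERT in the lookup path) would flag the stale state before best_rix is used to index the table.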
This is likely very similar to other issues where rate control in ath behaves slightly weirdly after an operational mode change, if the NIC hasn't transitioned back into the original operating mode. The rate control code isn't informed of the change: it is only told about association/reassociation, and ath_rate_sample only updates the rate table on _new_ associations. So it doesn't realise it has to rethink its current rate table setup.
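One way to make the per-node state "rethink its current rate table setup" without relying on an association event is to remember which rate table the state was built from and rebuild it lazily whenever the current table differs. This is a hypothetical sketch of that idea, not how ath_rate_sample works today; `struct node_rate_state`, `rate_state_init()` and `rate_lookup()` are illustrative names, and starting at the highest rix is an arbitrary choice.

```c
#include <stdint.h>

/* Hypothetical, simplified types; not the real ath(4) structures. */
struct rate_table {
    int rate_count;                      /* number of valid rix values */
};

struct node_rate_state {
    const struct rate_table *built_from; /* table the state was derived from */
    uint32_t ratemask;                   /* bit n set => rix n usable */
    int best_rix;
};

/* (Re)build the per-node state from rate table 'rt'. */
static void
rate_state_init(struct node_rate_state *rs, const struct rate_table *rt)
{
    int n = rt->rate_count < 32 ? rt->rate_count : 32;

    rs->built_from = rt;
    rs->ratemask = (n == 32) ? 0xffffffffu : ((1u << n) - 1);
    rs->best_rix = n - 1;   /* illustrative: start at the highest rate */
}

/* Called before each rate lookup. If the current rate table is not the
 * one the state was built from (e.g. after an 11a -> 11b mode change),
 * rebuild instead of trusting a stale ratemask/best_rix. Assumes 'cur'
 * has at least one rate, so the returned rix is valid for 'cur'. */
static int
rate_lookup(struct node_rate_state *rs, const struct rate_table *cur)
{
    if (rs->built_from != cur)
        rate_state_init(rs, cur);
    return rs->best_rix;
}
```

The pointer comparison is cheap enough to run on every lookup; the alternative is an explicit notification from the mode-change path to every node's rate control state, which avoids the per-lookup check but depends on every transition path remembering to call it.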
* net80211/ath and kernel built with full debugging (assertions, WITNESS, etc.)
* associated to an 11a AP (so it has the 11a OFDM table)
* running iperf
* the session hangs for some reason (I'm not quite sure why yet)
* .. then the bgscan code kicks in and starts scanning
* .. and for some reason the NIC is now in 11b mode, and tries TX'ing
* but the "best rix" in ath_rate_findrate() (in ath_rate_sample) references an 11a rate, not an 11b rate, i.e. rix > the greatest rix in the current configuration.
* .. so things panic.
I'm not yet sure what the right fix is.
Because of background scanning, it's entirely possible the NIC will spend a non-zero amount of time off channel, TX'ing frames which SHOULD be sent at fixed rates.
The ath_rate module code isn't currently informed about channel changes, because the channel change path doesn't notify the associated nodes.
Any rate control lookups made while off-channel will therefore be working from confused state.
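A defensive TX rate selection for the off-channel case could bypass per-node rate control entirely and use a fixed rate, and as a last resort refuse to use a rix that doesn't exist in the current table. This is a hypothetical sketch: `struct tx_ctx`, `select_tx_rix()` and the `rc_rix` parameter (standing in for whatever ath_rate_findrate() would have returned) are illustrative, and rix 0 is assumed to be the lowest basic rate of the current table.

```c
#include <stdbool.h>

/* Hypothetical, simplified types; not the real ath(4) structures. */
struct rate_table {
    int rate_count;             /* number of valid rix values */
};

struct tx_ctx {
    bool off_channel;           /* e.g. during a background scan */
    bool is_data;               /* data frame vs. management/control */
};

/* Pick a rix for one frame. Off-channel or non-data frames get a fixed
 * rate (rix 0, assumed lowest basic rate), so per-node rate control is
 * never consulted against a table it wasn't built for. 'rc_rix' is what
 * the per-node rate control would have chosen. */
static int
select_tx_rix(const struct tx_ctx *tx, const struct rate_table *cur,
    int rc_rix)
{
    if (tx->off_channel || !tx->is_data)
        return 0;
    /* Stale state: fall back instead of indexing past the table. */
    if (rc_rix < 0 || rc_rix >= cur->rate_count)
        return 0;
    return rc_rix;
}
```

The final range check only hides the symptom; it is useful as a guard while debugging, but the underlying fix is still to keep the rate control state in sync with the current rate table.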
I should first check whether this crash occurred while the NIC was off-channel. If so, it shouldn't have tried TXing a data frame at this point. No: I just checked, and ni->ni_vap->iv_flags is 0x430c4010, in which neither 0x80 (IEEE80211_F_SCAN) nor 0x100 (IEEE80211_F_ASCAN) is set.
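The flag check above is a simple bitmask test. The sketch below uses the flag values quoted in this report (they should be confirmed against net80211/ieee80211_var.h for the tree being debugged); `vap_is_scanning()` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in this report. */
#define IEEE80211_F_SCAN   0x00000080u  /* scanning */
#define IEEE80211_F_ASCAN  0x00000100u  /* active scan */

/* True if either scan flag is set in a vap's iv_flags value. */
static bool
vap_is_scanning(uint32_t iv_flags)
{
    return (iv_flags & (IEEE80211_F_SCAN | IEEE80211_F_ASCAN)) != 0;
}
```

For 0x430c4010 the low 12 bits are 0x010, so masking with 0x180 yields 0 and the vap was not scanning when the frame was transmitted.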
So the first step is to see whether _why_ the NIC is in 11b mode can be made obvious. Once that's done, figure out why the transition didn't trigger a rate control update.