ath(4) panic + stuck beacon issue

Adrian Chadd adrian at freebsd.org
Sat Mar 5 15:21:57 UTC 2011


It seems I'm the ath(4) maintainer now.

"stuck beacon" means a lot of things. Debugging it means getting a better
understanding of what in your environment is being handled incorrectly.
Many of the stuck beacons I've seen crop up because the radio environment
periodically gets noisy and the driver handles that very badly.

Step 1 is compiling your kernel with:

options ATH_DEBUG
options AH_DEBUG
options ATH_DIAGAPI

Then the next thing to do is to enable calibration debugging:

sysctl hw.ath.hal.debug=0x8

That'll spit out some debugging every 30 seconds that reports the noise
floor that the card is calibrating against.

Then you can try repeating that with the longcal interval lowered to, say,
1 second (it defaults to 30), to see if the noise floor is fluctuating:

sysctl hw.ath.longcal=1

There are some other things that can help, but I need to finish porting some
diagnostic code to the ath code in -HEAD.

If you feel adventurous, you could try running the -HEAD if_ath code under
RELENG_8. I do this as a module for my testing. There have been some changes
which MAY make the AR5416 behave better.

The AR9280 unfortunately requires some more work. I'm in the process right
now of porting over some code from Linux ath9k to try and fix TX stability
issues. If it works for you then great, but it's going to be hit and miss.

The (more) technical reasons:

* The "stuck beacon" can have a variety of causes, some of which are touched
on at the madwifi site. But non-RF issues aside (e.g., DMA timing, busy bus,
etc.), almost all of the issues are due to a noisy environment where the card
just gets into a state where it thinks it can't transmit. Buying me a card
would be nice :) but since it's very likely environment-related, I'd have
to somehow determine and reproduce that problem. I've got some diagnostic
tools here which log things like channel TX/RX/busy status, which help
determine this stuff, but I just don't have the time right now to port all
of that over and fix things up for 11n. Sorry. :(

* Specifically, ath cards have very sensitive RX. The periodic noise floor
calibration sets the CCA threshold for "when the air is too busy to
transmit!", which is based on the calibrated NF level. (I.e., it
establishes what the median noise floor is over successive 30 second
samples, then uses this as the CCA threshold.) This can become quite low
(lower than -90dBm on the AR5416/AR9160, down to lower than -100dBm on the
AR9280). But if noise then appears that's above a low-set CCA threshold, the
radio thinks it can't transmit and it resets with "stuck beacon." The
default CCA threshold is higher than what it calibrates down to over time,
so that's why you (generally) see the radio work fine for a few minutes
before resetting.

* I've discovered some AR9280-based cards have different methods of
calibrating TX power. We handle the "older" method, but not the "newer"
method. It turns out this AR9280 card I have here uses the "newer" method.
Since there's no easy way for the average user to know what they have
(without inspecting the EEPROM contents to see what the manufacturer did),
it'll rear its ugly head as "it works for me!" for some, and "it isn't
stable for me!" for others.

* The AR5416 support in ath9k is in no way verified, stable or tested. So
it's possible that improvements I bring over from ath9k to fix AR9280 and
AR9285 issues will break previous 11n family chips (AR5416/AR9160). I'm
trying to be mindful of this and test where I can, but it's not easy.
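
To make the noise floor / CCA point above concrete, here is a rough toy
model in Python. It is not the driver or HAL code, just a sketch of the
idea as described: the threshold starts at a default, then follows the
median of the calibrated noise floor samples. The function names, the
-85dBm default and all of the sample readings are made up for
illustration.

```python
from statistics import median

# Hypothetical starting threshold in dBm. The real default is chip-specific
# and is not given in this mail; this value is an assumption.
DEFAULT_CCA_DBM = -85.0


def cca_threshold(nf_history_dbm, default_dbm=DEFAULT_CCA_DBM):
    """Toy CCA threshold: the default until samples exist, then their median."""
    if not nf_history_dbm:
        return default_dbm
    return median(nf_history_dbm)


def channel_looks_busy(energy_dbm, threshold_dbm):
    """The toy radio refuses to transmit when energy exceeds the threshold."""
    return energy_dbm > threshold_dbm


if __name__ == "__main__":
    burst_dbm = -92.0  # made-up interference level
    history = []       # one noise floor sample per long calibration interval

    # Before any calibration the default threshold applies.
    print("busy before calibration:",
          channel_looks_busy(burst_dbm, cca_threshold(history)))

    # Made-up readings from a quiet period pull the threshold down.
    history.extend([-95.0, -96.0, -97.0, -96.0, -96.0])
    print("calibrated threshold:", cca_threshold(history))
    print("busy after calibration:",
          channel_looks_busy(burst_dbm, cca_threshold(history)))
```

With these invented numbers the same burst is ignored while the default
threshold is in force, and blocks transmission once the threshold has
calibrated down. That matches the "works fine for a few minutes, then
resets" pattern.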

HTH,


Adrian

On 28 February 2011 00:01, Jeremy Chadwick <freebsd at jdc.parodius.com> wrote:

> I have a crash report to provide (for RELENG_8 dated 2010/02/12), but
> I'd like to know who's maintaining ath(4) at this point in time.
>
> I also need to discuss a commonly-reported issue with AR5416 and/or
> AR9280 cards (e.g. D-Link DWA-552 running in 802.11g mode w/ WEP)
> spitting out "stuck beacon" errors, which are what I was trying to
> resolve when the kernel crashed.  (I induced the crash, but I'm not sure
> exactly why/how).
>
> Given that the issue has existed for years now...
>
> http://www.daemonforums.org/showthread.php?t=3388
> http://forums.freebsd.org/showthread.php?t=5983
> http://forum.pfsense.org/index.php?topic=21374.0
> http://forum.pfsense.org/index.php?topic=32041.0
>
> http://www.broadbandreports.com/forum/r25070916-FreeBSD-MIPS-dev-Adrian-Chad-on-stuck-beacon-issue
> http://forums.freebsd.org/showthread.php?t=22112  (recent & thorough!)
>
> ...and "bintval 1000" does not solve it, let's work together to find a
> solution.  If you need hardware I will be more than happy to buy you
> (brand new) cards which you can keep.  If you have beta/test drivers
> and/or can provide *thorough* debugging instructions, I will be more
> than happy to do what I can.
>
> I'll also point out the Linux madwifi folks have an *entire page*
> dedicated to this problem, which is quite interesting:
>
> http://madwifi-project.org/wiki/StuckBeacon
>
> If a workaround or solution isn't plausible, what cards do people
> actually recommend that work reliably / have reliable drivers?  I was
> under the impression Atheros cards were reliable/decent compared to,
> say, Broadcom.  Is iwn(4) reliable?
>
> --
> | Jeremy Chadwick                                   jdc at parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |