Ongoing problems with the "ath" interface - is any relief in sight??

Sam Leffler sam at errno.com
Sat Jul 29 15:40:44 UTC 2006


Ross Finlayson wrote:
> For several months now, the "ath" interface has been spazzing out at
> random times (in systems that are acting as wireless base stations). For
> example:
> 
> Jul 28 21:44:47 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
> Jul 28 21:44:47 ns kernel: ath0: ath_reset: unable to reset hardware; hal status 3
> Jul 28 21:45:08 ns kernel: ath0: device timeout
> Jul 28 21:45:08 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
> Jul 28 21:45:08 ns kernel: ath0: ath_reset: unable to reset hardware; hal status 3
> [and then the interface stops working]
> 
> 
> %cat /etc/motd
> FreeBSD 6.1-STABLE (GENERIC) #6: Thu Jul 27 20:55:43 PDT 2006
> 
> The error isn't always the same, however.  Often it is
>     ath0: device timeout
> or
>     ath0: discard frame w/o packet header
> or even
>     arp: unknown hardware address format (0x4500)
> 
> In each case, however, the "ath" interface stops working immediately
> after the error report, so I don't believe that the latter two error
> reports are legitimate.  I'm wondering if perhaps there's a memory smash
> somewhere that's corrupting some driver data structures (thereby causing
> bogus error reports in addition to stopping the interface from working)?
> 
> The last time I asked about this, someone speculated that 'power save
> mode' was the culprit.  Unfortunately, the system is running in a coffee
> shop that provides public WiFi, so it's not possible to stop clients
> from using power save mode.
> 
> On my system, these errors are often happening several times a day. Has
> anyone else run into frequent problems like this, and is anyone looking
> into a solution?

"stuck beacon" means the tx dma of the beacon frame failed to complete
in a full beacon interval.  Diagnosing such a problem requires
understanding why dma failed to complete.  This usually involves
checking the dma descriptor for clues and/or looking at other
h/w-related state.  If you have a "memory smash" then you will see it in
the descriptor contents--but I doubt it.  In my experience this problem
is usually caused by feeding bogus data to the dma engine that causes it
to lock up, but the problem in general is very complicated and not
something I can diagnose remotely.
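
To make the "stuck beacon" condition concrete, here is a minimal sketch of the detection logic as a toy model. It is not the driver's code: the class, method, and constant names are hypothetical, the threshold of 3 is an assumption chosen only to match the "bmiss count 4" in the log above, and clearing the counter after a reset is likewise assumed. It shows only the counting; it says nothing about why the dma failed to complete.

```python
# Toy model of "stuck beacon" detection. All names are hypothetical and
# the threshold is an assumption; this is not the ath driver's code.

BMISS_THRESHOLD = 3  # assumed: reset once the miss count exceeds this


class BeaconMonitor:
    def __init__(self, threshold=BMISS_THRESHOLD):
        self.threshold = threshold
        self.bmiss_count = 0
        self.log = []

    def on_beacon_interval(self, prev_beacon_dma_done):
        """Run once per beacon interval, before queuing the next beacon.

        Returns True when a hardware reset would be requested.
        """
        if prev_beacon_dma_done:
            # The previous beacon went out in time; clear the counter.
            self.bmiss_count = 0
            return False
        # The previous beacon's tx dma is still pending: count a miss.
        self.bmiss_count += 1
        if self.bmiss_count > self.threshold:
            self.log.append(
                f"stuck beacon; resetting (bmiss count {self.bmiss_count})"
            )
            self.bmiss_count = 0  # assumed: counter is cleared on reset
            return True
        return False


monitor = BeaconMonitor()
# Three healthy intervals, then the dma engine stops completing beacons.
history = [True, True, True, False, False, False, False]
resets = [monitor.on_beacon_interval(done) for done in history]
print(resets)
print(monitor.log)
```

In this model the reset fires only on the fourth consecutive missed interval, so an isolated late beacon does not trigger it.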

	Sam


More information about the freebsd-mobile mailing list