Problems with ath at kern.hz=100

Sam Leffler sam at freebsd.org
Tue Feb 10 11:53:05 PST 2009


Bengt Ahlgren wrote:
> Sam Leffler <sam at freebsd.org> writes:
>
>   
>> Ian Smith wrote:
>>     
>>> On Mon, 9 Feb 2009, Bengt Ahlgren wrote:
>>>       
>
> [...]
>
>   
>>> Sam has talked of a problem with ath (and maybe other) interrupt
>>> rates overwhelming slow CPUs to the extent that userland (ie powerd)
>>> doesn't get a look-in for a while, as I understand it.  A fast CPU
>>> clocked back to less than say 300MHz IS a slow CPU, until powerd
>>> gets to crank it up.
>>>   
>>>       
>> If powerd starvation were happening you'd see livelock not slow
>> traffic.  The problem was that MIB overflow interrupts from the card
>> were pounding the host but since the cpu clock was lowered they were
>> not serviced fast enough.
>>
>> FWIW I believe the MIB interrupt storm issue is fixed in head.  I
>> posted a patch to stable@ to enable backmerge of the necessary code
>> but received ~zero feedback in >1 month so RELENG_7 will continue to
>> rot w/ old code.
>>     
>
> The system was not livelocked - it was fully working except for ath
> not sending out packets.  I can however try the patch if you think it
> is worthwhile.
>
> [...]
>
>   
>>>  >  > Any idea on what the problem can be, or how do debug this
>>> further?
>>>
>>>   
>>>       
>> You can use tcpdump to monitor traffic at multiple levels in the hierarchy:
>>
>> tcpdump -i ath0 -n   # tap at 802.3 level
>> tcpdump -i ath0 -n -y IEEE802_11   # tap at 802.11 level
>> tcpdump -i ath0 -n -y IEEE802_11_RADIO   # tap at driver level
>>
>> and/or you can check statistics using athstats and wlanstats or you
>> can turn on debugging msgs in net80211 or the ath driver using
>> wlandebug and athdebug.  Some of these may require you to build your
>> kernel w/ debug options (e.g. ATH_DEBUG).
>>     
>
> Does the following mean anything to you?  I did athdebug
> watchdog+xmit+calibrate+xmit_proc - more or less random choice, since
> I don't know what it really means.  I see a lot of those
> ath_calibrate:s with low "x" in "next +x" when it is not sending.
> When it is operating normally, it mostly says "next +30" with "tries
> 0".  But there is no strict correlation, it can sometimes send packets
> even when it says "!iqCalDone".
>   

Typically you want to work from the top down; turning on athdebug gives 
you info at the lowest parts of the system which rarely is meaningful 
unless you've already narrowed down a problem.

> ath_tx_dmasetup: m 0xc37d4a00 len 178
> TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
>  0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 f007 aaaa 0300 0000 0800 4500 0092 0128 4000 4006 3426 c10a 42fd c10a 4006 f506 008f 3433 0198 3326 9ce6 5018 ffff c1f2 0000 1703 0100 200d 2e0f 8857 2590 18d0 745b ce33 3874 4089 c851 5aab e5de ef81 8a43 ab98 ad4b 8b17 0301 0040 6b34 58a9 a0e4 b41b 9868 fa90 c69c 95a3 dc6c 99f9 c942 1bf4 219d 5b5f 5711 174d a810 c6d8 2dff 0c88 bb42 c566 6eae 8b7c acff bba3 847b 09d0 a77a e597 fc87 a6c8
> ath_tx_handoff: 0: 00000000 014cea3c c12200b6 0000a0b2 0004006c 01e0000c
> ath_tx_handoff: link[1](0xdecd8900)=0x14b1bcc0 (0xdecd8cc0) depth 2
> ath_tx_processq: tx queue 1 head 0x14b1bcc0 link 0xdecd8cc0
> ath_tx_dmasetup: m 0xc37d7800 len 162
> TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
>  0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 0008 aaaa 0300 0000 0800 4500 0082 0129 4000 4006 3435 c10a 42fd c10a 4006 f506 008f 3433 0202 3326 9d4b 5018 ffff 24fa 0000 1703 0100 206c 06b7 7a5e 1427 c691 772f b923 07be 4c94 4fa0 80e8 4130 5192 e751 970e 8092 c617 0301 0030 0124 d75d c26b 0a82 13ce 8044 73e8 2cb1 74ab d1f9 ce01 5aca 116d f95c 79db 058e 6f58 c86d 51d2 31b9 345d 02b4 836e fb26
> ath_tx_handoff: 0: 00000000 014d183c c12200a6 0000a0a2 0004006c 01e0000c
> ath_tx_handoff: link[1](0xdecd8cc0)=0x14b1c080 (0xdecd9080) depth 2
> ath_tx_processq: tx queue 1 head 0x14b1c080 link 0xdecd9080
> ath_tx_dmasetup: m 0xc37d5200 len 162
> TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
>  0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 1008 aaaa 0300 0000 0800 4500 0082 012a 4000 4006 3434 c10a 42fd c10a 4006 f506 008f 3433 025c 3326 9db0 5018 ffff 0df1 0000 1703 0100 20bf 5e4c fea5 c3d7 37af 29f1 3071 9791 6e7d 5cad 559b 00aa 204b e704 eea9 58b4 8017 0301 0030 f92c e0f3 3511 5618 19db fb48 5408 86d2 365b 3a0a ca5a 86c0 cd67 3b33 1333 3a73 ed6a 4410 491a e454 5901 e1b3 4e33 d3f8
> ath_tx_handoff: 0: 00000000 014cf23c c12200a6 0000a0a2 0004006c 01e0000c
> ath_tx_handoff: link[1](0xdecd9080)=0x14b1c440 (0xdecd9440) depth 2
> ath_tx_processq: tx queue 1 head 0x14b1c440 link 0xdecd9440
> ath_tx_dmasetup: m 0xc37d4200 len 109
> TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
>  0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 2008 aaaa 0300 0000 0800 4500 004d 012b 4000 4006 3468 c10a 42fd c10a 4006 f506 008f 3433 02b6 3326 9e05 5018 ffff a6cb 0000 1503 0100 20c4 db33 8cb3 dbfd 8f42 a0da 812c 1415 7ccd 8031 ada1 d5b3 48f8 45d4 ad00 c2eb 47
> ath_tx_handoff: 0: 00000000 014ce23c c1220071 0000a06d 00040064 01e0000c
> ath_tx_handoff: link[1](0xdecd9440)=0x14b1c300 (0xdecd9300) depth 2
> ath_tx_dmasetup: m 0xc37d3800 len 72
> TODS 00:05:4e:4e:1f:c7->00:14:4f:0f:ae:f6(00:13:1a:47:7a:33) data 54M
>  0801 2c00 0013 1a47 7a33 0005 4e4e 1fc7 0014 4f0f aef6 3008 aaaa 0300 0000 0800 4500 0028 012c 4000 4006 348c c10a 42fd c10a 4006 f506 008f 3433 02db 3326 9e05 5011 ffff aceb 0000
> ath_tx_handoff: 0: 00000000 014cd83c c122004c 0000a048 0004005c 01e0000c
> ath_tx_handoff: link[1](0xdecd9300)=0x14b1ca80 (0xdecd9a80) depth 3
> ath_tx_processq: tx queue 1 head 0x14b1ca80 link 0xdecd9a80
> ath_calibrate: next +2 (!iqCalDone tries 2)
> ath_calibrate: next +2 (!iqCalDone tries 3)
> ath_calibrate: next +2 (!iqCalDone tries 4)
> ath_calibrate: next +4 (!iqCalDone tries 0)
> ath_calibrate: next +4 (!iqCalDone tries 1)
> ath_calibrate: next +4 (!iqCalDone tries 2)
> ath_calibrate: next +4 (!iqCalDone tries 3)
>
> After a while I also saw some:
>
> ath0: [00:13:1a:47:7a:33] pwr save q overflow, drops 19 (size 50)
>   

Er, did you ever say you were operating in ap mode?

> athstats says:
>
> [root at P142 ~]# athstats
> 306      data frames received
> 335      data frames transmit
> 336      long on-chip tx retries
> 48       tx failed 'cuz too many retries
> 48M      current transmit rate
> 9        beacon miss interrupts
> 107      tx management frames
> 199      tx frames discarded prior to association
> 78       tx frames with no ack marked
> 407      tx frames with short preamble
> 259      tx frames with 11g protection
> 11791    rx failed 'cuz of bad CRC
> 201147   rx failed 'cuz of PHY err
>     116925   OFDM timing
>     1        OFDM restart
>     84126    CCK timing
>     95       CCK restart
> 243      periodic calibrations
> 31       avg recv rssi
> -97      rx noise floor
> 4997     switched default/rx antenna
> Antenna profile:
> [1] tx      363 rx    53933
> [2] tx       14 rx    18762
>
> and ifconfig -v:
>
> ath0: flags=8847<UP,BROADCAST,DEBUG,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         ether 00:05:4e:4e:1f:c7
>         inet 193.10.66.253 netmask 0xfffffc00 broadcast 193.10.67.255
>         media: IEEE 802.11 Wireless Ethernet autoselect (OFDM/48Mbps)
>         status: associated
>         ssid SICS channel 11 (2462 Mhz 11g) bssid 00:13:1a:47:7a:33
>         authmode OPEN privacy OFF deftxkey UNDEF powersavemode OFF
>         powersavesleep 100 txpower 31.5 txpowmax 17.0 rtsthreshold 2346
>         fragthreshold 2346 bmiss 7 scanvalid 60 bgscan bgscanintvl 300
>         bgscanidle 250 roam:rssi11a 7 roam:rate11a 12 roam:rssi11b 7
>         roam:rate11b 1 roam:rssi11g 7 roam:rate11g 5 -pureg protmode CTS -ht
>         -htcompat -ampdu ampdulimit 8k ampdudensity - -amsdu -shortgi
>         htprotmode RTSCTS -puren -wme burst -ff -dturbo roaming AUTO
>         bintval 100
>
>   
You've removed all the context of the original problem; I can't recall 
what you were trying to fix.  The ps q drops might be caused by a bug 
that was fixed (I think in RELENG_7).  I don't see what version of code 
you're running so can't tell.

    Sam



More information about the freebsd-mobile mailing list