RFT: if_ath HAL refactoring

Wed Sep 22 22:48:19 UTC 2010

On 22 Sep 2010, at 23:42, PseudoCylon wrote:

> 
> ----- Original Message ----
>> From: Bernhard Schmidt <bschmidt at techwires.net>
>> To: freebsd-current at freebsd.org
>> Cc: PseudoCylon <moonlightakkiy at yahoo.ca>; Adrian Chadd <adrian at freebsd.org>
>> Sent: Wed, September 22, 2010 12:09:36 AM
>> Subject: Re: RFT: if_ath HAL refactoring
>> 
>> On Wednesday, September 22, 2010 06:04:49 PseudoCylon wrote:
>>> -----  Original Message ----
>>> 
>>>> From: Adrian Chadd <adrian at freebsd.org>
>>>> To:  PseudoCylon <moonlightakkiy at yahoo.ca>
>>>> Cc: freebsd-current at freebsd.org
>>>> Sent: Tue, September 21, 2010 7:04:37 AM
>>>> Subject: Re: RFT:  if_ath HAL refactoring
>>>> 
>>>> On 21 September 2010 11:58, PseudoCylon <moonlightakkiy at yahoo.ca> wrote:
>>>>> Just in case anyone wonders, I've added 11n support to run(4) (USB
>>>>> NIC). http://gitorious.org/run/run/trees/11n_beta2
>>>>> 
>>>>> It still has some issues:
>>>>> 
>>>>> * doesn't work well with Atheros chips
>>>>> 
>>>>> * HT + AP + bridge = Tx may stall (seems OK with NAT)
>>>>> 
>>>>> So, use it at your own discretion.
>>>> 
>>>> Want to put together a patch?
>>> 
>>> sure!
>>> 
>>>> Does it introduce issues in the non-11n case?
>>> 
>>> No, only in 11n mode.
>>> 
>>> What I have found so far is that Ralink's driver checks the MAC address of
>>> the other end and identifies Atheros chips by OUI. It then sets a special
>>> protection mode for them. Does this ring a bell?
>> 
>> Are you sure that this is based on the actual MAC addresses? Atheros drivers
>> tend to announce additional capabilities in beacons and probe responses.
> 
> It is based on the actual MAC, but the OUI it checks is Broadcom's (00904c),
> not Atheros'. Sorry.
> 
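
For reference, an OUI check of the kind described above only compares the
first three octets of the peer's MAC address. A minimal user-space sketch in
C, assuming nothing about Ralink's actual code (mac_has_oui and oui_00904c
are made-up names for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define OUI_LEN 3

/* The OUI quoted in the message above (00904c). */
static const uint8_t oui_00904c[OUI_LEN] = { 0x00, 0x90, 0x4c };

/*
 * Return true when the first three octets of a 6-byte MAC address
 * equal the given OUI. Hypothetical helper, not taken from any driver.
 */
static bool
mac_has_oui(const uint8_t mac[6], const uint8_t oui[OUI_LEN])
{
	assert(mac != NULL && oui != NULL);
	return memcmp(mac, oui, OUI_LEN) == 0;
}
```

A peer whose address starts with 00:90:4c would match; any other prefix
would not, regardless of what the peer advertises in its beacons.
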
>> 
>>> Has the node lock in ieee80211_node_timeout() caused a deadlock in HT + AP +
>>> bridge?
>> 
>> I'm not aware of any issues there, though I'm not very familiar with HT use
>> cases.
> 
> I attached the witness messages. Those 2 LORs always happen together before
> the deadlock. I hooked iv_input() to unlock the node lock before the call and
> lock it again afterwards to avoid the deadlock. (I don't know if it's safe.)
> 
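
To make the workaround concrete, here is a rough user-space model of it using
pthread mutexes. node_lock and bridge_lock only stand in for run0_node_lock
and if_bridge, and deliver_frame() and node_timeout() are hypothetical names,
not net80211 code:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;   /* models run0_node_lock */
static pthread_mutex_t bridge_lock = PTHREAD_MUTEX_INITIALIZER; /* models if_bridge */
static int delivered;
static int node_lock_free_on_delivery;

/* Lock a mutex and stop on failure. */
static void
lock_or_die(pthread_mutex_t *m)
{
	int rc = pthread_mutex_lock(m);

	assert(rc == 0);
	(void)rc;
}

/* Stand-in for the input path that ends up taking the bridge lock. */
static void
deliver_frame(void)
{
	/* Record whether the caller really released node_lock first. */
	if (pthread_mutex_trylock(&node_lock) == 0) {
		node_lock_free_on_delivery = 1;
		pthread_mutex_unlock(&node_lock);
	}
	lock_or_die(&bridge_lock);
	delivered++;
	pthread_mutex_unlock(&bridge_lock);
}

/*
 * Stand-in for the timeout handler: node_lock is dropped around the
 * delivery call, so bridge_lock is never acquired while it is held.
 */
static void
node_timeout(void)
{
	lock_or_die(&node_lock);
	/* ... age per-node state under the node lock ... */
	pthread_mutex_unlock(&node_lock);

	deliver_frame();

	lock_or_die(&node_lock);
	/* State may have changed while unlocked; revalidate before use. */
	pthread_mutex_unlock(&node_lock);
}
```

This only shows the lock ordering. It says nothing about whether the node can
be freed or changed while the lock is dropped, which is what decides whether
the real change is safe.
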
> I wonder if this is a run(4)-specific problem.
> 
> 
> AK
> 
> 
> lock order reversal:
> 1st 0xffffff8000a267d0 run0_node_lock (run0_node_lock) @ /usr/src/sys/net80211/ieee80211_node.c:1360
> 2nd 0xffffff0001716818 if_bridge (if_bridge) @ /usr/src/sys/net/if_bridge.c:2184
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> _witness_debugger() at _witness_debugger+0x2e
> witness_checkorder() at witness_checkorder+0x81e
> _mtx_lock_flags() at _mtx_lock_flags+0x78
> bridge_input() at bridge_input+0x7e
> ether_input() at ether_input+0x143
> hostap_input() at hostap_input+0x4ea
> ampdu_rx_flush() at ampdu_rx_flush+0x5e
> ieee80211_ht_node_age() at ieee80211_ht_node_age+0x7b
> ieee80211_node_timeout() at ieee80211_node_timeout+0x2dc
> softclock() at softclock+0x2a0
> intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> ithread_loop() at ithread_loop+0xb2
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff8000052d30, rbp = 0 ---
> 
> lock order reversal:
> 1st 0xffffff8000a267d0 run0_node_lock (run0_node_lock) @ /usr/src/sys/net80211/ieee80211_node.c:1360
> 2nd 0xffffffff80a186c8 tcp (tcp) @ /usr/src/sys/netinet/tcp_input.c:498
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> _witness_debugger() at _witness_debugger+0x2e
> witness_checkorder() at witness_checkorder+0x81e
> _rw_rlock() at _rw_rlock+0x5f
> tcp_input() at tcp_input+0xa58
> ip_input() at ip_input+0xbc
> netisr_dispatch_src() at netisr_dispatch_src+0xb8
> ether_demux() at ether_demux+0x17d
> ether_input() at ether_input+0x175
> hostap_input() at hostap_input+0x4ea
> ampdu_rx_flush() at ampdu_rx_flush+0x5e
> ieee80211_ht_node_age() at ieee80211_ht_node_age+0x7b
> ieee80211_node_timeout() at ieee80211_node_timeout+0x2dc
> softclock() at softclock+0x2a0
> intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> ithread_loop() at ithread_loop+0xb2
> fork_exit() at fork_exit+0x12a
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xffffff8000052d30, rbp = 0 --- 

Can you explain why the run0_node_lock is locked? I don't have the code at hand.

Regards,
--
Rui Paulo