Questions about locking; turnstiles and sleeping threads

Thu Nov 13 12:40:31 UTC 2014

On 11/13/14, 11:39 AM, Adrian Chadd wrote:
> On 12 November 2014 18:26, Alexander Kabaev <kabaev at gmail.com> wrote:
>> On Wed, 12 Nov 2014 18:13:55 -0800
>> Adrian Chadd <adrian at freebsd.org> wrote:
>>
>>> Hi,
>>>
>>> I have a bit of an odd case here.
>>>
>>> I'm getting panics in the net80211/ath code, "sleeping thread (X) owns
>>> non-sleepable lock."
>>>
>>> show alllocks just showed one lock held - the net80211 comlock. It's a
>>> recursive mutex, that's supposed to be sleepable.
>>>
>>> The two threads in question look like this:
>>>
>>> thread X: net80211_newstate_cb (grabs IEEE80211_LOCK())
>>>      ath_newstate
>>>      callout_drain - which grabs the ATH_LOCK as part of the callout
>>> drain side of things
>>>      that enters sleepq_wait() and goes to sleep, waiting for
>>> whatever's running the callout to
>>>      finish
>>>
>>> thread Y:
>>>      rx_path in if_ath_rx_edma
>>>      ath_rx_pkt -> sta_input -> ath_recv_mgmt -> sta_recv_mgmt (grabs
>>> IEEE80211_LOCK()) -> panics
>>>
>>> Thread Y doesn't hold any other locks. It's just trying to grab the
>>> IEEE80211_LOCK that is being held by thread X. But thread X is asleep
>>> waiting for whatever callout to finish so it can continue. The code in
>>> propagate_priority() sees that thread X is sleeping and panics.
>>>
>>> So, what's really going on? I don't mind (well, "don't mind") having
>>> to take another deep dive through all of this to sort it out so it
>>> doesn't tickle the callout / turnstile code in this particular
>>> fashion, but I'd first like to ensure that it's not some corner case
>>> that isn't handled by the check in propagate_priority().
>>>
>>> Thanks,
>>>
>>>
>>> -adrian
>> Hi,
>>
>> mutexes are blocking and not sleepable primitives, so doing any
>> unbounded sleep with mutex locked, such as one you are attempting by
>> calling callout_drain is illegal. In other words, you are getting an
>> expected assert and the code in question is wrong.
> Hi,
>
> Right. That isn't mentioned in the manpage. The manpage says:
>
>       The function callout_drain() is identical to callout_stop() except that
>       it will wait for the callout to be completed if it is already in
>       progress.  This function MUST NOT be called while holding any locks on
>       which the callout might block, or deadlock will result.  Note that if the
>       callout subsystem has already begun processing this callout, then the
>       callout function may be invoked during the execution of callout_drain().
>       However, the callout subsystem does guarantee that the callout will be
>       fully stopped before callout_drain() returns.

Also look at 'man 9 locking'; it covers which lock types may be held across a sleep.
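
To make the ordering concrete, here is a minimal userland sketch using
pthreads. It is only an analogy, not kernel code and not the net80211/ath
fix: com_lock stands in for IEEE80211_LOCK, timer_thread for a callout
handler, and pthread_join for the unbounded wait inside callout_drain().
All of these names are made up for illustration. The point it tries to
show is the shape "change state under the mutex, drop the mutex, then
wait", so nothing that blocks on the mutex ever finds its owner asleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: com_lock models IEEE80211_LOCK. */
static pthread_mutex_t com_lock = PTHREAD_MUTEX_INITIALIZER;
static bool stop_requested = false;   /* protected by com_lock */
static int ticks_run = 0;             /* protected by com_lock */

/* Models a callout handler that takes the lock every time it runs. */
static void *timer_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&com_lock);
        bool stop = stop_requested;
        if (!stop)
            ticks_run++;
        pthread_mutex_unlock(&com_lock);
        if (stop)
            return NULL;
    }
}

/*
 * Models the state change path: update state under the lock, drop the
 * lock, and only then do the unbounded wait (the "drain").
 * Returns 0 on success.
 */
static int newstate_and_drain(pthread_t timer)
{
    pthread_mutex_lock(&com_lock);
    stop_requested = true;
    pthread_mutex_unlock(&com_lock);    /* drop before sleeping */
    return pthread_join(timer, NULL);   /* wait with no lock held */
}

/* Runs the whole scenario once; returns 0 if it completed cleanly. */
static int run_demo(void)
{
    pthread_t t;
    int seen = 0;
    int rc;

    if (pthread_create(&t, NULL, timer_thread, NULL) != 0)
        return -1;

    /* Let the "callout" run a few times before stopping it. */
    while (seen < 3) {
        pthread_mutex_lock(&com_lock);
        seen = ticks_run;
        pthread_mutex_unlock(&com_lock);
    }

    if (newstate_and_drain(t) != 0)
        return -1;

    /* After the drain the lock must be free again. */
    rc = pthread_mutex_trylock(&com_lock);
    assert(rc == 0);
    pthread_mutex_unlock(&com_lock);
    return 0;
}
```

If newstate_and_drain() kept com_lock held across pthread_join(), the
timer thread could never take the lock to see stop_requested and the
program would simply hang. The kernel catches the equivalent situation
earlier and panics in propagate_priority() instead.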
>
> The callout isn't going to block here, but another thread may block.
>
> This is good to know. I'll see if I can come up with an addition to
> the manpage about this.
>
> I'm going to have to do another pass over all of the wifi drivers and
> stack to see where this is happening. Ugh. :(
>
> Thanks!
>
>
>
> -adrian
> _______________________________________________
> freebsd-arch at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"