net80211 race conditions seen in -HEAD
adrian at freebsd.org
Sun Jan 22 05:40:13 UTC 2012
I've noticed some kernel panics in net80211/ath in -HEAD. In all
instances it boils down to a now-invalid ieee80211_node: either it's
partially allocated/copied, or it's been recently freed.
This became increasingly obvious when doing DFS CAC, as the kernel was now
changing the channel quite frequently on me whilst simulating/processing
radar events. I've since found I can mostly reproduce it in the lab (when
surrounded by ridiculous levels of RX interference traffic, triggering all
kinds of events) whilst creating/destroying VAPs.
Now that I have debugging code in place (which, as a side effect, makes it
very difficult to cause a crash, let alone tickle the race condition),
it's glaringly obvious what's going on.
There are five contexts in which stuff can occur, at least in the net80211/ath case:
* the swi (i.e. ath_intr(), ath_beacon_proc);
* the ath taskqueue;
* the net80211 taskqueue;
* the ioctl() context, coming up from a userland process;
* a callout running in the clock thread.
Now, callouts should _hopefully_ be grabbing and releasing locks correctly.
We've found a few spots where they weren't (leading to quite silly state
races and crashes.)
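For context on why the lock matters: as I read callout(9), a callout set up with callout_init_mtx() has the callout code acquire that mutex before running the handler, and a callout_stop() done while holding the mutex means the handler won't run afterwards. Here's a rough userland model of that idea using pthreads. Everything named model_* (and count_handler) is hypothetical and is not the kernel KPI; it just shows the "take the lock, then re-check whether we were stopped" step that a callout poking at shared state without the lock skips.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userland model of a callout bound to a lock, in the spirit
 * of callout_init_mtx(9).  This is NOT the kernel callout API.
 */
struct model_callout {
	pthread_mutex_t *lock;	/* lock shared with the driver/net80211 state */
	bool pending;		/* armed, and neither run nor stopped yet */
	void (*fn)(void *);
	void *arg;
};

static void
model_callout_init(struct model_callout *c, pthread_mutex_t *lock)
{
	c->lock = lock;
	c->pending = false;
	c->fn = NULL;
	c->arg = NULL;
}

/* Arm the callout.  Caller must hold c->lock. */
static void
model_callout_reset(struct model_callout *c, void (*fn)(void *), void *arg)
{
	assert(fn != NULL);
	c->fn = fn;
	c->arg = arg;
	c->pending = true;
}

/*
 * Disarm the callout.  Caller must hold c->lock; once this returns, a
 * concurrent model_callout_fire() will see pending == false and skip fn.
 * Returns true if the callout was still pending.
 */
static bool
model_callout_stop(struct model_callout *c)
{
	bool was_pending = c->pending;

	c->pending = false;
	return was_pending;
}

/*
 * What the clock thread does on expiry: take the lock first, then
 * re-check pending, because a stop() may have won the race while we
 * were waiting for the lock.  Returns true if fn actually ran.
 */
static bool
model_callout_fire(struct model_callout *c)
{
	bool ran = false;

	pthread_mutex_lock(c->lock);
	if (c->pending) {
		c->pending = false;
		c->fn(c->arg);	/* runs with c->lock held */
		ran = true;
	}
	pthread_mutex_unlock(c->lock);
	return ran;
}

/* Example handler: count how many times it ran. */
static void
count_handler(void *arg)
{
	int *hits = arg;

	(*hits)++;
}
```

Because the handler runs with the lock held and stop() also requires the lock, the two can never overlap; a callout that grabs the lock itself part-way through its handler loses that guarantee.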
I'm going to ignore the obvious possible problems with multiple concurrent
processes doing ioctl()s. I'm simply going to operate on the principle that
the multiple-ioctl() path is fine.
It seems that -obtaining- a reference to vap->iv_bss isn't done under a lock. So in
(say) ieee80211_sta_join1() the iv_bss node can have its reference dropped and be freed.
If this is going on concurrently with (say) something in the
net80211 taskqueue (e.g. a newstate call), then I _think_ it's possible for
the ath_newstate() code to get a reference to vap->iv_bss at the same time
as it's being freed in ieee80211_sta_join1() (or similar). So the
ath_newstate() code ends up holding an 'ni' that has just been freed.
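To make the suspected window concrete, here's a userland sketch of the kind of fix I'd expect to need: load iv_bss and bump its refcount in the same critical section as the code that swaps iv_bss and drops the old reference. struct node, struct vap and all the functions are hypothetical stand-ins (not net80211 code), and the single per-vap lock covering both the pointer and the refcounts is an assumption, not how net80211 locks things today. Loosely, vap_ref_bss() plays ath_newstate() grabbing vap->iv_bss and vap_set_bss() plays ieee80211_sta_join1() replacing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ieee80211_node and ieee80211vap. */
struct node {
	int refcnt;	/* protected by the owning vap's lock in this sketch */
	int chan;	/* stand-in for whatever state the consumer reads */
};

struct vap {
	pthread_mutex_t lock;	/* covers bss and node refcounts (assumption) */
	struct node *bss;	/* stand-in for vap->iv_bss; owns one reference */
};

static void
vap_init(struct vap *v)
{
	pthread_mutex_init(&v->lock, NULL);
	v->bss = NULL;
}

/* New node with one reference, owned by the caller. */
static struct node *
node_alloc(int chan)
{
	struct node *ni = malloc(sizeof(*ni));

	if (ni != NULL) {
		ni->refcnt = 1;
		ni->chan = chan;
	}
	return ni;
}

/* Drop one reference; the vap lock must be held.  Frees on the last one. */
static void
node_unref_locked(struct node *ni)
{
	assert(ni->refcnt > 0);
	if (--ni->refcnt == 0)
		free(ni);
}

/*
 * Reader side (think ath_newstate()): load the pointer and take a
 * reference inside one critical section.  Doing the load outside the
 * lock would let vap_set_bss() free the node before the increment.
 */
static struct node *
vap_ref_bss(struct vap *v)
{
	struct node *ni;

	pthread_mutex_lock(&v->lock);
	ni = v->bss;
	if (ni != NULL)
		ni->refcnt++;
	pthread_mutex_unlock(&v->lock);
	return ni;
}

/* Release a reference obtained from vap_ref_bss(). */
static void
vap_unref(struct vap *v, struct node *ni)
{
	pthread_mutex_lock(&v->lock);
	node_unref_locked(ni);
	pthread_mutex_unlock(&v->lock);
}

/*
 * Writer side (think ieee80211_sta_join1()): install newbss, taking over
 * the caller's reference, and drop the vap's reference to the old node.
 */
static void
vap_set_bss(struct vap *v, struct node *newbss)
{
	struct node *old;

	pthread_mutex_lock(&v->lock);
	old = v->bss;
	v->bss = newbss;
	if (old != NULL)
		node_unref_locked(old);
	pthread_mutex_unlock(&v->lock);
}
```

Making the refcount atomic on its own wouldn't close this: the node can still hit zero and be freed between loading the pointer and incrementing it, which is why the load and the increment share the lock with the swap.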
I've seen another crash in the net80211_ht code where it _looks_ like the
bss node wasn't entirely set up (bsschan was 0xffff), so the kernel panicked.
This likely explains a lot of the "weird stuff" people have been reporting.
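For the half-set-up node, the general rule would be: finish initialising a node before any other context can see it, and have consumers refuse a node that still carries the "no channel" value. In net80211, IEEE80211_CHAN_ANY is 0xffff and IEEE80211_CHAN_ANYC is that value cast to a channel pointer, so a bsschan of 0xffff would fit a channel that was never assigned. The C11 sketch below uses hypothetical model_* names and an integer channel instead of a channel pointer. It publishes the node pointer with release ordering only after its fields are filled in, and the consumer loads it with acquire ordering. It deliberately ignores node lifetime, which is the separate reference race described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Same value as net80211's IEEE80211_CHAN_ANY; the rest is hypothetical. */
#define MODEL_CHAN_ANY 0xffff

struct model_node {
	int chan;	/* MODEL_CHAN_ANY until the node is set up */
	int htcap;	/* stand-in for state the HT code relies on */
};

/* The currently published bss node, or NULL (static, so zero-initialised). */
static _Atomic(struct model_node *) cur_bss;

/*
 * Publish a node whose fields the caller has already filled in.  Release
 * ordering makes those writes visible before the pointer itself.  Returns
 * the previous node; freeing it safely is a lifetime problem not solved here.
 */
static struct model_node *
bss_publish(struct model_node *ni)
{
	struct model_node *old;

	old = atomic_exchange_explicit(&cur_bss, ni, memory_order_acq_rel);
	assert(old == NULL || old != ni);	/* re-publishing the same node is a caller bug */
	return old;
}

/*
 * Consumer side: acquire-load the pointer, then refuse a node that still
 * carries the "no channel" value rather than treating it as a real channel.
 * Returns the channel, or -1 if there is no usable bss node.
 */
static int
bss_chan(void)
{
	struct model_node *ni;

	ni = atomic_load_explicit(&cur_bss, memory_order_acquire);
	if (ni == NULL || ni->chan == MODEL_CHAN_ANY)
		return -1;
	return ni->chan;
}
```

The sentinel check is only a safety net; the ordering on the publish side is what stops a consumer from seeing a pointer to a node whose fields haven't landed yet.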
I also think the bgscan race is related to this - I can't help but wonder
if the bgscan callout/event is also coinciding with wpa_supplicant doing
stuff, and a race condition ends up leaving the vap w/ the sta power save
I don't yet have a solution to all of this - I just wanted to brain dump
what I've seen thus far.
More information about the freebsd-wireless