[Bug 258948] [net80211] AP + STA configuration can lead to the AP VAP stopping traffic after STA scan

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 05 Oct 2021 15:35:54 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258948

            Bug ID: 258948
           Summary: [net80211] AP + STA configuration can lead to the AP
                    VAP stopping traffic after STA scan
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: wireless
          Assignee: wireless@FreeBSD.org
          Reporter: adrian@freebsd.org

Here's a fun, rather hilarious corner case that shows up when you start using
STA + AP configurations.

My test setup:

* An AR9380 radio with a STA VAP (DWDS child / upstream facing) and an AP VAP
* A second AR9380 radio with AP VAPs configured as DWDS parent / downstream
facing
* Bridging between all of them

When the STA VAP decides that it needs to scan to find a new AP, sometimes the
AP VAP on the same radio will stop passing traffic / 802.1x negotiation, even
after the STA VAP finishes scanning and reassociates.

After about a year of narrowing things down, I've finally figured out what's
going on:

* there are STA beacon miss events, which lead net80211/wpa_supplicant to move
the STA VAP from RUN to SCAN state
* this calls markwaiting(), which marks all the other VAPs as waiting
* markwaiting() calls vap->iv_newstate(vap, INIT, ...) to set the state to INIT
for each other VAP on the radio
* Then some packet is transmitted on the AP VAP via ieee80211_vap_transmit(),
and since the VAP is not in the RUN state, the OACTIVE flag is set on
vap->iv_ifp
* .. time passes ..
* Finally, the STA VAP transitions through its states to eventually hit RUN
* .. which will call wakeupwaiting()
* .. which iterates over all the VAPs again and calls vap->iv_newstate(vap,
RUN, ...)

.. now at this point, the VAP mode-specific newstate code and the
driver-specific newstate code are running, but! Note! These codepaths aren't
going via ieee80211_new_state() / ieee80211_new_state_locked(), and the only
path that clears OACTIVE goes through those.

* Then eventually ieee80211_new_state*() is called for the AP VAP, setting the
state to RUN
* However! The deferred taskqueue code (ieee80211_new_state_cb()) sees the
state going RUN->RUN rather than RUN->INIT->RUN, so it does NOT clear OACTIVE.

This is why associations worked fine but the raw BPF sends did not:
ieee80211_output() (used by BPF) checks the OACTIVE flag and just drops the
packets.
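
The sequence above can be replayed in a small userland model. This is only a
sketch under stated assumptions: every toy_* name is made up for illustration,
and the logic is reduced to the pieces described in this report (direct
newstate calls from the markwaiting() / wakeupwaiting() path, a deferred
callback that skips its post-processing when the state does not change, and an
output path that drops while OACTIVE is set). It is not the net80211 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; every toy_* name is hypothetical and not part of net80211. */
enum toy_state { TOY_INIT, TOY_SCAN, TOY_RUN };

struct toy_vap {
	enum toy_state state;	/* stands in for the VAP state */
	bool oactive;		/* stands in for OACTIVE on vap->iv_ifp */
};

/* Direct state change, as done by the markwaiting()/wakeupwaiting() path. */
static void
toy_newstate_direct(struct toy_vap *vap, enum toy_state nstate)
{
	assert(vap != NULL);
	vap->state = nstate;	/* note: never touches oactive */
}

/* Transmit path: if the VAP is not in RUN, mark it OACTIVE and refuse. */
static bool
toy_vap_transmit(struct toy_vap *vap)
{
	if (vap->state != TOY_RUN) {
		vap->oactive = true;
		return (false);
	}
	return (true);
}

/* Raw (BPF-style) output path: drops while OACTIVE is set. */
static bool
toy_output(const struct toy_vap *vap)
{
	return (!vap->oactive);
}

/* Deferred callback: only clears OACTIVE on a real transition into RUN. */
static void
toy_new_state_cb(struct toy_vap *vap, enum toy_state nstate)
{
	enum toy_state ostate = vap->state;

	vap->state = nstate;
	if (ostate == nstate)
		return;		/* RUN->RUN: post-processing skipped */
	if (nstate == TOY_RUN)
		vap->oactive = false;
}

/* Replays the AP VAP's view of the sequence; returns true if output works. */
static bool
toy_replay_ap_vap(void)
{
	struct toy_vap ap = { TOY_RUN, false };

	toy_newstate_direct(&ap, TOY_INIT);	/* STA scan: markwaiting() */
	(void)toy_vap_transmit(&ap);		/* packet arrives: OACTIVE set */
	toy_newstate_direct(&ap, TOY_RUN);	/* STA in RUN: wakeupwaiting() */
	toy_new_state_cb(&ap, TOY_RUN);		/* seen as RUN->RUN */
	return (toy_output(&ap));
}
```

In this model toy_replay_ap_vap() returns false: the AP VAP is back in RUN,
but the flag set during the INIT window is never cleared, so the output path
keeps dropping.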

The real long-term fix is removing OACTIVE entirely, but that requires a pass
through all the wifi drivers to make sure none of them are using OACTIVE
anymore.

The temporary fix is just to clear the OACTIVE flag in ieee80211_new_state_cb()
if the state is RUN, even if it's RUN->RUN.
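
A minimal sketch of that temporary fix, written as a userland model rather
than an actual patch. The fix_* names are hypothetical, and the assumption
here is that the deferred callback has an early return for "no real
transition"; the workaround simply clears the flag before that return whenever
the new state is RUN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the workaround; fix_* names are hypothetical, not net80211. */
enum fix_state { FIX_INIT, FIX_SCAN, FIX_RUN };

struct fix_vap {
	enum fix_state state;
	bool oactive;		/* stands in for OACTIVE on vap->iv_ifp */
};

/*
 * Deferred callback with the workaround: OACTIVE is cleared whenever the
 * new state is RUN, before the "no real transition" early return.
 */
static void
fix_new_state_cb(struct fix_vap *vap, enum fix_state nstate)
{
	enum fix_state ostate;

	assert(vap != NULL);
	ostate = vap->state;
	vap->state = nstate;

	if (nstate == FIX_RUN)
		vap->oactive = false;	/* applies to RUN->RUN as well */
	if (ostate == nstate)
		return;			/* other post-processing still skipped */
	/* ... remaining transition handling would go here ... */
}
```

With this ordering, a VAP that was bounced through INIT behind the callback's
back and arrives as RUN->RUN still gets its flag cleared, while transitions to
any other state leave the flag alone.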
