[RFC] serialising net80211 TX

PseudoCylon moonlightakkiy at yahoo.ca
Fri Feb 15 22:51:48 UTC 2013


> ------------------------------
>
> Message: 2
> Date: Wed, 13 Feb 2013 21:14:53 -0800
> From: Adrian Chadd <adrian at freebsd.org>
> To: freebsd-wireless at freebsd.org
> Subject: [RFC] serialising net80211 TX
> Message-ID:
>         <CAJ-VmonS0cds9nCFYxc_nZuDRL93=2_4T2B4tUzPuGC3Bhz2FA at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> I'd like to work on the net80211 TX serialisation now. I'll worry
> about driver serialisation and if_transmit methods later.
>
> The 30 second version - it happens in parallel, which means preemption
> and multi-core devices can and will hit lots of subtle and
> hard-to-debug races in the TX path.
>
> We actually need an end-to-end serialisation method - not only for the
> 802.11 state (sequence number, correct aggregation handling, etc) but
> to line up 802.11 sequence number allocation with the encryption IV/PN
> values. Otherwise you end up with lots of crazy subtle out of order
> packets occurring. The other is the seqno/CCMP IV race between the raw
> transmit path and the normal transmit path. There are other nagging
> issues that I'm trying to resolve - but, one thing at a time.
>
> So there are three current contenders:
>
> * wrap everything in net80211 TX in a per-vap TX lock; grab it at the
> beginning of ieee80211_output() and ieee80211_start(), and don't
> release it until the frame is queued to something (a power save queue,
> an age queue, the driver.) That guarantees that the driver is called
> in lock-step with each frame being processed.

Long-held locks could be worse. When one thread tries to grab a lock
that another thread is holding, it will spin for a bit and then be
suspended. That can be more expensive than a context switch done by
the scheduler, and the longer a thread holds the lock, the more often
it will happen.
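
Just to make sure we are talking about the same thing, here is roughly
what I picture for 1). iv_tx_mtx and vap_hand_to_driver() are made-up
names, not anything in the tree:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net80211/ieee80211_var.h>

/*
 * Sketch of 1): one per-vap TX mutex held from the net80211 entry
 * point until the frame has been handed off.  iv_tx_mtx would be a
 * new field in struct ieee80211vap; vap_hand_to_driver() stands in
 * for the power save queue/age queue/driver hand-off.
 */
static int
vap_transmit_one(struct ieee80211vap *vap, struct mbuf *m)
{
	int error;

	/* Grabbed at the top of ieee80211_start()/ieee80211_output(). */
	mtx_lock(&vap->iv_tx_mtx);

	/* seqno allocation, crypto IV/PN, aggregation state, ... */

	error = vap_hand_to_driver(vap, m);

	/* Not released until the frame is queued somewhere. */
	mtx_unlock(&vap->iv_tx_mtx);

	return (error);
}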

> * do deferred transmit - i.e., the net80211 entry points simply queue
> mbufs to a queue, and a taskqueue runs over the ifnet queue and runs
> those frames in-order. There's no need for a lock here as there's only
> one sending context (either per-VAP or per-IC).

I tried a taskqueue on run(4), i.e. changed it from 1) to 2). Queueing
onto the shared taskqueue (taskqueue_thread) didn't work well, but a
private taskqueue works well (60 Mbps -> 70+ Mbps). The catch is I only
tested on a Core 2 Duo and a dual-core + HT Atom.
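
For what it's worth, the run(4) change boils down to something like the
sketch below; the rtx_* names are placeholders, not the actual run(4)
code. The drain loop is also where the batching happens: one wakeup
pushes out everything queued so far.

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>
#include <net/if.h>
#include <net/if_var.h>

struct rtx_softc {
	struct ifqueue		 rtx_queue;	/* deferred TX frames */
	struct task		 rtx_task;
	struct taskqueue	*rtx_tq;	/* private, not taskqueue_thread */
};

/* Single sending context: drains everything queued so far, in order. */
static void
rtx_deferred_start(void *arg, int pending)
{
	struct rtx_softc *sc = arg;
	struct mbuf *m;

	for (;;) {
		IF_DEQUEUE(&sc->rtx_queue, m);
		if (m == NULL)
			break;
		/* 802.11 state updates + hand-off to the hardware go here. */
	}
}

static void
rtx_attach_tq(struct rtx_softc *sc)
{
	mtx_init(&sc->rtx_queue.ifq_mtx, "rtx queue", NULL, MTX_DEF);
	TASK_INIT(&sc->rtx_task, 0, rtx_deferred_start, sc);
	sc->rtx_tq = taskqueue_create("rtx_tq", M_NOWAIT,
	    taskqueue_thread_enqueue, &sc->rtx_tq);
	taskqueue_start_threads(&sc->rtx_tq, 1, PI_NET, "rtx taskq");
}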

I saw you patched ath(4) to use a taskqueue. How is that working out?

There are threads other than the TX thread running on the system, so
context switching will happen one way or another. We want to do it the
smart way, i.e. rather than switching threads on every TX, switch once
for multiple TXs.

Currently, TX threads are kept alive until packets are passed to the
hardware. Instead, if we let them return right after queuing a frame to
if_snd and have one thread (a new ieee80211_tx thread) handle all the
packets, there will be fewer runnable threads and therefore less
context switching.
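
The producer side would then shrink to roughly this; ieee80211_tx_tq
and ieee80211_tx_task are made-up names for that single TX context:

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/taskqueue.h>
#include <net/if.h>
#include <net/if_var.h>

/* Hypothetical single ieee80211_tx context shared by all vaps. */
static struct taskqueue	*ieee80211_tx_tq;
static struct task	 ieee80211_tx_task;

static int
vap_start(struct ifnet *ifp, struct mbuf *m)
{
	int error;

	/* Queue once, on the vap's if_snd only. */
	IFQ_ENQUEUE(&ifp->if_snd, m, error);
	if (error != 0)
		return (error);

	/* Kick the single TX thread and return right away. */
	taskqueue_enqueue(ieee80211_tx_tq, &ieee80211_tx_task);
	return (0);
}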

Anyhow, I don't think we need to queue/dequeue twice, once on
vap->iv_ifp->if_snd and once on ic->ic_ifp->if_snd.


AK

