kern/166190: [ath] TX hangs and frames stuck in TX queue
adrian at freebsd.org
Mon Mar 19 00:10:05 UTC 2012
I think I understand what's going on here.
It turns out that multiple instances of the TX code (via if_start())
were running at the same time. These were processing frames from the
input queue and assigning them sequence numbers.
This seems to be occurring:
* thread A would allocate sequence number 5
* thread B would concurrently allocate sequence number 6
* thread B would then "win" the race to add its frame to the BAW, as the
sequence numbers were allocated early but the frames weren't added to
the queue until much later
* then thread A would try adding its frame to the BAW, but since the
BAW left edge is now 6, 5 is now "out of window".
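To make the "out of window" case concrete, here is a minimal sketch of a
block-ack window membership test. It assumes 12-bit 802.11 sequence numbers
and a 64-frame window; the name baw_within() and the constants are
illustrative only and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ_SPACE 4096u	/* 802.11 sequence numbers are 12 bits wide */

/*
 * Hypothetical helper: true if seqno lies in [left, left + wsize),
 * with all arithmetic done modulo the 12-bit sequence space.
 */
static bool
baw_within(uint16_t left, uint16_t wsize, uint16_t seqno)
{
	return (((unsigned)seqno - (unsigned)left) & (SEQ_SPACE - 1)) < wsize;
}
```

With the left edge at 6 and a 64-frame window, sequence number 6 is
inside the window, but 5 wraps around to offset 4095 and is rejected.
That is the situation thread A ends up in above.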
I have a local patch here which I'm going to test tonight/tomorrow. It
delays the sequence number allocation until _right before_ the frame
may be added to the BAW. This is done inside the same lock, so there's
no chance that it'll race with another concurrent thread.
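As a rough illustration of that idea (not the actual patch), the sketch
below allocates the sequence number and claims the BAW slot inside one
critical section. It uses a userland pthread mutex in place of the driver
lock, and struct tid_state, tx_assign_and_add() and the 64-frame window
are all assumptions made for the example.

```c
#include <pthread.h>
#include <stdint.h>

#define SEQ_MASK 4095u	/* 12-bit sequence space */
#define BAW_SIZE 64u	/* assumed block-ack window size */

/* Hypothetical per-TID state; field names are illustrative only. */
struct tid_state {
	pthread_mutex_t	lock;
	uint16_t	next_seq;	/* next sequence number to hand out */
	uint16_t	baw_left;	/* left edge of the block-ack window */
};

/*
 * Allocate a sequence number and claim its BAW slot under one lock.
 * Returns the sequence number, or -1 if the window is full and the
 * frame has to stay queued without a sequence number for now.
 */
static int
tx_assign_and_add(struct tid_state *t)
{
	int seq = -1;

	pthread_mutex_lock(&t->lock);
	if (((unsigned)(t->next_seq - t->baw_left) & SEQ_MASK) < BAW_SIZE) {
		seq = t->next_seq;
		t->next_seq = (t->next_seq + 1) & SEQ_MASK;
	}
	pthread_mutex_unlock(&t->lock);
	return (seq);
}
```

Because allocation and the window check happen under the same lock, two
threads can no longer hold numbers 5 and 6 and then add them to the BAW
in the opposite order. Advancing baw_left on TX completion is left out of
the sketch.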
I won't commit it until I have committed some verification code to
-HEAD to complain loudly when a frame _before_ the BAW is trying to be
queued. Since that shouldn't happen in reality, I'm going to guess
that it'll pop up in my testing and Vincent's use.
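A sanity check of that sort could look something like the sketch below.
It is not the verification code referred to above; it assumes 12-bit
sequence numbers and treats any frame more than half the sequence space
past the left edge as being "before" the window. The function name and
the message format are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SEQ_RANGE 4096u	/* 12-bit sequence space */

/*
 * Hypothetical check: returns true (and complains on stderr) when
 * seqno sits behind the BAW left edge. Offsets in the upper half of
 * the sequence space are assumed to mean "behind", not "far ahead".
 */
static bool
baw_check_behind(uint16_t baw_left, uint16_t seqno)
{
	unsigned off = ((unsigned)seqno - (unsigned)baw_left) & (SEQ_RANGE - 1);

	if (off < SEQ_RANGE / 2)
		return (false);
	fprintf(stderr, "%s: seqno %u is before BAW left edge %u\n",
	    __func__, (unsigned)seqno, (unsigned)baw_left);
	return (true);
}
```

Called with a left edge of 6 and a sequence number of 5, this logs a
complaint and returns true; a frame at or ahead of the left edge passes
silently.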
Once I've verified that (a) my sanity checking code is firing as I
expect it to, (b) Vincent also sees the same, and (c) this is fixed by
my patch, I'll look at committing it.
Vincent - thanks so very much for persisting with this bug! I'd not
have really found it at all if you hadn't pointed the odd behaviour out.
Now - yes, the solution would also be "serialise the whole TX queue
damnit." Yes, that'd solve it, but as I'm seeing 802.11ac around the
corner, I'd like to actually debug, diagnose and document how a
multi-threaded TX/RX path could work. Serialising the driver TX path
isn't going to help me do that. :-)