svn commit: r241558 - head/sys/dev/ath

Sun Oct 14 20:31:38 UTC 2012

Author: adrian
Date: Sun Oct 14 20:31:38 2012
New Revision: 241558
URL: http://svn.freebsd.org/changeset/base/241558

Log:
  Break the RX processing up into smaller chunks of 128 frames each.
  
  Right now, processing a full 512-frame queue takes quite a while (measured
  on the order of milliseconds). Because of this, the TX processing ends up
  sometimes preempting the taskqueue:
  
  * userland sends a frame
  * it goes in through net80211 and out to ath_start()
  * ath_start() will end up either direct dispatching or software queuing a
    frame.
  
  If TX had to wait for RX to finish, it would add quite a few ms of
  additional latency to the packet transmission.  This has caused issues
  with TCP throughput in the past.
  
  Now, as part of my attempt to bring sanity to the TX/RX paths, the first
  step is to make the RX processing happen in smaller 'parts'. That way
  when TX is pushed into the ath taskqueue, there won't be so much latency
  in the way of things.
  
  The bigger scale change (which will come much later) is to actually
  process the frames in the ath_intr taskqueue but process _frames_ in
  the ath driver taskqueue.  That would reduce the latency between
  processing and requeuing new descriptors. But that'll come later.
  
  The actual work:
  
  * Add ATH_RX_MAX at 128 (static for now);
  * break out of the processing loop if npkts reaches ATH_RX_MAX;
  * if we processed ATH_RX_MAX or more frames during the processing loop,
    immediately reschedule another RX taskqueue run.  This will handle
    the further frames in the taskqueue.
  
  This should have minimal impact on the general throughput case,
  unless the scheduler is behaving very strangely or the ath taskqueue
  ends up spending a lot of time on non-RX operations (such as TX
  completion).
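
  The chunking scheme described in this log can be modelled in userland.
  The sketch below is not driver code: struct rx_model, rx_pass() and
  rx_drain() are hypothetical names, the descriptor ring is reduced to a
  counter, and a flag stands in for taskqueue_enqueue().  It only
  illustrates the "stop at ATH_RX_MAX, then reschedule" control flow.

```c
#include <assert.h>
#include <stddef.h>

#define	RX_MAX		128	/* mirrors ATH_RX_MAX from this commit */

/* Hypothetical stand-in for the RX ring: only counts pending frames. */
struct rx_model {
	int pending;	/* frames still waiting to be processed */
	int resched;	/* set when another pass has been requested */
};

/*
 * One taskqueue pass: handle at most RX_MAX frames, then request
 * another pass if the budget was used up.  Returns frames handled.
 */
static int
rx_pass(struct rx_model *q)
{
	int npkts = 0;

	assert(q != NULL);
	q->resched = 0;
	while (q->pending > 0) {
		if (npkts >= RX_MAX)
			break;		/* leave the rest for the next pass */
		q->pending--;
		npkts++;
	}
	if (npkts >= RX_MAX)
		q->resched = 1;		/* models taskqueue_enqueue() */
	return (npkts);
}

/* Run passes until none is requested; returns the number of passes. */
static int
rx_drain(struct rx_model *q)
{
	int passes = 0;

	do {
		(void)rx_pass(q);
		passes++;
	} while (q->resched);
	return (passes);
}
```

  In this model a 512-frame backlog drains in five passes: four full
  passes of 128 frames, plus one empty pass, because the reschedule check
  looks at the number of frames handled rather than at whether any
  remain.  Other work queued on the same taskqueue (TX, in the driver's
  case) would get a chance to run between passes.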

Modified:
  head/sys/dev/ath/if_ath_rx.c

Modified: head/sys/dev/ath/if_ath_rx.c
==============================================================================

--- head/sys/dev/ath/if_ath_rx.c	Sun Oct 14 20:00:00 2012	(r241557)
+++ head/sys/dev/ath/if_ath_rx.c	Sun Oct 14 20:31:38 2012	(r241558)
@@ -797,6 +797,8 @@ rx_next:
 	return (is_good);
 }
 
+#define	ATH_RX_MAX		128
+
 static void
 ath_rx_proc(struct ath_softc *sc, int resched)
 {
@@ -832,6 +834,15 @@ ath_rx_proc(struct ath_softc *sc, int re
 	sc->sc_stats.ast_rx_noise = nf;
 	tsf = ath_hal_gettsf64(ah);
 	do {
+		/*
+		 * Don't process too many packets at a time; give the
+		 * TX thread time to also run - otherwise the TX
+		 * latency can jump by quite a bit, causing throughput
+		 * degradation.
+		 */
+		if (npkts >= ATH_RX_MAX)
+			break;
+
 		bf = TAILQ_FIRST(&sc->sc_rxbuf);
 		if (sc->sc_rxslink && bf == NULL) {	/* NB: shouldn't happen */
 			if_printf(ifp, "%s: no buffer!\n", __func__);
@@ -942,11 +953,22 @@ rx_proc_next:
 	}
 #undef PA2DESC
 
+	/*
+	 * If we hit the maximum number of frames in this round,
+	 * reschedule for another immediate pass.  This gives
+	 * the TX and TX completion routines time to run, which
+	 * will reduce latency.
+	 */
+	if (npkts >= ATH_RX_MAX)
+		taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtask);
+
 	ATH_PCU_LOCK(sc);
 	sc->sc_rxproc_cnt--;
 	ATH_PCU_UNLOCK(sc);
 }
 
+#undef	ATH_RX_MAX
+
 /*
  * Only run the RX proc if it's not already running.
  * Since this may get run as part of the reset/flush path,