Driver patch to look at...

John Baldwin jhb at freebsd.org
Mon Feb 4 20:46:30 UTC 2013


On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote:
> All:
> 
> I have been working with TCP in gigabit networks (igb driver actually) and have
> found a very nasty problem with the way the driver is doing its put back when
> it fills the out-bound transmit queue.
> 
> Basically it has taken a packet from the head of the ring buffer, and then 
> realizes it can't fit it into the transmit queue. So it just re-enqueue's it
> into the ring buffer. Whats wrong with that? Well most of the time there
> are anywhere from 10-50 packets (maybe more) in that ring buffer when you are
> operating at full speed (or trying to). This means you will see 10 duplicate
> ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually.
> 
> The patch I have attached makes it so that
> 
> 1) There are ways to swap back.
> 2) Use the peek in the ring buffer and only
>    dequeue the packet if we put it into the transmit ring
> 3) If something goes wrong and the transmit frees the packet we dequeue it.
> 4) If the transmit changed it (defrag etc) then swap out the new mbuf that
>    has taken its place.
> 
> I have fixed the four intel drivers that had this systemic issue, but there
> are still more to fix.
> 
> Comments/review .. rotten egg's etc.. would be most welcome before
> I commit this..

Does this only happen in drivers that use bufring?  I seem to recall that
drivers using IFQ would just stuff the packet at the head of the IFQ via
IFQ_DRV_PREPEND() in this case so it is still the next packet to transmit.
See, for example, this bit in dc_start_locked():

	for (queued = 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) {
		/*
		 * If there's no way we can send any packets, return now.
		 */
		if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - DC_TX_LIST_RSVD) {
			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
			break;
		}
		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
		if (m_head == NULL)
			break;

		if (dc_encap(sc, &m_head)) {
			if (m_head == NULL)
				break;
			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
			break;
		}

It sounds like drbr/buf_ring just don't handle this case correctly?  It
seems a shame to have to duplicate so much code in the various drivers to
fix this, but that seems to be par for the course when using buf_ring. :(
(buggy in edge cases and lots of duplicated code that is).

Also, doing the drbr_swap() just so that drbr_dequeue() returns what you
just swapped in seems... odd.  It seems that it would be nicer instead
to have some sort of drbr_peek() / drbr_advance() where the latter just
skips over whatever the current head is?  Then you could have something
like:

	while ((next = drbr_peek()) != NULL) {
		if (!foo_encap(&next)) {
			if (next == NULL)
				drbr_advance();
			break;
		}
		drbr_advance();
	}

I guess the sticky widget is the case of ATLQ as you need to dequeue the
packet always in the ALTQ case and put it back if the encap fails.  For
your patch it's not clear to me how that works.  It seems that if the
encap routine frees the mbuf you try to dereference a freed pointer when
you call drbr_dequeue().  I really think you will instead need some sort
of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case and
use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case.

-- 
John Baldwin


More information about the freebsd-net mailing list