Driver patch to look at...

Randall Stewart rrs at lakerest.net
Tue Feb 5 15:24:26 UTC 2013


Here is an updated patch… sigh.. I foobar'd the ALTQ stuff.. lots of crashes ;-D


R

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: driver_patch.txt
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20130205/694574e3/attachment.txt>
-------------- next part --------------

On Feb 5, 2013, at 6:49 AM, Randall Stewart wrote:

> John:
> 
> Here is an updated patch, per your suggestions. Note that I also
> expanded and the only driver that uses these methods I did not touch
> is the cxgb, but thats because I am not really sure it has the problem… it
> does not quite enqueue the same way (it appears) that the other drivers do ;-)
> 
> R
> 
> <driver_patch.txt>
> On Feb 5, 2013, at 5:49 AM, Randy Stewart wrote:
> 
>> 
>> On Feb 4, 2013, at 3:24 PM, John Baldwin wrote:
>> 
>>> On Monday, February 04, 2013 12:22:49 pm Randy Stewart wrote:
>>>> All:
>>>> 
>>>> I have been working with TCP in gigabit networks (igb driver actually) and have
>>>> found a very nasty problem with the way the driver is doing its put back when
>>>> it fills the out-bound transmit queue.
>>>> 
>>>> Basically it has taken a packet from the head of the ring buffer, and then 
>>>> realizes it can't fit it into the transmit queue. So it just re-enqueue's it
>>>> into the ring buffer. Whats wrong with that? Well most of the time there
>>>> are anywhere from 10-50 packets (maybe more) in that ring buffer when you are
>>>> operating at full speed (or trying to). This means you will see 10 duplicate
>>>> ACKs, do a fast retransmit and cut your cwnd in half.. not very nice actually.
>>>> 
>>>> The patch I have attached makes it so that
>>>> 
>>>> 1) There are ways to swap back.
>>>> 2) Use the peek in the ring buffer and only
>>>> dequeue the packet if we put it into the transmit ring
>>>> 3) If something goes wrong and the transmit frees the packet we dequeue it.
>>>> 4) If the transmit changed it (defrag etc) then swap out the new mbuf that
>>>> has taken its place.
>>>> 
>>>> I have fixed the four intel drivers that had this systemic issue, but there
>>>> are still more to fix.
>>>> 
>>>> Comments/review .. rotten egg's etc.. would be most welcome before
>>>> I commit this..
>>> 
>>> Does this only happen in drivers that use buffering?
>> 
>> Yep, there are a lot of drivers that *do not* use the drbr_xxxx() functions and
>> for those they do the IFQ_DRV_PREPEND().. its only the newer drivers like the
>> intel 1Gig and 10Gig ones that seem effected
>> 
>> Also effected are :
>> 
>> bxe
>> cxgb
>> oce
>> en
>> 
>> I have not fixed those yet.
>> 
>>> I seem to recall that
>>> drivers using IFQ would just stuff the packet at the head of the IFQ via
>>> IFQ_DRV_PREPEND() in this case so it is still the next packet to transmit.
>>> See, for example, this bit in dc_start_locked():
>>> 
>>> 	for (queued = 0; !IFQ_DRV_IS_EMPTY(&ifp->if_snd); ) {
>>> 		/*
>>> 		 * If there's no way we can send any packets, return now.
>>> 		 */
>>> 		if (sc->dc_cdata.dc_tx_cnt > DC_TX_LIST_CNT - DC_TX_LIST_RSVD) {
>>> 			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>> 			break;
>>> 		}
>>> 		IFQ_DRV_DEQUEUE(&ifp->if_snd, m_head);
>>> 		if (m_head == NULL)
>>> 			break;
>>> 
>>> 		if (dc_encap(sc, &m_head)) {
>>> 			if (m_head == NULL)
>>> 				break;
>>> 			IFQ_DRV_PREPEND(&ifp->if_snd, m_head);
>>> 			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
>>> 			break;
>>> 		}
>>> 
>>> It sounds like drbr/buf_ring just don't handle this case correctly?  It
>>> seems a shame to have to duplicate so much code in the various drivers to
>>> fix this, but that seems to be par for the course when using buf_ring. :(
>>> (buggy in edge cases and lots of duplicated code that is).
>> 
>>> Also, doing the drbr_swap() just so that drbr_dequeue() returns what you
>>> just swapped in seems... odd.  It seems that it would be nicer instead
>>> to have some sort of drbr_peek() / drbr_advance() where the latter just
>>> skips over whatever the current head is?  Then you could have something
>>> like:
>>> 
>>> 	while ((next = drbr_peek()) != NULL) {
>>> 		if (!foo_encap(&next)) {
>>> 			if (next == NULL)
>>> 				drbr_advance();
>>> 			break;
>>> 		}
>>> 		drbr_advance();
>>> 	}
>>> 
>> 
>> That was what I originally did (without the rename), but there is a sure crash waiting in that.
>> The only difference from what I originally had was just drbr_dequeue().. but
>> I was being a bit lazy and not wanting to add yet another function to the 
>> drbr_xxxx code since essential it would be a clone of drbr_dequeue() without
>> returning the mbuf.
>> 
>> The crash potential here is in that foo_encap(&next) often times will return
>> a different mbuf (at least in the igb driver it does). I believe its due
>> to either the m_pullup() which could change the lead mbuf you want
>> in the drbr_ring, or the m_defrag all within igb_xmit. Thus you have
>> to track what comes back from the !foo_encap() call and compare it to 
>> see if you must swap it out. 
>> 
>> 
>>> I guess the sticky widget is the case of ATLQ as you need to dequeue the
>>> packet always in the ALTQ case and put it back if the encap fails.
>> 
>> Yeah ALTQ is ugly and IMO we need to re-write it anyway.. maybe re-think
>> this whole layer ;-0
>> 
>>> For
>>> your patch it's not clear to me how that works.  It seems that if the
>>> encap routine frees the mbuf you try to dereference a freed pointer when
>>> you call drbr_dequeue().
>> 
>> Hmm you are right.. I forgot how we keep those using the mbuf itself...
>> 
>>> I really think you will instead need some sort
>>> of 'drbr_putback()' and have 'drbr_peek()' dequeue in the ALTQ case and
>>> use 'drbr_putback()' to put it back (PREPEND) in the ALTQ case.
>> 
>> We could do that but drbr_putback() would probably need both the old
>> and new pointers and then we could make it do the ring_swap() to put
>> the right mbuf in place..
>> 
>> Let me go explore that and come up with a better patch ;-)
>> 
>> R
>> 
>> 
>>> 
>>> -- 
>>> John Baldwin
>>> _______________________________________________
>>> freebsd-net at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>>> 
>> 
>> -----
>> Randall Stewart
>> randall at lakerest.net
>> 
>> 
>> 
>> 
>> _______________________________________________
>> freebsd-net at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
>> 
> 
> ------------------------------
> Randall Stewart
> 803-317-4952 (cell)
> 
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"

------------------------------
Randall Stewart
803-317-4952 (cell)



More information about the freebsd-net mailing list