realtek performance (was Re: good ATI chipset results)

Thu Oct 13 13:36:23 PDT 2005

Sean McNeil wrote:
> On Thu, 2005-10-13 at 16:17 -0400, John Baldwin wrote:
> 
>>On Thursday 13 October 2005 03:46 pm, Scott Long wrote:
>>
>>>John Baldwin wrote:
>>>
>>>>On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
>>>>
>>>>>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
>>>>>
>>>>>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
>>>>>>
>>>>>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
>>>>>>>
>>>>>>>>Haven't really seen anyone else use this board, but I have had good
>>>>>>>>luck with it so far.
>>>>>>>>
>>>>>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
>>>>>>>>
>>>>>>>>It's a micro ATX form factor with built-in video and the onboard NIC
>>>>>>>>is a Realtek.  (Although it's not the fastest NIC, its driver is
>>>>>>>>stable and mature, especially compared to the headaches people seem
>>>>>>>>to have with the NVIDIA NICs.)
>>>>>>>
>>>>>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
>>>>>>>
>>>>>>>For those interested, here are some changes I always use to increase
>>>>>>>the performance of the above NIC.  With these mods, I can stream over
>>>>>>>20 MBps video multicast and do other stuff over the network without
>>>>>>>issues. Without the changes, xmit is horrible with severe UDP packet
>>>>>>>loss.
>>>>>>
>>>>>>So, I see two changes.  One is to up the number of descriptors from 32
>>>>>>rx and 64 tx to 64 rx and 64 tx on some models and 1024 rx and 1024 tx
>>>>>>on other models.  The other thing is that you seem to pessimize TX
>>>>>>performance by always forcing the send packets to be coalesced into one
>>>>>>mbuf (which requires doing an alloc and then copying all of the data)
>>>>>>instead of making use of scatter/gather for sending packets.  Do you
>>>>>>need both changes or do just the higher descriptor counts make the
>>>>>>difference?
>>>>>
>>>>>Actually, I've found that the higher descriptor counts do not make a
>>>>>noticeable difference.  The only thing that mattered was to eliminate
>>>>>the scatter/gather of sending packets.  I can't remember why I left the
>>>>>descriptor increase in there.  I think it was to get the best use out of
>>>>>the hardware.
>>>>
>>>>Hmm, odd.  Scott, do you have any ideas why m_defrag() plus one
>>>>descriptor would be faster than s/g dma for re(4)?
>>>
>>>There are two things that I would consider.  First is that
>>>bus_dmamap_load_mbuf_sg() should be used, as that cuts out some
>>>indirection (and thus latency) in the code.  Second is that not all
>>>DMA engines are created equal, and I honestly wouldn't expect a whole
>>>lot out of Realtek given the price point of this chip.  It might be
>>>optimized for operating on only a single S/G element, for example.
>>>Maybe it's really slow at pre-fetching s/g elements, or maybe it has
>>>some sort of a stall after each DMA segment transfer while it
>>>restarts a state machine.  I've seen evidence in other hardware that
>>>only one S/G element should be used even though there are slots for 2
>>>(or 3 in the case of 9k jumbo frames).  One thing to keep in mind is
>>>the difference in the driver models between Windows and BSD that Bill
>>>Paul talked about the other day.  In the Windows world, the driver
>>>owns the network packet memory, whereas in BSD the stack owns it (in
>>>the form of mbufs).  This means that a Windows driver can
>>>pre-allocate a contiguous slab and populate the descriptor rings with
>>>it without ever having to worry about s/g fragmentation, while in BSD
>>>fragmentation is a fact of life.  So it's likely yet another case of
>>>hardware being optimized for certain characteristics of Windows at
>>>the expense of other operating systems.
>>
>>Ok.  Sean, do you think you can trim the patch down to just the m_defrag() 
>>changes and test that to make sure that is all that is needed?
> 
> 
> Certainly.  I can even write a small comment that specifies it is most
> likely some sort of hardware limitation of DMAing fragmented data :)
> 
> I'm knee deep in something else at the moment, so it might be a day or
> two.  I'll try to get it to you sooner.
> 
> Sean
> 
> 

Do also consider using bus_dmamap_load_mbuf_sg().  It does make a
difference on gige drivers.

Scott
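
For anyone following along, here is a rough userland sketch of the
trade-off discussed above: sending a fragmented packet as-is costs one
descriptor per fragment, while coalescing it first (the idea behind
m_defrag()) costs an allocation and a copy but leaves a single
contiguous buffer.  This is not kernel code and not the re(4) patch;
struct frag, sg_descriptors_needed(), and coalesce_chain() are made-up
names standing in for an mbuf chain and the driver logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for an mbuf chain.  This is NOT the kernel's
 * struct mbuf; it only models "a packet split across several buffers".
 */
struct frag {
	const unsigned char *data;	/* payload of this fragment */
	size_t len;			/* bytes in this fragment */
	struct frag *next;		/* next fragment, or NULL */
};

/*
 * Descriptors a scatter/gather send would consume in this model:
 * one per non-empty fragment.
 */
static size_t
sg_descriptors_needed(const struct frag *f)
{
	size_t n = 0;

	for (; f != NULL; f = f->next)
		if (f->len > 0)
			n++;
	return (n);
}

/*
 * Copy every fragment into one freshly allocated contiguous buffer,
 * so the packet could be described by a single descriptor.  Returns
 * NULL for an empty chain or on allocation failure; the caller frees
 * the result.
 */
static unsigned char *
coalesce_chain(const struct frag *f, size_t *out_len)
{
	const struct frag *cur;
	unsigned char *buf, *p;
	size_t total = 0;

	for (cur = f; cur != NULL; cur = cur->next)
		total += cur->len;
	if (total == 0)
		return (NULL);

	buf = malloc(total);
	if (buf == NULL)
		return (NULL);

	p = buf;
	for (cur = f; cur != NULL; cur = cur->next) {
		if (cur->len > 0) {
			memcpy(p, cur->data, cur->len);
			p += cur->len;
		}
	}
	*out_len = total;
	return (buf);
}
```

In this model a three-fragment packet needs three descriptors when sent
with scatter/gather and one after coalesce_chain().  Whether the copy is
cheaper than the extra descriptors depends on the DMA engine, which is
the open question in this thread.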