realtek performance (was Re: good ATI chipset results)

Scott Long scottl at
Thu Oct 13 12:46:27 PDT 2005

John Baldwin wrote:
> On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
>>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
>>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
>>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
>>>>>Haven't really seen anyone else use this board, but I have had good
>>>>>luck with it so far
>>>>>6&Me nuID=90&LanID=0
>>>>>It's a micro ATX form factor with built-in video and the onboard NIC is
>>>>>a Realtek.  (Although it's not the fastest NIC, its driver is stable
>>>>>and mature, especially compared to the headaches people seem to have
>>>>>with the NVIDIA NICs.)
>>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
>>>>For those interested, here are some changes I always use to increase
>>>>the performance of the above NIC.  With these mods, I can stream over
>>>>20 MBps video multicast and do other stuff over the network without
>>>>issues. Without the changes, xmit is horrible with severe UDP packet
>>>>loss.
>>>So, I see two changes.  One is to up the number of descriptors from 32 rx
>>>and 64 tx to 64 rx and 64 tx on some models and 1024 rx and 1024 tx on
>>>other models.  The other thing is that you seem to pessimize TX
>>>performance by always forcing the send packets to be coalesced into one
>>>mbuf (which requires doing an alloc and then copying all of the data)
>>>instead of making use of scatter/gather for sending packets.  Do you need
>>>both changes or do just the higher descriptor counts make the difference?
>>Actually, I've found that the higher descriptor counts do not make a
>>noticeable difference.  The only thing that mattered was to eliminate
>>the scatter/gather of sending packets.  I can't remember why I left the
>>descriptor increase in there.  I think it was to get the best use out of
>>the hardware.
> Hmm, odd.  Scott, do you have any ideas why m_defrag() plus one descriptor 
> would be faster than s/g dma for re(4)?

There are two things that I would consider.  First is that should be
used, as that cuts out some indirection (and thus latency) in the code.
Second is that not all DMA engines are created equal, and I honestly
wouldn't expect a whole lot out of Realtek given the price point of this
chip.  It might be optimized for operating on only a single S/G element,
for example.  Maybe it's really slow at pre-fetching S/G elements, or
maybe it has some sort of a stall after each DMA segment transfer while
it restarts a state machine.  I've seen evidence in other hardware that
only one S/G element should be used even though there are slots for 2
(or 3 in the case of 9k jumbo frames).

One thing to keep in mind is the difference in the driver models between
Windows and BSD that Bill Paul talked about the other day.  In the
Windows world, the driver owns the network packet memory, whereas in BSD
the stack owns it (in the form of mbufs).  This means that a Windows
driver can pre-allocate a contiguous slab and populate the descriptor
rings with it without ever having to worry about S/G fragmentation,
while in BSD fragmentation is a fact of life.  So it's likely yet another
case of hardware being optimized for certain characteristics of Windows
at the expense of other operating systems.
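
To make the trade-off concrete, here is a rough userland sketch of the
two TX strategies discussed above.  It is not re(4) code and uses no
kernel API: struct frag, struct tx_desc, map_sg() and map_coalesced()
are hypothetical names standing in for an mbuf chain, a DMA descriptor,
the scatter/gather path and an m_defrag()-style copy, respectively.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: one fragment of a packet. */
struct frag {
    struct frag *next;
    size_t len;
    unsigned char data[256];
};

/* Hypothetical TX descriptor: one DMA segment (address + length). */
struct tx_desc {
    const unsigned char *addr;
    size_t len;
};

/*
 * Scatter/gather: one descriptor per fragment, no copying.
 * Returns the number of descriptors used, or -1 if the ring is too small.
 */
static int map_sg(const struct frag *chain, struct tx_desc *ring, int ring_size)
{
    int n = 0;
    for (const struct frag *f = chain; f != NULL; f = f->next) {
        if (n == ring_size)
            return -1;
        ring[n].addr = f->data;
        ring[n].len = f->len;
        n++;
    }
    return n;
}

/*
 * Coalesce: copy every fragment into one contiguous buffer (the cost an
 * m_defrag()-style approach pays), then describe it with one descriptor.
 * Returns 1 on success, or -1 if the packet does not fit in the buffer.
 */
static int map_coalesced(const struct frag *chain, unsigned char *bounce,
                         size_t bounce_size, struct tx_desc *ring)
{
    size_t off = 0;
    for (const struct frag *f = chain; f != NULL; f = f->next) {
        if (f->len > bounce_size - off)
            return -1;
        memcpy(bounce + off, f->data, f->len);
        off += f->len;
    }
    ring[0].addr = bounce;
    ring[0].len = off;
    return 1;
}

/* Free a fragment chain built by make_chain(). */
static void free_chain(struct frag *chain)
{
    while (chain != NULL) {
        struct frag *next = chain->next;
        free(chain);
        chain = next;
    }
}

/* Build a chain of 'count' fragments; fragment i is filled with 'A' + i. */
static struct frag *make_chain(const size_t *lens, int count)
{
    struct frag *head = NULL, **tail = &head;
    for (int i = 0; i < count; i++) {
        struct frag *f = calloc(1, sizeof(*f));
        if (f == NULL) {
            free_chain(head);
            return NULL;
        }
        assert(lens[i] <= sizeof(f->data));
        f->len = lens[i];
        memset(f->data, 'A' + i, f->len);
        *tail = f;
        tail = &f->next;
    }
    return head;
}
```

For a three-fragment packet, map_sg() hands the hardware three segments
to fetch, while map_coalesced() pays for one memcpy per fragment up front
and hands over a single segment.  If the chip stalls between segments,
as speculated above, the copy could plausibly come out ahead, but that
is a guess that would need measuring on the actual hardware.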


More information about the freebsd-amd64 mailing list