realtek performance (was Re: good ATI chipset results)

Sean McNeil sean at mcneil.com
Thu Oct 13 13:25:42 PDT 2005


On Thu, 2005-10-13 at 16:17 -0400, John Baldwin wrote:
> On Thursday 13 October 2005 03:46 pm, Scott Long wrote:
> > John Baldwin wrote:
> > > On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
> > >>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
> > >>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
> > >>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
> > >>>>>Haven't really seen anyone else use this board, but I have had good
> > >>>>>luck with it so far
> > >>>>>
> > >>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
> > >>>>>
> > >>>>>It's a micro ATX form factor with built in video and the onboard NIC is
> > >>>>>a realtek.  (Although it's not the fastest NIC, its driver is stable
> > >>>>>and mature-- especially compared to the headaches people seem to have
> > >>>>>with the NVIDIA NICs.)
> > >>>>
> > >>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
> > >>>>
> > >>>>For those interested, here are some changes I always use to increase
> > >>>>the performance of the above NIC.  With these mods, I can stream over
> > >>>>20 MBps video multicast and do other stuff over the network without
> > >>>>issues. Without the changes, xmit is horrible with severe UDP packet
> > >>>>loss.
> > >>>
> > >>>So, I see two changes.  One is to up the number of descriptors from 32
> > >>> rx and 64 tx to 64 rx and 64 tx on some models and 1024 rx and 1024 tx
> > >>> on other models.  The other thing is that you seem to pessimize TX
> > >>> performance by always forcing the send packets to be coalesced into one
> > >>> mbuf (which requires doing an alloc and then copying all of the data)
> > >>> instead of making use of scatter/gather for sending packets.  Do you
> > >>> need both changes or do just the higher descriptor counts make the
> > >>> difference?
> > >>
> > >>Actually, I've found that the higher descriptor counts do not make a
> > >>noticeable difference.  The only thing that mattered was to eliminate
> > >>the scatter/gather of sending packets.  I can't remember why I left the
> > >>descriptor increase in there.  I think it was to get the best use out of
> > >>the hardware.
> > >
> > > Hmm, odd.  Scott, do you have any ideas why m_defrag() plus one
> > > descriptor would be faster than s/g dma for re(4)?
> >
> > There are two things that I would consider.  First is that
> > bus_dmamap_load_mbuf_sg() should be used, as that cuts out some
> > indirection (and thus latency) in the code.  Second is that not all
> > DMA engines are created equal, and I honestly wouldn't expect a whole
> > lot out of Realtek given the price point of this chip.  It might be
> > optimized for operating on only a single S/G element, for example.
> > Maybe it's really slow at pre-fetching s/g elements, or maybe it has
> > some sort of a stall after each DMA segment transfer while it
> > restarts a state machine.  I've seen evidence in other hardware that
> > only one S/G element should be used even though there are slots for 2
> > (or 3 in the case of 9k jumbo frames).  One thing to keep in mind is
> > the difference in the driver models between Windows and BSD that Bill
> > Paul talked about the other day.  In the Windows world, the driver
> > owns the network packet memory, whereas in BSD the stack owns it (in
> > the form of mbufs).  This means that the Windows driver can
> > pre-allocate a contiguous slab and populate the descriptor rings with
> > it without ever having to worry about s/g fragmentation, while in BSD
> > fragmentation is a fact of life.  So it's likely yet another case of
> > hardware being optimized for certain characteristics of Windows at
> > the expense of other operating systems.
> 
> Ok.  Sean, do you think you can trim the patch down to just the m_defrag() 
> changes and test that to make sure that is all that is needed?

Certainly.  I can even write a small comment that specifies it is most
likely some sort of hardware limitation of DMAing fragmented data :)

I'm knee deep in something else at the moment, so it might be a day or
two.  I'll try to get it to you sooner.
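
In the meantime, here is a rough userland sketch of the coalescing idea
being discussed, in case it helps.  It is not the patch and not kernel
code: struct frag, frag_count() and coalesce() are made-up names that
only stand in for an mbuf chain and for the "copy everything into one
buffer" step.  The assumption is simply that each fragment would need
its own TX descriptor with s/g DMA, while the coalesced copy needs one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one mbuf in a chain (not the kernel struct). */
struct frag {
	const unsigned char *data;	/* start of this fragment's bytes */
	size_t len;			/* number of bytes in this fragment */
	struct frag *next;		/* next fragment, or NULL at the end */
};

/* Fragments in the chain, i.e. descriptors needed if sent via s/g DMA. */
static size_t
frag_count(const struct frag *f)
{
	size_t n = 0;

	for (; f != NULL; f = f->next)
		n++;
	return (n);
}

/* Total payload length across the whole chain. */
static size_t
frag_total_len(const struct frag *f)
{
	size_t total = 0;

	for (; f != NULL; f = f->next)
		total += f->len;
	return (total);
}

/*
 * Copy the chain into one newly allocated contiguous buffer, so a single
 * descriptor could describe the packet.  This costs one allocation plus a
 * copy of all of the data.  Returns NULL if the allocation fails; the
 * caller frees the result.
 */
static unsigned char *
coalesce(const struct frag *f, size_t *out_len)
{
	size_t total = frag_total_len(f);
	size_t off = 0;
	unsigned char *buf;

	assert(out_len != NULL);
	buf = malloc(total > 0 ? total : 1);
	if (buf == NULL)
		return (NULL);
	for (; f != NULL; f = f->next) {
		memcpy(buf + off, f->data, f->len);
		off += f->len;
	}
	*out_len = total;
	return (buf);
}
```

Building a three-fragment chain (say a header, another header and a
payload) and calling coalesce() on it gives one buffer whose length is
the sum of the three, where frag_count() on the original chain reports
three.  Whether trading that copy for fewer descriptors actually wins
depends on the hardware, which is the open question in this thread.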

Sean




More information about the freebsd-amd64 mailing list