serious networking (em) performance (ggate and NFS) problem

Sean McNeil sean at mcneil.com
Mon Nov 22 14:07:51 PST 2004


Hi John-Mark,

On Mon, 2004-11-22 at 13:31 -0800, John-Mark Gurney wrote:
> Sean McNeil wrote this message on Mon, Nov 22, 2004 at 12:14 -0800:
> > On Mon, 2004-11-22 at 11:34 +0000, Robert Watson wrote:
> > > On Sun, 21 Nov 2004, Sean McNeil wrote:
> > > 
> > > > I have to disagree.  Packet loss is likely according to some of my
> > > > tests.  With the re driver, changing nothing except moving from a 100BT
> > > > setup with no packet loss to a gigE setup (both with Linksys switches)
> > > > causes serious packet loss at 20Mbps data rates.  I have discovered the
> > > > only way to get good performance with no packet loss was to
> > > > 
> > > > 1) Remove interrupt moderation
> > > > 2) defrag each mbuf that comes in to the driver.
> > > 
> > > Sounds like you're bumping into a queue limit that is made worse by
> > > interrupting less frequently, resulting in bursts of packets that are
> > > relatively large, rather than a trickle of packets at a higher rate.
> > > Perhaps a limit on the number of outstanding descriptors in the driver or
> > > hardware and/or a limit in the netisr/ifqueue queue depth.  You might try
> > > changing the default IFQ_MAXLEN from 50 to 128 to increase the size of the
> > > ifnet and netisr queues.  You could also try setting net.isr.enable=1 to
> > > enable direct dispatch, which in the in-bound direction would reduce the
> > > number of context switches and queueing.  It sounds like the device driver
> > > has a limit of 256 receive and transmit descriptors, which one supposes is
> > > probably derived from the hardware limit, but I have no documentation on
> > > hand so can't confirm that.
> > 
> > I've tried bumping IFQ_MAXLEN and it made no difference.  I could rerun
> 
> And the default for if_re is RL_IFQ_MAXLEN which is already 512...  As
> is mentioned below, the card can do 64 segments (which usually means 32
> packets since each packet usually has a header + payload in separate
> segments)...

It sounds like you believe this is an if_re-only problem.  I had the
feeling that the if_em driver performance problems were related in some
way.  I noticed that if_em does not do anything with m_defrag and
thought it might be a little more than coincidence.
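
To put rough numbers on the segment limit John-Mark mentions, here is a
minimal sketch of the arithmetic.  It assumes one descriptor per mbuf
segment and that m_defrag collapses a small packet into a single segment;
neither assumption has been checked against the if_re or if_em source, and
packets_per_ring is a hypothetical helper, not a driver function.

```python
# Back-of-the-envelope sketch: how many packets fit in a transmit ring
# when every mbuf segment consumes one descriptor.
# Assumption (not taken from the driver source): one descriptor per segment.


def packets_per_ring(descriptors: int, segments_per_packet: int) -> int:
    """Return the number of whole packets that fit in the ring."""
    if descriptors < 0 or segments_per_packet < 1:
        raise ValueError("need descriptors >= 0 and segments_per_packet >= 1")
    return descriptors // segments_per_packet


if __name__ == "__main__":
    for ring in (64, 256, 1024):
        split = packets_per_ring(ring, 2)      # header + payload in separate segments
        defragged = packets_per_ring(ring, 1)  # chain collapsed to one segment
        print(f"{ring:5d} descriptors: {split:4d} split packets, "
              f"{defragged:4d} defragged packets")
```

Under these assumptions defragmenting doubles the number of packets a ring
of a given size can hold, which is one way m_defrag could matter.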

> > this test to be 100% certain I suppose.  It was done a while back.  I
> > haven't tried net.isr.enable=1, but packet loss is in the transmission
> > direction.  The device driver has been modified to have 1024 transmit
> > and receive descriptors each as that is the hardware limitation.  That
> > didn't matter either.  With 1024 descriptors I still lost packets
> > without the m_defrag.
> 
> hmmm...  you know, I wonder if this is a problem with the if_re not
> pulling enough data from memory before starting the transmit...  Though
> we currently have it set for unlimited... so, that doesn't seem like it
> would be it..

Right.  Plus it now has 1024 descriptors on my machine and, like I said,
that made little difference.

> > The most difficult thing for me to understand is:  if this is some sort
> > of resource limitation why will it work with a slower phy layer
> > perfectly and not with the gigE?  The only thing I could think of was
> > that the old driver was doing m_defrag calls when it filled the transmit
> > descriptor queues up to a certain point.  Understanding the effects of
> > m_defrag would be helpful in figuring this out I suppose.
> 
> maybe the chip just can't keep the transmit fifo loaded at the higher
> speeds...  is it possible vls is doing a writev for a multisegmented UDP
> packet?   I'll have to look at this again...

I suppose.  As I understand it, though, it should be sending out
1316-byte data packets at a metered pace.  Also, wouldn't it behave the
same for 100BT vs. gigE?  Shouldn't I see packet loss with 100BT if this
is the case?
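
For reference, a rough calculation of what a metered 20Mbps stream of
1316-byte payloads looks like on the wire.  This is only a sketch: it
assumes the 20Mbps figure counts payload bytes, and that each datagram
carries plain UDP/IPv4/Ethernet overhead with no VLAN tag or IP options.
The function names are made up for the example.

```python
# Rough packet-rate and wire-time arithmetic for a metered UDP stream.
# Assumptions: 20 Mbps is the payload rate; per-frame overhead is
# UDP (8) + IPv4 (20) + Ethernet header (14) + FCS (4)
# + preamble/SFD (8) + inter-frame gap (12) = 66 bytes.

PAYLOAD_BYTES = 1316
STREAM_BPS = 20_000_000
OVERHEAD_BYTES = 8 + 20 + 14 + 4 + 8 + 12


def packets_per_second(stream_bps: float, payload_bytes: int) -> float:
    """Packets per second needed to carry stream_bps of payload."""
    return stream_bps / (payload_bytes * 8)


def wire_time_us(frame_bytes: int, link_bps: float) -> float:
    """Microseconds one frame occupies the link, including overhead."""
    return frame_bytes * 8 / link_bps * 1e6


if __name__ == "__main__":
    pps = packets_per_second(STREAM_BPS, PAYLOAD_BYTES)
    frame = PAYLOAD_BYTES + OVERHEAD_BYTES
    print(f"packet rate: {pps:.0f} pps, one packet every {1e6 / pps:.1f} us")
    for name, bps in (("100BT", 100e6), ("gigE", 1e9)):
        busy = pps * frame * 8 / bps
        print(f"{name}: {wire_time_us(frame, bps):.2f} us per frame, "
              f"link {busy:.1%} busy")
```

With these assumptions the stream needs roughly 1900 packets per second
and keeps neither link anywhere near saturated, so raw wire bandwidth
alone would not explain loss on gigE but not on 100BT.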

> > > It would be interesting on the send and receive sides to inspect the
> > > counters for drops at various points in the network stack; i.e., are we
> > > dropping packets at the ifq handoff because we're overfilling the
> > > descriptors in the driver, are packets dropped on the inbound path going
> > > into the netisr due to over-filling before the netisr is scheduled, etc. 
> > > And, it's probably interesting to look at stats on filling the socket
> > > buffers for the same reason: if bursts of packets come up the stack, the
> > > socket buffers could well be being over-filled before the user thread can
> > > run.
> > 
> > Yes, this would be very interesting and should point out the problem.  I
> > would do such a thing if I had enough knowledge of the network pathways.
> > Alas, I am very green in this area.  The receive side has no issues,
> > though, so I would focus on transmit counters (with assistance).
> 
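
As an illustration of the queue-limit theory Robert describes, here is a
toy tail-drop queue with a drop counter.  It is not the FreeBSD ifq or
netisr code, and it does not explain the transmit-side loss reported in
this thread.  The depths of 50 and 128 are the IFQ_MAXLEN values mentioned
above; the burst size of 64 and the drain rate of one packet per tick are
made-up numbers chosen only to show the effect.

```python
# Toy model: the same average packet rate delivered as a trickle or in
# bursts, fed into a bounded queue that drops on overflow (tail drop).
# Assumption: the consumer drains a fixed number of packets per tick.


def simulate(arrivals, maxlen, drain_per_tick=1):
    """Return (delivered, dropped) for a tail-drop queue of depth maxlen."""
    queued = delivered = dropped = 0
    for count in arrivals:
        accepted = min(count, maxlen - queued)  # room left in the queue
        dropped += count - accepted             # overflow is counted, not queued
        queued += accepted
        sent = min(queued, drain_per_tick)      # consumer runs once per tick
        queued -= sent
        delivered += sent
    return delivered, dropped


if __name__ == "__main__":
    ticks = 256
    trickle = [1] * ticks                                    # one packet per tick
    bursts = [64 if t % 64 == 0 else 0 for t in range(ticks)]  # same average rate
    for label, pattern in (("trickle", trickle), ("bursts", bursts)):
        for depth in (50, 128):
            delivered, dropped = simulate(pattern, depth)
            print(f"{label:8s} depth={depth:3d} "
                  f"delivered={delivered} dropped={dropped}")
```

In this model only the bursty pattern with the shallow queue drops
packets, which is the kind of signature the per-stage drop counters
would be expected to show if a queue limit were the cause.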