serious networking (em) performance (ggate and NFS) problem
Emanuel Strobl
Emanuel.Strobl at gmx.net
Fri Nov 19 04:18:50 PST 2004
On Thursday, 18 November 2004 at 13:27, Robert Watson wrote:
> On Wed, 17 Nov 2004, Emanuel Strobl wrote:
> > I really love 5.3 in many ways but here're some unbelievable transfer
First, thanks a lot to all of you for paying attention to my problem again.
I'll use this as a cumulative answer to your many postings, first
answering Robert's questions and, at the bottom, those of the others.
I changed cables and couldn't reproduce the bad results, so I changed the
cables back but still cannot reproduce them; in particular the ggate write,
formerly at 2.6 MB/s, now performs at 15 MB/s. I haven't done any more
polling tests, just interrupt driven, since Matt explained that em doesn't
benefit from polling in any way.
The results don't indicate a serious problem now, but they are still about a
third of what I'd expect from my hardware. Do I really need gigahertz-class
CPUs to transfer 30 MB/s over GbE?
>
> I think the first thing you want to do is to try and determine whether the
> problem is a link layer problem, network layer problem, or application
> (file sharing) layer problem. Here's where I'd start looking:
>
> (1) I'd first off check that there wasn't a serious interrupt problem on
> the box, which is often triggered by ACPI problems. Get the box to be
> as idle as possible, and then use vmstat -i or systat -vmstat to see if
> anything is spewing interrupts.
Everything is fine
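Robert's vmstat -i check can be scripted; here is a hypothetical sketch that sums the per-second rate column, so an interrupt storm stands out as one large total. The column layout (interrupt name, total, rate, with a trailing Total line) is assumed from typical `vmstat -i` output:

```sh
# sum_rate: add up the last (per-second rate) column of `vmstat -i`,
# skipping the header and the trailing Total line. Assumed layout:
#   interrupt        total   rate
#   irq16: em0      200000   2000
sum_rate() {
  awk 'NR > 1 && $1 != "Total" && $NF ~ /^[0-9]+$/ { s += $NF } END { print s }'
}
# usage on the FreeBSD box (commented out here):
# vmstat -i | sum_rate
```

A quiet box should report a small number; anything in the tens of thousands per second suggests a device spewing interrupts.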
>
> (2) Confirm that your hardware is capable of the desired rates: typically
> this involves looking at whether you have a decent card (most if_em
> cards are decent), whether it's 32-bit or 64-bit PCI, and so on. For
> unidirectional send on 32-bit PCI, be aware that it is not possible to
> achieve gigabit performance because the PCI bus isn't fast enough, for
> example.
I'm aware that my 32-bit/33 MHz PCI bus is a "bottleneck", but I saw almost
80 MByte/s running over the bus to my test stripe set (over the HPT372). So
I'm pretty sure the system is good for 40 MB/s over the GbE line, which
would be sufficient for me.
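As a sanity check on that reasoning, the theoretical peak figures can be worked out in two lines of shell arithmetic (peak numbers only; real PCI throughput is lower due to bus arbitration and protocol overhead):

```sh
# 32-bit bus at 33 MHz, 8 bits per byte -> theoretical PCI peak in MB/s
pci_mbytes=$((32 * 33000000 / 8 / 1000000))   # 132 MB/s
# 1 Gbit/s line rate -> MB/s
gbe_mbytes=$((1000000000 / 8 / 1000000))      # 125 MB/s
echo "PCI peak: ${pci_mbytes} MB/s, GbE line rate: ${gbe_mbytes} MB/s"
```

So a 40 MB/s target leaves plenty of headroom even on shared 32-bit/33 MHz PCI, as argued above.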
>
> (3) Next, I'd use a tool like netperf (see ports collection) to establish
> three characteristics: round trip latency from user space to user
> space (UDP_RR), TCP throughput (TCP_STREAM), and large packet
> throughput (UDP_STREAM). With decent boxes on 5.3, you should have no
> trouble at all maxing out a single gig-e with if_em, assuming all is
> working well hardware wise and there's no software problem specific to
> your configuration.
Please find the results at http://www.schmalzbauer.de/document.php?id=21
There is also a lot of additional information and more test results there.
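For reference, the three netperf characteristics Robert lists map onto invocations like the following (a sketch; 192.168.1.2 is a placeholder for the peer box, which must be running netserver):

```sh
REMOTE=192.168.1.2                  # hypothetical address of the other box
netperf -H "$REMOTE" -t UDP_RR      # round-trip latency, user space to user space
netperf -H "$REMOTE" -t TCP_STREAM  # TCP throughput
netperf -H "$REMOTE" -t UDP_STREAM  # large-packet UDP throughput
```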
>
> (4) Note that router latency (and even switch latency) can have a
> substantial impact on gigabit performance, even with no packet loss,
> in part due to stuff like ethernet flow control. You may want to put
> the two boxes back-to-back for testing purposes.
>
I was aware of that, and since I lack a GbE switch anyway, I decided to
use a simple cable ;)
> (5) Next, I'd measure CPU consumption on the end box -- in particular, use
> top -S and systat -vmstat 1 to compare the idle condition of the
> system and the system under load.
>
I have added these values to the netperf results as well.
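The measurement Robert describes amounts to running something like this on the end box, once idle and once under load, and comparing the idle percentages (a sketch of the commands only):

```sh
top -S            # -S includes system (kernel) processes such as idle and intr
systat -vmstat 1  # full-screen CPU/interrupt/VM view, refreshed every second
```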
> If you determine there is a link layer or IP layer problem, we can start
> digging into things like the error statistics in the card, negotiation
> issues, etc. If not, you want to move up the stack to try and
> characterize where it is you're hitting the performance issue.
On Thursday, 18 November 2004 at 17:53, M. Warner Losh wrote:
> In message: <Pine.NEB.3.96L.1041118121834.66045B-100000 at fledge.watson.org>
>
> Robert Watson <rwatson at freebsd.org> writes:
> : (1) I'd first off check that there wasn't a serious interrupt problem on
> : the box, which is often triggered by ACPI problems. Get the box to
> : be as idle as possible, and then use vmstat -i or systat -vmstat to see
> : if
> : anything is spewing interrupts.
>
> Also, make sure that you aren't sharing interrupts between
> GIANT-LOCKED and non-giant-locked cards. This might be exposing bugs
> in the network layer that debug.mpsafenet=0 might correct. I just
> noticed that our setup here is configured that way, so I'll be looking
> into that area of things.
As you can see at the link above, there are no shared IRQs.
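Warner's two checks can be sketched as follows; debug.mpsafenet is the 5.x tunable he refers to (0 puts the network stack back under Giant):

```sh
vmstat -i                # each irqNN line should list only one device name
sysctl debug.mpsafenet   # 1 = Giant-free network stack (the 5.3 default)
# To test Warner's suggestion, set it at boot and reboot:
# echo 'debug.mpsafenet="0"' >> /boot/loader.conf
```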