Major performance hit with ToS setting

Kevin Oberman kob6558 at gmail.com
Thu May 31 03:33:27 UTC 2012


On Fri, May 25, 2012 at 6:27 AM, Andrew Gallatin <gallatin at cs.duke.edu> wrote:
> On 05/24/12 18:55, Kevin Oberman wrote:
>
>>
>> This is, of course, on a 10G interface. On 7.3 there is little
>
>
> Hi Kevin,
>
>
> What you're seeing looks almost like a checksum is bad, or
> there is some other packet damage.  Do you see any
> error counters increasing if you run netstat -s before
> and after the test & compare the results?
>
> Thinking that, perhaps, this was a bug in my mxge(4), I attempted
> to reproduce it this morning between 8.3 and 9.0 boxes and
> failed to see the bad behavior.
>
> % nuttcp-6.1.2 -c32t -t diablo1-m < /dev/zero
>  9161.7500 MB /  10.21 sec = 7526.5792 Mbps 53 %TX 97 %RX 0 host-retrans 0.11 msRTT
> % nuttcp-6.1.2  -t diablo1-m < /dev/zero
>  9140.6180 MB /  10.21 sec = 7509.8270 Mbps 53 %TX 97 %RX 0 host-retrans 0.11 msRTT
>
>
> However, I don't have any 8.2-r box handy, so I cannot
> exactly repro your experiment...

Drew and Bjorn,

At this point the flying fickle finger of fate (oops, just dated
myself) is pointing to a bug in the CUBIC congestion control, which we
run. But it's really weird in several ways.

I built another system from the same source files and it works fine,
unlike all of the existing systems. I need to confirm that all systems
have identical hardware including the Myricom firmware. I suspect some
edge case is biting only in unusual cases.

I used SIFTR at the suggestion of Lawrence Stewart, who headed the
project to bring pluggable congestion algorithms to FreeBSD, and found
really odd congestion behavior. First, I do see a triple ACK, but the
congestion window suddenly drops from 73K to 8K. If I understand
CUBIC, it should halve the congestion window, which is not what is
happening. It then increases slowly (in slow start) to 82K. While the
slow-start bytes are INCREASING, the congestion window again goes to 8K
while the SS size moves from 36K up to 52K. It just continues to bounce
wildly between 8K (always the low point) and peaks of 64K to 82K. The
swings start at 83K and, over the first few seconds, the peaks drop to
about 64K.
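
As a rough illustration of how far the trace is from a normal
multiplicative decrease, the sketch below computes what would be left
of a 73K window after one loss event. The reduction factors are
assumptions, not values read from the FreeBSD source: 0.5 is the
halving expected above, and 0.7 is the beta_cubic constant given in
RFC 8312. "K" is taken to mean 1024 bytes, and the function and
variable names are made up for this example.

```python
# Sketch only: compare an expected post-loss window with the ~8K seen
# in the SIFTR trace. The factors below are assumptions (see text).

def cwnd_after_loss(cwnd_bytes, beta):
    """Window left after one loss event if cwnd is scaled by beta."""
    return int(cwnd_bytes * beta)

CWND_BEFORE = 73 * 1024     # ~73K before the triple ACK (from the trace)
OBSERVED_AFTER = 8 * 1024   # ~8K after the triple ACK (from the trace)

# Hypothetical reduction factors to compare against the observation.
FACTORS = {
    "halving (0.5)": 0.5,
    "RFC 8312 beta_cubic (0.7)": 0.7,
}

def report():
    """Return (label, expected_bytes, expected/observed ratio) rows."""
    rows = []
    for label, beta in FACTORS.items():
        expected = cwnd_after_loss(CWND_BEFORE, beta)
        rows.append((label, expected, expected / OBSERVED_AFTER))
    return rows

if __name__ == "__main__":
    for label, expected, ratio in report():
        print(f"{label}: expected ~{expected // 1024}K, "
              f"{ratio:.1f}x the observed 8K")
```

Under either assumed factor the expected window is several times
larger than 8K, which is why the drop looks like something other than
an ordinary congestion response.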

I am trying to think of any way that anything other than the CC
algorithm could do this, but have not come up with one so far. I will try
installing Hamilton and see how it works. On the other hand, how could
changing the ToS bits trigger this behavior?

I have sent all of my data to Lawrence Stewart and I expect to hear
from him soon, but I'd appreciate it if you can provide any other ideas
on what could cause this.

I might also mention that about 4 years ago when I was testing 10G
cards I saw something similar (using tcptrace) when testing between a
Myricom card and a Chelsio, but that is a pretty vague data point and
I no longer have the traces to examine.

Again, if you want to look at network performance issues, SIFTR is an
awesome tool.
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558 at gmail.com


More information about the freebsd-net mailing list