Test on 10GBE Intel based network card

Hello Invernizzi,

First of all, I didn't catch what task you are tuning for:
forwarding or transfer?

In any case you should start with a minimal configuration (no netgraph
modules, no firewall).

Choose a test similar to your typical load.
Provide us with the test result, CPU consumption, and the output of
netstat -s
netstat -m
debug values from the driver (sysctl (?).debug).

No improvement with kern.ipc.nmbclusters=262144 and 1.8.6 driver :<(((((

IF> ++fabrizio

Your nmbclusters is very low, you list it twice so I'm assuming the second value is what it ends up being, 32K :(
I would set it to:

kern.ipc.nmbclusters=262144
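
For scale, here is a rough sketch of what that setting means in memory terms. It assumes the standard 2 KB mbuf cluster size (MCLBYTES = 2048) and reads "32K" as 32768; neither figure is stated in this thread.

```python
MCLBYTES = 2048  # assumed standard mbuf cluster size, in bytes


def cluster_pool_mib(nmbclusters):
    """Upper bound on the memory the cluster pool can use, in MiB."""
    return nmbclusters * MCLBYTES / (1024 * 1024)


# 32768 is the assumed meaning of "32K"; 262144 is the suggested value.
for n in (32768, 262144):
    print(f"kern.ipc.nmbclusters={n}: up to {cluster_pool_mib(n):.0f} MiB")
```

Under those assumptions the suggested value caps the pool at about 512 MiB, which is small next to the roughly 7 GB shown as free in the top snapshot further down.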

IF> Also, I thought you were using the current driver, but now it looks like you are
IF> using something fairly old; use my latest code, which is 1.8.8.

IF> Invernizzi Fabrizio wrote:
IF> The limitation that you see is about the max number of packets that

IF> are the packet per second measured during tests:
IF> 64 byte:        610119 Pps
IF> 512 byte:       516917 Pps
IF> 1492 byte:      464962 Pps
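
To put those rates in perspective, a quick sketch converts them to bit rates. It assumes the quoted sizes are the bytes counted per packet and ignores Ethernet framing overhead; the 10 Gbit/s figure is simply the nominal 10GbE line rate.

```python
# Packet rates reported in the thread: packet size in bytes -> packets/s.
measured_pps = {64: 610119, 512: 516917, 1492: 464962}

LINK_BPS = 10_000_000_000  # nominal 10GbE line rate (assumption, no framing overhead)


def throughput_bps(size_bytes, pps):
    """Bits per second carried at the given packet size and packet rate."""
    return size_bytes * 8 * pps


for size, pps in measured_pps.items():
    bps = throughput_bps(size, pps)
    share = 100 * bps / LINK_BPS
    print(f"{size:>5} byte: {pps} pps = {bps / 1e6:.1f} Mbit/s ({share:.1f}% of link)")
```

Under those assumptions none of the three cases comes near line rate, and the packet rate drops only modestly as packets get larger, which points at a per-packet cost rather than a bandwidth ceiling.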

IF> per second?
IF> Yes, as you can see the maximum is 610119 Pps.
IF> Where does this limit come from?
IF> ah that's the whole point of tuning :-)
IF> there are several possibilities:
IF> 1/ the card's interrupts are probably attached to only 1 cpu,
IF> so that cpu can do no more work

IF> This seems not to be the problem. See below a top snapshot during a 64byte-long packet storm

IF> last pid:  8552;  load averages:  0.40,  0.09,  0.03    up 0+20:36:58  09:40:29
IF> 124 processes: 13 running, 73 sleeping, 38 waiting
IF> CPU:  0.0% user,  0.0% nice, 86.3% system, 12.3% interrupt,  1.5% idle
IF> Mem: 13M Active, 329M Inact, 372M Wired, 68K Cache, 399M Buf, 7207M Free
IF> Swap: 2048M Total, 2048M Free

IF>   11 root          1 171 ki31     0K    16K RUN    3  20.2H 51.17% idle: cpu3
IF>   14 root          1 171 ki31     0K    16K RUN    0  20.2H 50.88% idle: cpu0
IF>   12 root          1 171 ki31     0K    16K RUN    2  20.2H 50.49% idle: cpu2
IF>   13 root          1 171 ki31     0K    16K RUN    1  20.2H 50.10% idle: cpu1
IF>   42 root          1 -68    -     0K    16K RUN    1  14:20 36.47% ix0 rxq
IF>   38 root          1 -68    -     0K    16K CPU0   0  14:15 36.08% ix0 rxq
IF>   44 root          1 -68    -     0K    16K CPU2   2  14:08 34.47% ix0 rxq
IF>   40 root          1 -68    -     0K    16K CPU3   3  13:42 32.37% ix0 rxq
IF> ....

It looks like the 4 rxq processes are bound to the 4 available cores with equal distribution.
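
The same reading can be checked mechanically. This is a small sketch that parses the four rxq lines of the snapshot above; it assumes the usual top column order (PID, USER, THR, PRI, NICE, SIZE, RES, STATE, C, TIME, WCPU, COMMAND), which the snapshot itself does not show.

```python
# The four "ix0 rxq" lines from the top snapshot in this thread.
TOP_LINES = """\
42 root 1 -68 - 0K 16K RUN  1 14:20 36.47% ix0 rxq
38 root 1 -68 - 0K 16K CPU0 0 14:15 36.08% ix0 rxq
44 root 1 -68 - 0K 16K CPU2 2 14:08 34.47% ix0 rxq
40 root 1 -68 - 0K 16K CPU3 3 13:42 32.37% ix0 rxq
"""


def rxq_load_by_cpu(text):
    """Map CPU number -> WCPU percent for every 'rxq' thread line."""
    load = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed columns: index 8 is the CPU, index 10 is WCPU.
        if len(fields) >= 12 and fields[-1] == "rxq":
            load[int(fields[8])] = float(fields[10].rstrip("%"))
    return load


load = rxq_load_by_cpu(TOP_LINES)
for cpu in sorted(load):
    print(f"cpu{cpu}: {load[cpu]:.2f}% in ix0 rxq")
print(f"spread: {max(load.values()) - min(load.values()):.2f} points")
```

Each queue thread sits on its own CPU and none of them is close to 100%, which fits the observation that a single saturated CPU is not the bottleneck here.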

IF> 2/ if more than 1 cpu is working, it may be that there is a lock in
IF> heavy contention somewhere.

IF> This I think is the problem. I am trying to understand how to
IF> 1- see where the heavy contention is (context switching? Some limiting setting?)
IF> 2- mitigate it

IF> there is a lock profiling tool that right now I can't remember the name of..

IF> look it up with google :-)  FreeBSD lock profiling tool

IF> ah, first hit...

IF> http://blogs.epfl.ch/article/23832

IF> is the machine still responsive to other networks while
IF> running at maximum capacity on this network? (make sure that
IF> the other networks are on a different CPU; hmm, I can't remember how to
IF> do that :-)

