I am running some tests on a FreeBSD system with an Intel-based 10GbE network card, and I have doubts about my configuration because the performance I am seeing looks very poor.

This is the system I am testing on:

- HP 380 G5 with one quad-core Xeon X5420 installed (CPU speed: 2.50 GHz, bus speed: 1333 MHz, L2 cache size: 12 MB, L2 cache speed: 2.5 GHz).

- Network card: Silicom PE10G2i-LR - Dual Port Fiber (LR) 10 Gigabit Ethernet PCI Express Server Adapter Intel® based (chip 82598EB).
        Driver ixgbe-1.8.6

- FreeBSD 7.2-RELEASE (64-bit) with these options compiled into the kernel:
        options             ZERO_COPY_SOCKETS        # Turn on zero copy send code
        options             HZ=1000
        options             BPF_JITTER

I tuned the driver settings to get large TX/RX rings and a low interrupt rate (traffic latency is not an issue).

To tune the system, I started with some echo request tests.

These are the maximum bandwidths I can send without loss:

- 64-byte packets: 312 Mbps (1.64% CPU idle)

- 512-byte packets: 2117 Mbps (1.63% CPU idle)

- 1492-byte packets: 5525 Mbps (1.93% CPU idle)

Am I right to consider these figures lower than expected?
The system is doing nothing but handling network traffic!
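
To put the three figures above in perspective, here is a rough Python sketch that converts each one to a packet rate and compares it with the theoretical 10GbE frame rate. It rests on assumptions not stated in the measurements: that the quoted sizes are full Ethernet frame sizes, that the Mbps figures count frame bytes only, and that each frame costs an extra 20 bytes on the wire (8-byte preamble/SFD plus 12-byte inter-frame gap). The function names are made up for this illustration.

```python
# Assumption: sizes are full Ethernet frame sizes and the Mbps figures
# count frame bytes only (no preamble, no inter-frame gap).
LINK_BPS = 10_000_000_000   # nominal 10GbE bit rate
WIRE_OVERHEAD_BYTES = 20    # 8-byte preamble/SFD + 12-byte inter-frame gap

# (frame size in bytes, measured loss-free throughput in Mbps)
MEASURED = [(64, 312), (512, 2117), (1492, 5525)]


def packets_per_second(mbps, frame_bytes):
    """Packet rate implied by a throughput figure and a frame size."""
    return mbps * 1_000_000 / (frame_bytes * 8)


def line_rate_pps(frame_bytes):
    """Maximum frames per second the link can carry at this frame size."""
    return LINK_BPS / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def summarize(rows):
    """Return (size, pps, fraction of line-rate pps) for each measurement."""
    result = []
    for size, mbps in rows:
        pps = packets_per_second(mbps, size)
        result.append((size, pps, pps / line_rate_pps(size)))
    return result


if __name__ == "__main__":
    for size, pps, fraction in summarize(MEASURED):
        print(f"{size:5d} B: {pps / 1e3:8.1f} kpps, {fraction:6.1%} of line rate")
```

Under these assumptions all three tests land at roughly 0.46 to 0.61 million packets per second, which would be consistent with a per-packet rather than a per-byte limit, though the sketch alone cannot prove that.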

I have now started testing netgraph, in particular ng_bpf, and overall performance is even worse.

I sent some HTTP traffic (597-byte packets) and configured an ng_bpf node to filter out TCP traffic from the incoming interface (ix0).

When I use ngctl msg to read the counters on the ng_bpf node, I see extremely poor performance:

- Sending 96 Mbps of this traffic, I measured 0.1% packet loss. This looks very strange. Maybe a counter bug?

- Sending 5500 Mbps, netgraph (not the network card driver) is losing 21% of the packets sent. Below is a snapshot of the CPU load under this traffic:

CPU:  0.0% user,  0.0% nice, 87.0% system,  9.1% interrupt,  3.9% idle
Mem: 16M Active, 317M Inact, 366M Wired, 108K Cache, 399M Buf, 7222M Free
Swap: 2048M Total, 2048M Free

   12 root          1 171 ki31     0K    16K RUN    2  20.2H 68.80% idle: cpu2
   11 root          1 171 ki31     0K    16K RUN    3  20.1H 64.70% idle: cpu3
   14 root          1 171 ki31     0K    16K RUN    0  20.2H 64.26% idle: cpu0
   13 root          1 171 ki31     0K    16K RUN    1  20.2H 63.67% idle: cpu1
   38 root          1 -68    -     0K    16K CPU1   1   1:28 34.67% ix0 rxq
   40 root          1 -68    -     0K    16K CPU2   0   1:26 34.18% ix0 rxq
   34 root          1 -68    -     0K    16K CPU3   3   1:27 34.08% ix0 rxq
   36 root          1 -68    -     0K    16K RUN    2   1:26 34.08% ix0 rxq
   33 root          1 -68    -     0K    16K WAIT   3   0:40  4.05% irq260: ix0
   39 root          1 -68    -     0K    16K WAIT   2   0:41  3.96% irq263: ix0
   35 root          1 -68    -     0K    16K WAIT   0   0:39  3.66% irq261: ix0
   37 root          1 -68    -     0K    16K WAIT   1   0:42  3.47% irq262: ix0
   16 root          1 -32    -     0K    16K WAIT   0  14:53  2.49% swi4: clock sio
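
For reference, the two ng_bpf loss percentages can be turned into absolute drop rates with a short Python sketch. It assumes that the 597 bytes is the size used to convert Mbps into packets, that the sent count comes from the traffic generator, and that the received count is the frame counter read from the ng_bpf node. The function names are hypothetical.

```python
def packets_per_second(mbps, packet_bytes):
    """Packet rate implied by a throughput figure and a packet size."""
    return mbps * 1_000_000 / (packet_bytes * 8)


def loss_fraction(sent, received):
    """Fraction of sent packets that never showed up in the receive counter."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent


def dropped_pps(mbps, packet_bytes, loss):
    """Packets per second lost at a given offered load and loss fraction."""
    return packets_per_second(mbps, packet_bytes) * loss


if __name__ == "__main__":
    PACKET_BYTES = 597  # HTTP test packets described above
    # (offered load in Mbps, observed loss fraction)
    for mbps, loss in [(96, 0.001), (5500, 0.21)]:
        offered = packets_per_second(mbps, PACKET_BYTES)
        print(f"{mbps:5d} Mbps: {offered:12.0f} pps offered, "
              f"{dropped_pps(mbps, PACKET_BYTES, loss):10.0f} pps dropped")
```

Comparing `loss_fraction` computed from the generator's count against the same value computed from the driver's own receive counters is one way to check whether the loss really occurs in netgraph or whether a counter is misleading.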

Am I missing something?

Does anyone know of further system tuning that would let this machine sustain a higher traffic rate?

