Disappointing packets-per-second performance results on a Dell, PE R530
olivier at freebsd.org
Thu Feb 2 01:09:26 UTC 2017
On Wed, Feb 1, 2017 at 1:00 AM, Jordan Caraballo <jordancaraballo87 at gmail.com> wrote:
> Hi Oliver, my bad, I missed that one. Here is the info:
> * Switch with 48x 10G ports and 12x 40G ports was used
> * (48) 10G connected nodes were used.
> * (24) nodes on each side of the firewall
> * Packet per second (PPS) tests were run using 'iperf'
> * Bandwidth tests were run using 'nuttcp'
> * Parallelization was handled by using pdsh
> * Each of the 24 sending nodes ran either:
> iperf3 -c "<target>" -u -A 5 -l 512 -b 0 -t "<duration>" -J
> nuttcp -fparse -l 128k -w1m -T "<duration>" "<target>"
Your current performance (1.5 Mpps) suggests that only one, or at most two,
cores are being used.
Can you confirm that during your iperf benchmark there are 24 distinct flows
running simultaneously (different source/destination IPs)?
source-IP-1 -> target-IP-1
source-IP-2 -> target-IP-2
... and so on, up to source-IP-24 -> target-IP-24
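If it helps, the one-to-one pairing above can be generated rather than typed by hand. This is only a sketch: the 10.0.1.x (senders) and 10.0.2.x (receivers) addresses are hypothetical placeholders for your real hosts, and the iperf3 options are copied from the command you quoted. The point is simply that every flow gets its own source/destination address pair.

```python
# Sketch: build 24 distinct source -> target flows for the iperf3 UDP test.
# ASSUMPTION: the 10.0.1.x / 10.0.2.x addresses are hypothetical
# placeholders; substitute the real sender and receiver addresses.

def build_flows(n_pairs, src_prefix="10.0.1.", dst_prefix="10.0.2."):
    # Pair source i with target i so every flow has a unique
    # (source IP, destination IP) tuple, giving the NIC's RSS hash a
    # chance to spread the flows across its receive queues.
    return [(f"{src_prefix}{i}", f"{dst_prefix}{i}")
            for i in range(1, n_pairs + 1)]


def iperf3_command(target, duration):
    # Same options as the quoted command: UDP, CPU affinity 5,
    # 512-byte datagrams, unlimited bandwidth, JSON output.
    return ["iperf3", "-c", target, "-u", "-A", "5", "-l", "512",
            "-b", "0", "-t", str(duration), "-J"]


flows = build_flows(24)
# Every sender and every receiver appears exactly once.
assert len({s for s, _ in flows}) == len({d for _, d in flows}) == 24
for src, dst in flows:
    # Each command is meant to run on host `src` (e.g. via pdsh).
    print(src, ":", " ".join(iperf3_command(dst, 60)))
```

The 60-second duration in the loop is illustrative only.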
On your dual-CPU system with 18 cores, the Chelsio driver should create, per port:
- 8 RX queues (rxq NIC)
- 16 TX queues (txq NIC)
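The expected numbers can be expressed as a small calculation. This sketch ASSUMES the driver's per-port defaults are 16 NIC TX and 8 NIC RX queues, reduced to the CPU count on machines with fewer CPUs; the caps come from the figures in this message, and the "reduce to CPU count" rule is an assumption, not something checked against the driver source.

```python
# Sketch: expected per-port NIC queue counts for the Chelsio driver.
# ASSUMPTION: defaults of 16 txq / 8 rxq per port, capped by the CPU
# count; not verified against the driver source.
DEFAULT_NIC_TXQ = 16
DEFAULT_NIC_RXQ = 8


def expected_nic_queues(ncpu):
    """Return the (txq, rxq) pair expected per port with ncpu CPUs."""
    if ncpu < 1:
        raise ValueError("ncpu must be positive")
    return min(ncpu, DEFAULT_NIC_TXQ), min(ncpu, DEFAULT_NIC_RXQ)


print(expected_nic_queues(18))  # -> (16, 8)
```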
Can you check in /var/run/dmesg.boot that you have something like this:
t5nex0: <Chelsio T540-CR> mem 0xfb780000-0xfb7fffff,0xfa000000-0xfaffffff,0xf9ff0000-0xf9ff1fff irq 40 at device 0.4 numa-domain 0 on
cxl0: <port 0> numa-domain 0 on t5nex0
cxl0: Ethernet address: 00:07:43:2e:e4:70
cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)
=> Note the "16 txq, 8 rxq (NIC)" part of the cxl0 line
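To check this across all ports at once, a minimal sketch like the following could pull the NIC queue counts out of dmesg.boot. It ASSUMES every port line has the same format as the cxl0 sample above; lines that do not match are ignored.

```python
import re

# Sketch: extract NIC queue counts from cxl port lines in dmesg.boot.
# ASSUMPTION: lines look like the sample quoted above, e.g.
#   "cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)"
QUEUE_LINE = re.compile(r"^(cxl\d+): (\d+) txq, (\d+) rxq \(NIC\)")


def nic_queues(dmesg_text):
    """Map each cxl port name to its (txq, rxq) NIC queue counts."""
    result = {}
    for line in dmesg_text.splitlines():
        m = QUEUE_LINE.match(line.strip())
        if m:
            result[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return result


sample = "cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)"
print(nic_queues(sample))  # -> {'cxl0': (16, 8)}
```

On the machine itself you would feed it the file contents, e.g. `nic_queues(open("/var/run/dmesg.boot").read())`.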
Now, as Slawa says, we should see 8 IRQs assigned to these 8 rxqs and [...]
After your benchmark, the output of "vmstat -ia | grep t5nex0:0a" should show
at least 8 lines with evenly distributed interrupt counts, as in this example:
[root@hp]~# vmstat -ia | grep t5nex0:0a
irq292: t5nex0:0a0 37 0
irq293: t5nex0:0a1 288498 629
irq294: t5nex0:0a2 225410 492
irq295: t5nex0:0a3 306227 668
irq296: t5nex0:0a4 282679 617
irq297: t5nex0:0a5 313143 683
irq298: t5nex0:0a6 318727 695
irq299: t5nex0:0a7 308669 673
(My example is not perfect, since queue 0 appears under-utilized, but
you get the idea.)
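The "evenly distributed" check can also be done mechanically. This is a minimal sketch, ASSUMING each vmstat line has the form "irqNNN: <name> <total> <rate>" as in the output above; the 50%-of-mean threshold for calling a queue under-utilized is an arbitrary, hypothetical choice.

```python
# Sketch: check how evenly interrupts are spread across the RX queue
# interrupts reported by "vmstat -ia | grep t5nex0:0a".
# ASSUMPTION: lines look like "irqNNN: <name> <total> <rate>".
# The 0.5 threshold below is an arbitrary, hypothetical cut-off.


def parse_vmstat(text):
    """Return {interrupt name: total count} for each parsable line."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # Name may contain spaces for some devices; total and rate
            # are always the last two columns.
            counts[" ".join(fields[1:-2])] = int(fields[-2])
    return counts


def underused_queues(counts, threshold=0.5):
    """Names whose total is below threshold * mean of all totals."""
    mean = sum(counts.values()) / len(counts)
    return [name for name, total in counts.items()
            if total < threshold * mean]


sample = """\
irq292: t5nex0:0a0 37 0
irq293: t5nex0:0a1 288498 629
irq294: t5nex0:0a2 225410 492
irq295: t5nex0:0a3 306227 668
irq296: t5nex0:0a4 282679 617
irq297: t5nex0:0a5 313143 683
irq298: t5nex0:0a6 318727 695
irq299: t5nex0:0a7 308669 673
"""
counts = parse_vmstat(sample)
print(len(counts), underused_queues(counts))  # -> 8 ['t5nex0:0a0']
```

On the example output it flags queue 0, matching the observation above; with only one or two busy queues (as a 1.5 Mpps result would suggest), most queues would be flagged instead.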