bad throughput performance on multiple systems: Re: Fwd: Re: Disappointing packets-per-second performance results on a Dell PE R530

Caraballo-vega, Jordan A. (GSFC-6062)[COMPUTER SCIENCE CORP] jordancaraballo87 at gmail.com
Fri Mar 24 23:53:11 UTC 2017


It looks like netmap support is there; however, is there a way of
figuring out whether netmap is actually being used?

root@router1:~ # dmesg | grep cxl
cxl0: <port 0> on t5nex0
cxl0: Ethernet address: 00:07:43:2c:ac:50
cxl0: 16 txq, 8 rxq (NIC)
vcxl0: <port 0 vi 1> on cxl0
vcxl0: netmap queues/slots: TX 2/1023, RX 2/1024
vcxl0: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)
cxl1: <port 1> on t5nex0
cxl1: 16 txq, 8 rxq (NIC)
vcxl1: <port 1 vi 1> on cxl1
vcxl1: netmap queues/slots: TX 2/1023, RX 2/1024
vcxl1: 1 txq, 1 rxq (NIC); 2 txq, 2 rxq (netmap)
cxl0: link state changed to UP
vcxl0: link state changed to UP
cxl1: link state changed to UP
vcxl1: link state changed to UP
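
One check we have been considering (a sketch only; the dev.netmap.admode
sysctl and the pkt-gen run below are assumptions on our part, not yet
verified on this box): force netmap into native-only mode so that an
emulated fallback fails instead of silently kicking in, then attach
netmap's test tool to the interface.

# 1 = native netmap only; an application that would otherwise fall back
# to emulated mode will fail to open the interface
sysctl dev.netmap.admode=1

# pkt-gen (from src/tools/tools/netmap) prints the ring geometry it
# attaches to, which should match the "netmap queues/slots" lines above
pkt-gen -i vcxl0 -f rx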

And yes, we are using 64-byte UDP tests.

On 3/24/17 7:39 PM, Navdeep Parhar wrote:
> On 03/24/2017 16:07, Caraballo-vega, Jordan A. (GSFC-6062)[COMPUTER
> SCIENCE CORP] wrote:
>> When we switch over to the vcxl* interfaces we get very poor
>> results.
>
> You're probably not using netmap with the vcxl interfaces, and the
> number of "normal" tx and rx queues is just 2 for these interfaces.
>
> Even if you _are_ using netmap, the hw.cxgbe.nnmtxq10g/rxq10g tunables
> don't work anymore.  Use these to control the number of queues for
> netmap:
> hw.cxgbe.nnmtxq_vi
> hw.cxgbe.nnmrxq_vi
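>
> For example, in /boot/loader.conf (the queue counts below are just
> placeholders to illustrate the syntax, not a recommendation):
> hw.cxgbe.nnmtxq_vi=8
> hw.cxgbe.nnmrxq_vi=8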
>
> You should see a line like this in dmesg for all cxl/vcxl interfaces
> and that tells you exactly how many queues the driver configured:
> cxl0: 4 txq, 4 rxq (NIC); 4 txq, 2 rxq (TOE)
>
>>
>>              input        (Total)           output
>>     packets  errs idrops      bytes    packets  errs      bytes colls drops
>>        629k  4.5k     0        66M       629k     0        66M     0     0
>>        701k  5.0k     0        74M       701k     0        74M     0     0
>>        668k  4.8k     0        70M       668k     0        70M     0     0
>>        667k  4.8k     0        70M       667k     0        70M     0     0
>>        645k  4.5k     0        68M       645k     0        68M     0     0
>>        686k  4.9k     0        72M       686k     0        72M     0     0
>>
>> And using just the cxl* interfaces we were getting roughly the following:
>>
>>              input        (Total)           output
>>     packets  errs idrops      bytes    packets  errs      bytes colls drops
>>        2.8M     0  1.2M       294M       1.6M     0       171M     0     0
>>        2.8M     0  1.2M       294M       1.6M     0       171M     0     0
>>        2.8M     0  1.2M       294M       1.6M     0       171M     0     0
>>        2.8M     0  1.2M       295M       1.6M     0       172M     0     0
>>        2.8M     0  1.2M       295M       1.6M     0       171M     0     0
>>
>> These are our current configurations. Any advice or suggestions would
>> be appreciated.
>
> What I don't understand is that you have PAUSE disabled and congestion
> drops enabled but still the number of packets coming in (whether they
> are dropped eventually or not is irrelevant here) is very low in your
> experiments.  It's almost as if the senders are backing off in the
> face of packet loss.  Are you using TCP or UDP?  Always use UDP for
> pps testing -- the senders need to be relentless.
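>
> For illustration, a relentless 64-byte UDP sender could be netmap's
> pkt-gen (from src/tools/tools/netmap); the interface, addresses, and
> port below are placeholders, not your actual test setup:
>
> # 64-byte UDP flood from the sender box; adjust -i/-d/-s to the real
> # test interface and addresses
> pkt-gen -i vcxl0 -f tx -l 64 -d 172.16.1.100:5000 -s 172.16.2.1:5000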
>
> Regards,
> Navdeep
>
>>
>> /etc/rc.conf configurations
>>
>> ifconfig_cxl0="up"
>> ifconfig_cxl1="up"
>> ifconfig_vcxl0="inet 172.16.2.1/24 -tso -lro mtu 9000"
>> ifconfig_vcxl1="inet 172.16.1.1/24 -tso -lro mtu 9000"
>> gateway_enable="YES"
>>
>> /boot/loader.conf configurations
>>
>> # Chelsio Modules
>> t4fw_cfg_load="YES"
>> t5fw_cfg_load="YES"
>> if_cxgbe_load="YES"
>>
>> # rx and tx size
>> dev.cxl.0.qsize_txq=8192
>> dev.cxl.0.qsize_rxq=8192
>> dev.cxl.1.qsize_txq=8192
>> dev.cxl.1.qsize_rxq=8192
>>
>> # drop toecaps to increase queues
>> dev.t5nex.0.toecaps=0
>> dev.t5nex.0.rdmacaps=0
>> dev.t5nex.0.iscsicaps=0
>> dev.t5nex.0.fcoecaps=0
>>
>> # Controls the hardware response to congestion.  -1 disables
>> # congestion feedback and is not recommended.  0 instructs the
>> # hardware to backpressure its pipeline on congestion.  This
>> # usually results in the port emitting PAUSE frames.  1 instructs
>> # the hardware to drop frames destined for congested queues. (From cxgbe(4).)
>> dev.t5nex.0.cong_drop=1
>>
>> # Saw these recommendations in Vicenzo's email thread
>> hw.cxgbe.num_vis=2
>> hw.cxgbe.fl_pktshift=0
>> hw.cxgbe.toecaps_allowed=0
>> hw.cxgbe.nnmtxq10g=8
>> hw.cxgbe.nnmrxq10g=8
>>
>> /etc/sysctl.conf configurations
>>
>> # Turning off pauses
>> dev.cxl.0.pause_settings=0
>> dev.cxl.1.pause_settings=0
>> # John Jasen's suggestion - March 24, 2017
>> net.isr.bindthreads=0
>> net.isr.maxthreads=24
>>
>>
>> On 3/18/17 1:28 AM, Navdeep Parhar wrote:
>>> On Fri, Mar 17, 2017 at 11:43:32PM -0400, John Jasen wrote:
>>>> On 03/17/2017 03:32 PM, Navdeep Parhar wrote:
>>>>
>>>>> On Fri, Mar 17, 2017 at 12:21 PM, John Jasen <jjasen at gmail.com>
>>>>> wrote:
>>>>>> Yes.
>>>>>> We were initially hopeful that we could achieve higher packet
>>>>>> forwarding rates through either netmap-fwd or the enhancements
>>>>>> proposed in https://wiki.freebsd.org/ProjectsRoutingProposal
>>>>> Have you tried netmap-fwd?  I'd be interested in how that did in
>>>>> your tests.
>>>> We have. On this particular box (11-STABLE, netmap-fwd fresh from git),
>>>> it took about 1.7M pps in, dropped about 500k, and passed about 800k.
>>>>
>>>> I'm led to believe that the vcxl interfaces may yield better results?
>>> Yes, those are the ones with native netmap support.  Any netmap-based
>>> application should use the vcxl interfaces.  If you ran netmap on the
>>> main cxl interfaces, you were running it in emulated mode.
>>>
>>> Regards,
>>> Navdeep
>>
>


