netmap and mlx4 driver status (linux)

Adrian Chadd adrian at freebsd.org
Tue Jun 2 15:40:05 UTC 2015


Hi,

You'll likely want to poke the Linux Mellanox driver maintainer for some help.



-adrian


On 1 June 2015 at 17:08, Blake Caldwell <caldweba at colorado.edu> wrote:
> Wondering if those experienced with other netmap drivers might be able to comment on what is limiting the performance of mlx4. It seems that the reason pkt-gen only gets 2.4 Mpps with mlx4 at 40G is that pkt-gen is saturating a single core. That clearly shouldn't be the case, given that the netmap papers report 14.8 Mpps at a 900 MHz core clock. As would be expected, the output from 'perf top' shows that sender_body and poll() are the largest userspace CPU consumers (measured in % of samples across 24 CPUs):
>
>  29.65%  [netmap]               [k] netmap_poll
>  12.47%  [mlx4_en]              [k] mlx4_netmap_txsync
>   8.69%  libc-2.19.so           [.] poll
>   6.15%  pkt-gen                [.] sender_body
>   2.26%  [kernel]               [k] local_clock
>   2.12%  [kernel]               [k] context_tracking_user_exit
>   1.87%  [kernel]               [k] select_estimate_accuracy
>   1.81%  [kernel]               [k] system_call
> ….
>   1.24%  [netmap]               [k] nm_txsync_prologue
> ….
>   0.63%  [mlx4_en]              [k] mlx4_en_arm_cq
>   0.61%  [kernel]               [k] account_user_time
>
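> One quick experiment worth noting here (the flag values below are a guess, not something I have verified helps): pkt-gen defaults to a single sender thread on a single CPU, so spreading the transmit work over several threads/cores should show whether the bottleneck really is per-core, e.g.:
>
>     $ sudo build-apps/pkt-gen -i p2p1 -f tx -n 500111222 -l 60 -w 5 -p 4 -c 4
>
> where -p sets the number of sender threads and -c the CPUs available to them.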
>
> Furthermore, annotating the code in pkt-gen.c with per-line utilization shows that about 50% of sender_body's time is spent on this line while iterating through the rings:
> https://github.com/caldweba/netmap/blob/master/examples/pkt-gen.c#L1091
>                         if (nm_ring_empty(txring))
>
> Does this mean it is waiting for free slots most of the time, and that increasing the number of TX rings beyond 8 might help?
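>
> For context, here is roughly the shape of the loop that check sits in (a minimal sketch based on the netmap user API in net/netmap_user.h, not a verbatim excerpt from pkt-gen; the packet-count bookkeeping is simplified):
>
>     #include <net/netmap_user.h>
>     #include <poll.h>
>
>     static void tx_loop(struct nm_desc *d, unsigned count)
>     {
>         struct pollfd pfd = { .fd = d->fd, .events = POLLOUT };
>
>         while (count > 0) {
>             poll(&pfd, 1, 2000);                    /* wait for free TX slots */
>             for (int i = d->first_tx_ring; i <= d->last_tx_ring && count > 0; i++) {
>                 struct netmap_ring *ring = NETMAP_TXRING(d->nifp, i);
>                 if (nm_ring_empty(ring))            /* <-- the hot check */
>                     continue;
>                 unsigned n = nm_ring_space(ring);   /* free slots in this ring */
>                 if (n > count)
>                     n = count;
>                 for (unsigned j = 0; j < n; j++) {
>                     ring->slot[ring->cur].len = 60; /* buffer contents already set up */
>                     ring->cur = nm_ring_next(ring, ring->cur);
>                 }
>                 ring->head = ring->cur;             /* publish the new slots to the kernel */
>                 count -= n;
>             }
>         }
>     }
>
> If nm_ring_empty() is true most of the time, the sender is spinning over rings the driver has not yet drained, which would point at the txsync/completion path rather than the userspace side.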
>
> Here are the current module parameters in case they shed light on the issue. Also, netmap config kernel messages are shown below.
>
> Thanks in advance.
>
> /sys/module/netmap/parameters/adaptive_io: 0
> /sys/module/netmap/parameters/admode: 0
> /sys/module/netmap/parameters/bridge_batch: 1024
> /sys/module/netmap/parameters/buf_curr_num: 163840
> /sys/module/netmap/parameters/buf_curr_size: 2048
> /sys/module/netmap/parameters/buf_num: 163840
> /sys/module/netmap/parameters/buf_size: 2048
> /sys/module/netmap/parameters/default_pipes: 0
> /sys/module/netmap/parameters/flags: 0
> /sys/module/netmap/parameters/fwd: 0
> /sys/module/netmap/parameters/generic_mit: 100000
> /sys/module/netmap/parameters/generic_rings: 1
> /sys/module/netmap/parameters/generic_ringsize: 1024
> /sys/module/netmap/parameters/if_curr_num: 100
> /sys/module/netmap/parameters/if_curr_size: 1024
> /sys/module/netmap/parameters/if_num: 100
> /sys/module/netmap/parameters/if_size: 1024
> /sys/module/netmap/parameters/mitigate: 1
> /sys/module/netmap/parameters/mmap_unreg: 0
> /sys/module/netmap/parameters/no_pendintr: 1
> /sys/module/netmap/parameters/no_timestamp: 0
> /sys/module/netmap/parameters/priv_buf_num: 4098
> /sys/module/netmap/parameters/priv_buf_size: 2048
> /sys/module/netmap/parameters/priv_if_num: 1
> /sys/module/netmap/parameters/priv_if_size: 1024
> /sys/module/netmap/parameters/priv_ring_num: 4
> /sys/module/netmap/parameters/priv_ring_size: 20480
> /sys/module/netmap/parameters/ring_curr_num: 200
> /sys/module/netmap/parameters/ring_curr_size: 36864
> /sys/module/netmap/parameters/ring_num: 200
> /sys/module/netmap/parameters/ring_size: 36864
> /sys/module/netmap/parameters/txsync_retry: 2
> /sys/module/netmap/parameters/verbose: 0
>
>
>> On May 28, 2015, at 12:47 AM, Blake Caldwell <caldweba at colorado.edu> wrote:
>>
>> Hello,
>>
>> I have made the necessary tweaks to the mlx4 patches for a successful build on Linux 3.13.11 (Ubuntu 14.04) and enabled the driver in the Linux build system. See
>> https://github.com/caldweba/netmap.git for my additional commits.
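>>
>> For reference, the build steps were roughly the following (a sketch: the --drivers/--kernel-dir flag names and the mlx4_en driver name are from memory, so double-check against ./configure --help in the tree):
>>
>>     git clone https://github.com/caldweba/netmap.git
>>     cd netmap/LINUX
>>     ./configure --drivers=mlx4_en --kernel-dir=/lib/modules/$(uname -r)/build
>>     make
>>     sudo insmod ./netmap.ko     # then load the rebuilt mlx4_en module in place of the stock one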
>>
>> Without any core modifications to the mlx4 netmap driver, I am actually getting reduced performance compared to the numbers Luigi reported: 2.5 Mpps on a 40G port. I'm interested in improving the performance of this driver, but as I'm new to both netmap and these drivers, some assistance would be welcome. As Luigi mentioned, the lack of Mellanox developer documentation seems to be a stumbling point. Would anyone from Mellanox be able to lend some expertise?
>>
>> It would appear mlx4_netmap_txsync() is the place to focus optimization, and the comments Luigi put in will be helpful, although I'm a little confused about the remaining work for mlx4_netmap_tx_config (marked TODO). See https://github.com/caldweba/netmap/blob/master/LINUX/mlx4_netmap_linux.h for Luigi's current mlx4_netmap_txsync() code.
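>>
>> For anyone following along, my understanding of the general shape of a native txsync is roughly the following (a generic outline of the netmap driver API, not mlx4-specific code; the descriptor and doorbell steps are placeholders):
>>
>>     static int dev_netmap_txsync(struct netmap_kring *kring, int flags)
>>     {
>>         struct netmap_ring *ring = kring->ring;
>>         u_int lim = kring->nkr_num_slots - 1;
>>         u_int nm_i = kring->nr_hwcur;   /* first slot not yet handed to the NIC */
>>         u_int head = kring->rhead;      /* last slot the application wants sent  */
>>
>>         /* 1) post new slots [nr_hwcur .. head) to the NIC */
>>         while (nm_i != head) {
>>             struct netmap_slot *slot = &ring->slot[nm_i];
>>             /* ... write one HW TX descriptor from slot->buf_idx / slot->len ... */
>>             nm_i = nm_next(nm_i, lim);
>>         }
>>         kring->nr_hwcur = head;
>>         /* ... ring the NIC doorbell once for the whole batch ... */
>>
>>         /* 2) reclaim buffers the NIC has finished with */
>>         /* ... read the HW completion index and advance kring->nr_hwtail ... */
>>
>>         return 0;
>>     }
>>
>> so anything expensive per call in there (say, a doorbell per packet or a costly completion scan) would show up exactly as mlx4_netmap_txsync and netmap_poll dominating the profile.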
>>
>> Below is my output from pkt-gen and from ethtool on the device.
>>
>> Best regards,
>> Blake
>>
>> —————
>> $ sudo build-apps/pkt-gen -i p2p1 -f tx -n 500111222 -l 60 -w 5
>> 060.428278 main [1649] interface is p2p1
>> 060.428770 extract_ip_range [287] range is 10.0.0.1:0 to 10.0.0.1:0
>> 060.428782 extract_ip_range [287] range is 10.1.0.1:0 to 10.1.0.1:0
>> 060.875064 main [1840] mapped 334980KB at 0x7fd1f04d5000
>> Sending on netmap:p2p1: 8 queues, 1 threads and 1 cpus.
>> 10.0.0.1 -> 10.1.0.1 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
>> 060.875151 main [1924] Sending 512 packets every  0.000000000 s
>> 060.875158 main [1926] Wait 5 secs for phy reset
>> 065.875244 main [1928] Ready...
>> 065.875276 nm_open [456] overriding ifname p2p1 ringid 0x0 flags 0x1
>> 065.914805 sender_body [1014] start, fd 4 main_fd 3
>> 065.958284 sender_body [1083] drop copy
>> 066.915788 main_thread [1446] 2468560 pps (2471088 pkts in 1001024 usec)
>> 067.916827 main_thread [1446] 2476292 pps (2478865 pkts in 1001039 usec)
>> 068.917815 main_thread [1446] 2476261 pps (2478708 pkts in 1000988 usec)
>> 069.918864 main_thread [1446] 2476232 pps (2478827 pkts in 1001048 usec)
>> 070.919902 main_thread [1446] 2476031 pps (2478604 pkts in 1001039 usec)
>> 071.920920 main_thread [1446] 2476304 pps (2478825 pkts in 1001018 usec)
>> 072.921896 main_thread [1446] 2476349 pps (2478766 pkts in 1000976 usec)
>> 073.922948 main_thread [1446] 2476327 pps (2478932 pkts in 1001052 usec)
>> 074.923924 main_thread [1446] 2476301 pps (2478715 pkts in 1000975 usec)
>> 075.924903 main_thread [1446] 2476257 pps (2478681 pkts in 1000979 usec)
>> 076.925918 main_thread [1446] 2476195 pps (2478708 pkts in 1001015 usec)
>> 077.926970 main_thread [1446] 2476242 pps (2478849 pkts in 1001053 usec)
>>
>> dmesg:
>> [52591.017469] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)
>> [52591.017621] mlx4_en 0000:04:00.0: registered PHC clock
>> [52591.017780] mlx4_en 0000:04:00.0: Activating port:1
>> [52591.023552] mlx4_en: eth0: Using 192 TX rings
>> [52591.023554] mlx4_en: eth0: Using 8 RX rings
>> [52591.023556] mlx4_en: eth0:   frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [52591.040585] mlx4_en: eth0: Initializing port
>> [52591.040732] 779.121252 [2720] netmap_attach             success for eth0 tx 8/512 rx 8/1024 queues/slots
>> [52591.060580] mlx4_en 0000:04:00.0: Activating port:2
>> [52591.068337] mlx4_en: eth1: Using 192 TX rings
>> [52591.068340] mlx4_en: eth1: Using 8 RX rings
>> [52591.068342] mlx4_en: eth1:   frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [52591.085696] mlx4_en: eth1: Initializing port
>> [52591.085867] 779.166352 [2720] netmap_attach             success for eth1 tx 8/512 rx 8/1024 queues/slots
>> [52591.960730] mlx4_en: eth0: Link Up
>> [52593.029536] systemd-udevd[50993]: renamed network interface eth0 to p2p1
>> [52593.061736] systemd-udevd[50996]: renamed network interface eth1 to rename28
>> [52624.680481] mlx4_en: p2p1:   frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [52624.834109] 812.888289 [ 473] mlx4_en_tx_irq            XXXXXXXXX  tx_irq 0 unexpected, ignoring
>>
>> [55436.322304] 622.179577 [ 665] mlx4_netmap_config        using only 8 out of 192 tx queues
>> [55436.339688] 622.196947 [ 672] mlx4_netmap_config        txr 8 txd 512 bufsize 32768 -- rxr 8 rxd 1024 act 1024 bufsize 16384
>> [55436.361877] 622.219119 [ 124] mlx4_netmap_reg           setting netmap mode for eth0 to ON
>> [55436.379345] 622.236575 [ 127] mlx4_netmap_reg           unloading eth0
>> [55436.485781] 622.342926 [ 163] mlx4_netmap_reg           loading eth0
>> [55436.501124] mlx4_en: p2p1:   frag:0 - size:1526 prefix:0 align:0 stride:1536
>> [55436.517514] 622.374635 [ 628] mlx4_netmap_rx_config     stride 16 possible frags 1 descsize 0 DS_SIZE 16
>> [55436.536462] 622.393570 [ 648] mlx4_netmap_rx_config     ring 0 done
>> [55436.551746] 622.408842 [ 648] mlx4_netmap_rx_config     ring 1 done
>> [55436.567111] 622.424194 [ 628] mlx4_netmap_rx_config     stride 16 possible frags 1 descsize 0 DS_SIZE 16
>> [55436.586261] 622.443330 [ 648] mlx4_netmap_rx_config     ring 2 done
>> [55436.601844] 622.458900 [ 648] mlx4_netmap_rx_config     ring 3 done
>> [55436.617525] 622.474569 [ 648] mlx4_netmap_rx_config     ring 4 done
>> [55436.633057] 622.490089 [ 648] mlx4_netmap_rx_config     ring 5 done
>> [55436.648376] 622.505396 [ 648] mlx4_netmap_rx_config     ring 6 done
>> [55436.780501] 622.637414 [ 165] mlx4_netmap_reg           start_port returns 0
>> [55436.796403] mlx4_en: p2p1: Link Down
>> [55437.755281] mlx4_en: p2p1: Link Up
>>
>>
>> $ ethtool p2p1
>> Settings for p2p1:
>>       Supported ports: [ TP ]
>>       Supported link modes:   10000baseT/Full
>>       Supported pause frame use: No
>>       Supports auto-negotiation: No
>>       Advertised link modes:  10000baseT/Full
>>       Advertised pause frame use: No
>>       Advertised auto-negotiation: No
>>       Speed: 40000Mb/s
>>       Duplex: Full
>>       Port: Twisted Pair
>>       PHYAD: 0
>>       Transceiver: internal
>>       Auto-negotiation: off
>>       MDI-X: Unknown
>> Cannot get wake-on-lan settings: Operation not permitted
>>       Current message level: 0x00000014 (20)
>>                              link ifdown
>>       Link detected: yes
>>
>> $ ethtool -i p2p1
>> driver: mlx4_en
>> version: 2.2-1 (Feb 2014)
>> firmware-version: 2.30.3200
>> bus-info: 0000:04:00.0
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: no
>> supports-register-dump: no
>> supports-priv-flags: yes
>>
>> $ ethtool -g p2p1
>> Ring parameters for p2p1:
>> Pre-set maximums:
>> RX:           8192
>> RX Mini:      0
>> RX Jumbo:     0
>> TX:           8192
>> Current hardware settings:
>> RX:           1024
>> RX Mini:      0
>> RX Jumbo:     0
>> TX:           512
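>>
>> (Side note on the ring sizes above: TX is at 512 of a possible 8192 descriptors. If the sender really is stalling on nm_ring_empty(), raising the descriptor count is an easy experiment, e.g. "sudo ethtool -G p2p1 tx 4096", though I have not verified that it helps here.)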
>>
>> Coalesce parameters for p2p1:
>> Adaptive RX: on  TX: off
>> stats-block-usecs: 0
>> sample-interval: 0
>> pkt-rate-low: 400000
>> pkt-rate-high: 450000
>>
>> rx-usecs: 16
>> rx-frames: 44
>> rx-usecs-irq: 0
>> rx-frames-irq: 0
>>
>> tx-usecs: 16
>> tx-frames: 16
>> tx-usecs-irq: 0
>> tx-frames-irq: 0
>>
>> rx-usecs-low: 0
>> rx-frame-low: 0
>> tx-usecs-low: 0
>> tx-frame-low: 0
>>
>> rx-usecs-high: 128
>> rx-frame-high: 0
>> tx-usecs-high: 0
>> tx-frame-high: 0
>>
>> $ sudo ethtool -k p2p1
>> Features for p2p1:
>> rx-checksumming: on
>> tx-checksumming: on
>>       tx-checksum-ipv4: on
>>       tx-checksum-ip-generic: off [fixed]
>>       tx-checksum-ipv6: on
>>       tx-checksum-fcoe-crc: off [fixed]
>>       tx-checksum-sctp: off [fixed]
>> scatter-gather: on
>>       tx-scatter-gather: on
>>       tx-scatter-gather-fraglist: off [fixed]
>> tcp-segmentation-offload: on
>>       tx-tcp-segmentation: on
>>       tx-tcp-ecn-segmentation: off [fixed]
>>       tx-tcp6-segmentation: on
>> udp-fragmentation-offload: off [fixed]
>> generic-segmentation-offload: on
>> generic-receive-offload: on
>> large-receive-offload: off [fixed]
>> rx-vlan-offload: on
>> tx-vlan-offload: on
>> ntuple-filters: off [fixed]
>> receive-hashing: on
>> highdma: on [fixed]
>> rx-vlan-filter: on [fixed]
>> vlan-challenged: off [fixed]
>> tx-lockless: off [fixed]
>> netns-local: off [fixed]
>> tx-gso-robust: off [fixed]
>> tx-fcoe-segmentation: off [fixed]
>> tx-gre-segmentation: off [fixed]
>> tx-ipip-segmentation: off [fixed]
>> tx-sit-segmentation: off [fixed]
>> tx-udp_tnl-segmentation: off [fixed]
>> tx-mpls-segmentation: off [fixed]
>> fcoe-mtu: off [fixed]
>> tx-nocache-copy: on
>> loopback: off
>> rx-fcs: off [fixed]
>> rx-all: off [fixed]
>> tx-vlan-stag-hw-insert: off [fixed]
>> rx-vlan-stag-hw-parse: off [fixed]
>> rx-vlan-stag-filter: off [fixed]
>> l2-fwd-offload: off [fixed]
>>
>>
>>> On May 20, 2015, at 9:18 AM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:
>>>
>>> hi all,
>>>
>>> the mlx4 netmap patch (for linux only) was something i did long
>>> ago when i had some mellanox hardware available, but no documentation
>>> so i had to resort to interpreting what the linux driver did.
>>>
>>> At the time i had the following performance (on PCIe v2 bus):
>>>
>>> 10G ports: tx/rx at about  7 Mpps with 64 byte packets
>>>      could saturate the link with 192 or 256 byte packets
>>>
>>> 40G ports: tx/rx at about 11 Mpps with 64 byte packets
>>>      max 28 Gbit/s even with 1500 byte frames
>>>
>>> I don't know if the limited performance was due to the bus,
>>> the firmware or the lack of documentation; in any case this is
>>> not something i can or want to deal with.
>>>
>>> My understanding is that Mellanox does not release programming
>>> documentation, so the only way to have native netmap support
>>> for that card would be to have Mellanox work on that and
>>> provide a suitable patch.
>>>
>>> I do not expect more than a week's work (the typical extra
>>> code in each driver is about 500 lines, and very simple)
>>> for someone with access to documentation. Also, the patches
>>> for FreeBSD and Linux are typically very similar, so once we
>>> have a driver for one, the other would be trivial.
>>>
>>> It would be of course great to add Mellanox to the list of
>>> devices with native netmap support, together with Chelsio
>>> and Intel.
>>>
>>> Perhaps Hans (who may have contacts) can talk to the right
>>> people and figure this out. On my side, I am happy to give directions
>>> on what needs to be done and import any patch that should
>>> be made available.
>>>
>>> cheers
>>> luigi
>>>
>>> On Wed, May 20, 2015 at 4:50 PM, Hans Petter Selasky <hps at selasky.org> wrote:
>>> On 05/20/15 16:13, Blake Caldwell wrote:
>>> Hello,
>>>
>>> I noticed that the mlx4_en patch for netmap is in LINUX/wip-patches, so it is not enabled in the normal build process. I'm curious about the status of mlx4 support.
>>>
>>> If additional work on the patches is needed, any details as to what the issues are would be appreciated.
>>>
>>> Any info would be great! Thanks in advance!
>>>
>>>
>>> Hi Blake,
>>>
>>> The MLX4 driver is being actively maintained in -stable and -current. Regarding netmap support for the FreeBSD MLX4 en driver, I'm not sure. Maybe Oded knows, CC'ed? Do you have a link for the patch you are referring to?
>>>
>>> Is there any particular use-case you are interested in?
>>>
>>> --HPS
>>>
>>>
>>>
>>>
>>>
>>> --
>>> -----------------------------------------+-------------------------------
>>>  Prof. Luigi RIZZO, rizzo at iet.unipi.it  . Dip. di Ing. dell'Informazione
>>>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>>>  TEL      +39-050-2217533               . via Diotisalvi 2
>>>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
>>> -----------------------------------------+-------------------------------
>>
>

