Fwd: Just joined the infiniband club

John Fleming john at spikefishsolutions.com
Fri Sep 13 18:37:05 UTC 2019

Top post, I know, but I meant to send this to freebsd-infiniband, not -stable.

> On 2019-09-07 19:00, John Fleming wrote:
> > Hi all, I've recently joined the club. I have two Dell R720s connected
> > directly to each other. The card is a ConnectX-4. I was having a lot
> > of problems with network drops. Where I'm at now: I'm running
> > FreeBSD 12-STABLE as of a week ago, the cards have been cross-flashed
> > with OEM firmware (these are Lenovo, I think), and I'm no longer
> > getting network drops. This box is basically my storage server. It's
> > exporting a RAID 10 ZFS volume to a Linux box (compute, 19.04,
> > 5.0.0-27-generic) which is running GNS3 for a lab.
> >
> > So many questions.. sorry if this is a bit rambly!
> >
> > From what I understand, this card is really 4 x 25 gig lanes. If I
> > understand that correctly, then one data transfer should be able to
> > do at most 25 gig (best case), correct?
> >
> > I'm not getting what the difference between connected mode and
> > datagram mode is. Does it have anything to do with the card operating
> > in InfiniBand mode vs. Ethernet mode? FreeBSD is using modules
> > compiled in connected mode with the shell script (which is really a
> > bash script, not an sh script) from the freebsd-infiniband wiki page.
> Nothing to do with Ethernet...
> Google turned up a brief explanation here:
> https://wiki.archlinux.org/index.php/InfiniBand
I still don't get why I would want to use one or the other, or why the
option is even there, but it doesn't matter now: after the firmware
upgrade and the move to FreeBSD stable (unsure which triggered this), I
can no longer set connected mode on Linux. There are a lot of posts
saying you have to disable "enhanced IPoIB" mode via a modules.conf
setting, but the driver doesn't have any idea what that is, and echoing
"connected" to the mode file throws a write error. I poked around in
the Linux source, but I'm not even a level 1 fighter in C; I'm the
generic NPC that says "hi" at the gates.
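
For reference, this is roughly what I was trying on the Linux side. The
interface name (ib0) and the module option are assumptions pieced
together from those posts, so treat this as a sketch, not a recipe:

```shell
# Check the current IPoIB transport mode ("datagram" or "connected")
cat /sys/class/net/ib0/mode

# Try to switch to connected mode -- this is the write that now fails
echo connected > /sys/class/net/ib0/mode

# The workaround those posts suggest: disable "enhanced IPoIB" with a
# module option (e.g. in /etc/modprobe.d/ib_ipoib.conf) and reload the
# module. My driver doesn't recognize the option, so it may only apply
# to the out-of-tree Mellanox OFED driver:
#   options ib_ipoib ipoib_enhanced=0
```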

> Those are my module building scripts on the wiki.  What bash extensions
> did you see?

Isn't this a bashism? When I run it under sh it throws a fit. No
worries, I just edited loader.conf.
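
For anyone following along, the edit was just enabling the module in
/boot/loader.conf instead of running the script's load step; the module
name below is an assumption -- use whatever your build installs:

```shell
# /boot/loader.conf -- load the locally built IPoIB module at boot
ipoib_load="YES"
```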


> >
> > The Linux box complains if the MTU is over 2044 ("expect multicast
> > drops" or something like that), so the MTU on both boxes is set to 2044.
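
Setting that MTU on both ends is a one-liner per box (ib0 is the
interface name on my setup; yours may differ):

```shell
# Linux side
ip link set dev ib0 mtu 2044

# FreeBSD side
ifconfig ib0 mtu 2044
```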
> >
> > Everything I'm reading makes it sound like there is no RDMA support
> > in FreeBSD, or maybe that was no NFS-over-RDMA support. Is that correct?
> RDMA is inherent in Infiniband AFAIK.  Last I checked, there was no
> support in FreeBSD for NFS over RDMA, but news travels slowly in this
> group so a little digging might prove otherwise.
> >
> > So far it seems like these cards struggle to fill a 10 gig pipe.
> > Using iperf (2), the best I'm getting is around 6 Gbit/sec. The
> > interfaces aren't showing drops on either end, and it doesn't seem
> > to matter if I run 1, 2, or 4 threads in iperf.
> You'll need both ends in connected mode with a fairly large MTU to get
> good throughput.  CentOS defaults to 64k, but FreeBSD is unstable at
> that size last I checked.  I got good results with 16k.
> My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> with some buffer space errors causing the interface to shut down (under
> the same loads that caused CentOS servers to lock up completely).
> Someone mentioned that this buffer space bug has been fixed, but I no
> longer have a way to test it.
> Best,
>      Jason
> --
> Earth is a beta site.

So... I ended up switching the cards to Ethernet mode via mlxconfig -d
PCID set. Oh, and I also set the MTU to 9000.
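
For anyone wanting to repeat this, the invocation is roughly the
following. PCID stands in for the real device argument, the LINK_TYPE
values (1 = IB, 2 = ETH) are from the mlxconfig documentation, and the
change only takes effect after a reboot or firmware reset:

```shell
# Show the current port protocol
mlxconfig -d PCID query | grep LINK_TYPE

# Flip both ports to Ethernet (2 = ETH, 1 = IB), then reboot
mlxconfig -d PCID set LINK_TYPE_P1=2 LINK_TYPE_P2=2

# Once the ports come up as Ethernet, raise the MTU
# (interface names below are assumptions -- use whatever appears)
ifconfig mce0 mtu 9000              # FreeBSD
ip link set dev enp4s0 mtu 9000     # Linux
```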

After that... the floodgates opened, massively.

root at R720-Storage:~ # iperf -c -P4
Client connecting to, TCP port 5001
TCP window size: 1.01 MByte (default)
[  6] local port 62256 connected with port 5001
[  3] local port 51842 connected with port 5001
[  4] local port 53680 connected with port 5001
[  5] local port 33455 connected with port 5001
[ ID] Interval       Transfer     Bandwidth
[  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec
[  3]  0.0-10.0 sec  23.8 GBytes  20.5 Gbits/sec
[  4]  0.0-10.0 sec  33.4 GBytes  28.7 Gbits/sec
[  5]  0.0-10.0 sec  32.9 GBytes  28.3 Gbits/sec
[SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec
root at R720-Storage:~ #
root at compute720:~# iperf -c -P4
Client connecting to, TCP port 5001
TCP window size:  325 KByte (default)
[  5] local port 50022 connected with port 5001
[  3] local port 50026 connected with port 5001
[  6] local port 50024 connected with port 5001
[  4] local port 50020 connected with port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  27.4 GBytes  23.5 Gbits/sec
[  3]  0.0-10.0 sec  26.2 GBytes  22.5 Gbits/sec
[  6]  0.0-10.0 sec  26.8 GBytes  23.1 Gbits/sec
[  4]  0.0-10.0 sec  26.0 GBytes  22.3 Gbits/sec
[SUM]  0.0-10.0 sec   106 GBytes  91.4 Gbits/sec
root at compute720:~#

I should point out that before doing this, while still running IB in
datagram mode, I disabled SMT and set the power profile to performance
on both boxes. That moved me up to 10-12 Gbit/sec; nothing like the
change to Ethernet, though, which from the looks of it can now fill
the pipe.
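
For reference, the Linux side of those tweaks can be done at runtime
along these lines (requires root and a 4.19+ kernel for the SMT knob;
a sketch, not exactly what I ran):

```shell
# Disable SMT without a reboot (Linux 4.19+)
echo off > /sys/devices/system/cpu/smt/control

# Pin every core's cpufreq governor to "performance"
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$gov"
done
```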

Also note that a single connection doesn't do more than 25-ish Gbit/sec.
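
Whether that's the per-lane theory from my earlier question or just a
per-stream CPU limit, the arithmetic lines up with the 4-stream numbers:

```shell
# Back-of-envelope: 4 lanes x 25 Gbit/s = 100 Gbit/s aggregate, and the
# measured 4-stream SUM (98.5 Gbit/s) is within a few percent of that.
lanes=4
gbit_per_lane=25
aggregate=$((lanes * gbit_per_lane))
echo "theoretical aggregate: ${aggregate} Gbit/s"
awk -v sum=98.5 -v agg="$aggregate" \
    'BEGIN { printf "utilization: %.1f%%\n", sum / agg * 100 }'
```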

Back to SATA being the bottleneck, but at least when reads are coming
out of the cache there should be more than enough network I/O.

Oh, one last thing: I thought I read somewhere that you needed a
switch to do Ethernet mode. That doesn't seem to be the case. I
haven't shut down opensm yet, but I'll try that later, as I'm assuming
I no longer need it.
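
If opensm really is unnecessary in Ethernet mode (there's no IB subnet
left to manage), stopping it should just be the usual rc dance --
assuming the port installed an rc.d script named opensm:

```shell
# FreeBSD: stop the subnet manager and keep it from starting at boot
service opensm stop
sysrc opensm_enable="NO"
```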

