Just joined the infiniband club

John Fleming john at spikefishsolutions.com
Fri Sep 13 19:13:39 UTC 2019


And of course I meant Ethernet mode, not Linux mode.


On Fri, Sep 13, 2019 at 2:36 PM John Fleming
<john at spikefishsolutions.com> wrote:
>
> Top post, I know, but I meant to send this to freebsd-infiniband, not freebsd-stable.
>
> >
> > On 2019-09-07 19:00, John Fleming wrote:
> > > Hi all, I've recently joined the club. I have two Dell R720s
> > > connected directly to each other. The card is a ConnectX-4. I was
> > > having a lot of problems with network drops. Where I'm at now is
> > > that I'm running FreeBSD 12-STABLE as of a week ago, the cards
> > > have been cross-flashed with OEM firmware (these are Lenovo, I
> > > think), and I'm no longer getting network drops. This box is
> > > basically my storage server. It's exporting a RAID 10 ZFS volume
> > > to a Linux box (compute, 19.04, kernel 5.0.0-27-generic) which is
> > > running GNS3 for a lab.
> > >
> > > So many questions... sorry if this is a bit rambly!
> > >
> > > From what I understand, this card is really 4 x 25 gig lanes. If
> > > I understand that correctly, then one data transfer should be able
> > > to do at most 25 gig (best case), correct?
> > >
> > > I'm not getting what the difference between connected mode and
> > > datagram mode is. Does this have anything to do with the card
> > > operating in InfiniBand mode vs. Ethernet mode? FreeBSD is using
> > > modules compiled in connected mode with the shell script (which is
> > > really a bash script, not an sh script) from the freebsd-infiniband
> > > wiki page.
> >
> > Nothing to do with Ethernet...
> >
> > Google turned up a brief explanation here:
> >
> > https://wiki.archlinux.org/index.php/InfiniBand
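> >
> > In short, as I understand it: datagram mode maps IPoIB onto IB
> > unreliable datagrams, so the interface MTU is capped at the IB link
> > MTU minus the 4-byte IPoIB header (2044 for the common 2048 link
> > MTU), while connected mode runs a reliable connected queue pair per
> > peer and allows an MTU up to 65520.  On Linux you can usually
> > inspect and flip it like this (a sketch assuming the in-tree ipoib
> > driver; ib0 stands in for whatever your IPoIB interface is called):
> >
> > # show the current IPoIB transport mode ("datagram" or "connected")
> > cat /sys/class/net/ib0/mode
> > # switch to connected mode, then raise the MTU to match
> > echo connected > /sys/class/net/ib0/mode
> > ip link set ib0 mtu 65520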
> >
> I still don't get why I would want to use one or the other, or why
> the option is there, but it doesn't matter. After the firmware
> upgrade and moving to FreeBSD stable (unsure which of the two is
> triggering this) I can no longer set connected mode on Linux. There
> are a lot of posts that say you have to disable enhanced IPoIB mode
> via a modules.conf setting, but the driver doesn't have any idea what
> that is. Echoing "connected" to the mode file throws a write error.
> I poked around in the Linux source, but I'm not even a level 1
> fighter in C; I'm like a generic NPC that says hi at the gates.
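>
> For the record, the knob those posts seem to mean is the Mellanox
> OFED one, something like this in /etc/modprobe.d/ib_ipoib.conf (my
> assumption; the in-tree kernel driver doesn't know the parameter,
> which would explain the write error I'm seeing):
>
> # disable enhanced IPoIB so legacy connected mode is selectable
> # (ipoib_enhanced is a Mellanox OFED ib_ipoib parameter, not upstream)
> options ib_ipoib ipoib_enhanced=0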
>
> > Those are my module building scripts on the wiki.  What bash extensions
> > did you see?
>
> Isn't this a bashism? When I run it inside sh it throws a fit. No
> worries, I just edited loader.conf.
>
> auto-append-line
>
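> For anyone else going this route, the lines I mean in
> /boot/loader.conf are roughly the following (module names are my
> best recollection and depend on the card; mlx5ib is the ConnectX-4
> one):
>
> # load the ConnectX-4 IB driver plus IPoIB at boot
> mlx5ib_load="YES"
> ipoib_load="YES"
>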
> > >
> > > The Linux box complains if the MTU is over 2044, with "expect
> > > multicast drops" or something like that, so the MTU on both boxes
> > > is set to 2044.
> > >
> > > Everything I'm reading makes it sound like there is no RDMA
> > > support in FreeBSD, or maybe that was no NFS-over-RDMA support.
> > > Is that correct?
> > RDMA is inherent in InfiniBand AFAIK.  Last I checked, there was no
> > support in FreeBSD for NFS over RDMA, but news travels slowly in this
> > group so a little digging might prove otherwise.
> > >
> > > So far it seems like these cards struggle to fill a 10 gig pipe.
> > > Using iperf (2), the best I'm getting is around 6 Gbit/sec.
> > > Interfaces aren't showing drops on either end. It doesn't seem to
> > > matter if I do 1, 2, or 4 threads in iperf.
> > You'll need both ends in connected mode with a fairly large MTU to get
> > good throughput.  CentOS defaults to 64k, but FreeBSD is unstable at
> > that size last I checked.  I got good results with 16k.
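> >
> > For reference, once both ends are in connected mode, the 16k is
> > just the normal MTU knobs, along these lines (a sketch; use your
> > actual interface names):
> >
> > # FreeBSD side
> > ifconfig ib0 mtu 16384
> > # Linux side
> > ip link set ib0 mtu 16384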
> >
> > My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> > with some buffer space errors causing the interface to shut down (under
> > the same loads that caused CentOS servers to lock up completely).
> > Someone mentioned that this buffer space bug has been fixed, but I no
> > longer have a way to test it.
> >
> > Best,
> >
> >      Jason
> >
> > --
> > Earth is a beta site.
>
> So... I ended up switching to Ethernet mode via:
>
> mlxconfig -d PCID set LINK_TYPE_P1=2 LINK_TYPE_P2=2
>
> Oh, I also set the MTU to 9000.
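>
> For completeness, here's how to check what the ports are set to and
> where the 9000 MTU actually goes afterwards (a sketch: PCID is still
> a stand-in for the card's device, the flip only takes effect after a
> reboot, and mce0 is the name my mlx5en Ethernet interface came up
> with on FreeBSD):
>
> # query the current port protocol (1 = InfiniBand, 2 = Ethernet)
> mlxconfig -d PCID query | grep LINK_TYPE
> # after the reboot the ports show up as Ethernet NICs; set the MTU there
> ifconfig mce0 mtu 9000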
>
> After that.. the flood gates opened massively.
>
> root at R720-Storage:~ # iperf -c 10.255.255.55 -P4
> ------------------------------------------------------------
> Client connecting to 10.255.255.55, TCP port 5001
> TCP window size: 1.01 MByte (default)
> ------------------------------------------------------------
> [  6] local 10.255.255.22 port 62256 connected with 10.255.255.55 port 5001
> [  3] local 10.255.255.22 port 51842 connected with 10.255.255.55 port 5001
> [  4] local 10.255.255.22 port 53680 connected with 10.255.255.55 port 5001
> [  5] local 10.255.255.22 port 33455 connected with 10.255.255.55 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec
> [  3]  0.0-10.0 sec  23.8 GBytes  20.5 Gbits/sec
> [  4]  0.0-10.0 sec  33.4 GBytes  28.7 Gbits/sec
> [  5]  0.0-10.0 sec  32.9 GBytes  28.3 Gbits/sec
> [SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec
> root at R720-Storage:~ #
>
> root at compute720:~# iperf -c 10.255.255.22 -P4
> ------------------------------------------------------------
> Client connecting to 10.255.255.22, TCP port 5001
> TCP window size:  325 KByte (default)
> ------------------------------------------------------------
> [  5] local 10.255.255.55 port 50022 connected with 10.255.255.22 port 5001
> [  3] local 10.255.255.55 port 50026 connected with 10.255.255.22 port 5001
> [  6] local 10.255.255.55 port 50024 connected with 10.255.255.22 port 5001
> [  4] local 10.255.255.55 port 50020 connected with 10.255.255.22 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  5]  0.0-10.0 sec  27.4 GBytes  23.5 Gbits/sec
> [  3]  0.0-10.0 sec  26.2 GBytes  22.5 Gbits/sec
> [  6]  0.0-10.0 sec  26.8 GBytes  23.1 Gbits/sec
> [  4]  0.0-10.0 sec  26.0 GBytes  22.3 Gbits/sec
> [SUM]  0.0-10.0 sec   106 GBytes  91.4 Gbits/sec
> root at compute720:~#
>
> I should point out that before doing this, while running in IB mode
> with datagram mode, I disabled SMT and set the power profile to
> performance on both boxes. That moved me up to 10-12 gig/sec; nothing
> like the change to Ethernet, after which I can fill the pipe from the
> looks of it.
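>
> (For anyone copying the tuning: SMT is a loader tunable on FreeBSD
> and a runtime switch on newer Linux kernels; these are the knobs I
> mean, from memory, so double-check the spellings. The power profile
> itself is a BIOS/iDRAC setting on these Dells.)
>
> # FreeBSD: add to /boot/loader.conf
> machdep.hyperthreading_allowed="0"
> # Linux: runtime toggle (kernel 4.17+)
> echo off > /sys/devices/system/cpu/smt/control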
>
> Also note a single connection doesn't do more than 25-ish gig/sec.
>
> Back to SATA being the bottleneck, but at least when it's coming out
> of the cache there should be more than enough network IO.
>
> Oh, one last thing: I thought I read somewhere that you needed to
> have a switch to do Ethernet mode. That doesn't seem to be the case.
> I haven't shut down opensm yet, but I'll try that later, as I'm
> assuming I no longer need it.
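>
> Assuming I'm right that Ethernet mode has no use for a subnet
> manager, stopping it should just be the usual service dance
> (hypothetical until I actually try it; service names may differ):
>
> # Linux (systemd)
> systemctl disable --now opensm
> # FreeBSD, if running the opensm port's rc script
> sysrc opensm_enable="NO"
> service opensm stop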
>
> w00t!

