Questions about Infiniband on FreeBSD
Jason Bacon
bacon4000 at gmail.com
Thu Oct 3 00:49:20 UTC 2019
On 2019-10-02 18:58, Mikhail T. wrote:
> Hello! After some wrangling, I got the direct (no switch) Infiniband
> connection working reliably between my two servers (a dual port mlx4
> card in each). I have the following questions:
>
> 1. Why is running opensm mandatory even in a "point-to-point" setup
> like mine? I would've thought, whatever the two ends need to tell
> each other could be told /once/, after which the connection will
> continue to work even if the opensm-process goes away.
> Unfortunately, shutting down opensm freezes the connection... Is
> that a hardware/firmware requirement, or can this be improved?
A subnet manager is required for IPoIB. It's often run on the switch,
but since you don't have one, opensm has to run on one of the hosts.
> 2. Although pings were working and NFS would mount, data-transfers
> weren't reliable until I /manually/ lowered the MTU -- on both ends
> -- to 2044 (from the 65520 used by the ib-interfaces by default).
> And it only occurred to me to do that, when I saw a kernel's message
> on one of the two consoles complaining about a packet length of 16k
> being greater than 2044... If that's a known limit, why is not the
> MTU set to it by default?
I saw frequent hangs (self-resolving) with an MTU of 65520. Cutting it
in half to 32760 improved reliability by orders of magnitude, but there
were still occasional issues. Halving it again to 16380 seemed to be the
sweet spot.
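For reference, the halving sequence can be sketched as below. This is
only an illustration: the helper names are made up, the interface name
ib1 is borrowed from your message, and the script just prints the
ifconfig(8) commands rather than running them. Whether 16380 is the
right value for your hardware is something you'd have to test.

```python
# Sketch: list the IPoIB MTU values obtained by repeated halving and
# print the matching ifconfig(8) commands. Nothing is executed here.
# The helper names are hypothetical; "ib1" comes from the thread.

def halved_mtus(start, steps):
    """Return start followed by `steps` successive halvings."""
    mtus = [start]
    for _ in range(steps):
        mtus.append(mtus[-1] // 2)
    return mtus

def ifconfig_command(interface, mtu):
    """Build the ifconfig(8) command line that sets the MTU."""
    return f"ifconfig {interface} mtu {mtu}"

if __name__ == "__main__":
    for mtu in halved_mtus(65520, 2):
        print(ifconfig_command("ib1", mtu))
```

The MTU has to match on both ends, so the same command would be run on
each machine.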
> 3. Currently, I have only one cable connecting the ib1 on one machine
> to ib1 of another. Would I get double the throughput if I connect
> the two other ports together as well and bundle the connections? If
> yes, should I bundle them as network-interfaces -- using lagg(4) --
> or is there something Infiniband-specific?
Good question. With Mellanox 6036 switches, nothing needs to be
configured to benefit from multiple links. We ran 6 links from each of
two top-level switches to each of 6 leaf switches. The switches
recognize the fabric topology automatically. I don't know if the same
is true with the HCAs. You could try just adding a cable and comparing
results from iperf, etc.
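A trivial way to put a number on that comparison is sketched below. The
function name is made up, and the throughput figures are placeholders,
not measurements; substitute whatever iperf reports for the one-cable
and two-cable runs.

```python
# Sketch: compare throughput measured with one cable against two.
# The function name is hypothetical; the figures are placeholders.

def speedup(single_link, dual_link):
    """Ratio of dual-link to single-link throughput (same units)."""
    if single_link <= 0:
        raise ValueError("single-link throughput must be positive")
    return dual_link / single_link

if __name__ == "__main__":
    # Placeholder numbers, not measurements; substitute iperf output.
    single_gbps = 10.0
    dual_gbps = 20.0
    print(f"speedup: {speedup(single_gbps, dual_gbps):.2f}x")
```

A ratio near 2 would suggest the second link is being used; a ratio
near 1 would suggest it is not.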
> 4. Mellanox recommends keeping the cards' firmware up-to-date. Does
> FreeBSD have a tool to do that?
I'd also like to know.
Regards,
JB
--
Earth is a beta site.