Questions about Infiniband on FreeBSD

Jason Bacon bacon4000 at gmail.com
Thu Oct 3 00:49:20 UTC 2019


On 2019-10-02 18:58, Mikhail T. wrote:
> Hello! After some wrangling, I got the direct (no switch) Infiniband 
> connection working reliably between my two servers (a dual port mlx4 
> card in each). I have the following questions:
>
> 1. Why is running opensm mandatory even in a "point-to-point" setup
>    like mine? I would've thought, whatever the two ends need to tell
>    each other could be told /once/, after which the connection will
>    continue to work even if the opensm-process goes away.
>    Unfortunately, shutting down opensm freezes the connection... Is
>    that a hardware/firmware requirement, or can this be improved?
A subnet manager is required for IPoIB.  It's often run on the switch,
but since you don't have one, it has to run on one of the two hosts.
> 2. Although pings were working and NFS would mount, data-transfers
>    weren't reliable until I /manually/ lowered the MTU -- on both ends
>    -- to 2044 (from the 65520 used by the ib-interfaces by default).
>    And it only occurred to me to do that when I saw a kernel message
>    on one of the two consoles complaining about a packet length of 16k
>    being greater than 2044... If that's a known limit, why isn't the
>    MTU set to it by default?
I saw frequent hangs (self-resolving) with an MTU of 65520.  Cutting it
in half (to 32760) improved reliability by orders of magnitude, but there
were still occasional issues.  Halving it again to 16380 seemed to be the
sweet spot.
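The halving steps work out as follows. This is a minimal Python sketch, not a tested procedure: it assumes the IPoIB interface is named ib1 on both hosts (as in the question) and only prints the FreeBSD `ifconfig <iface> mtu <n>` command instead of running it. The same value would have to be applied on both ends of the link.

```python
def halving_sequence(start_mtu: int, steps: int) -> list[int]:
    """Return start_mtu followed by `steps` successive integer halvings."""
    mtus = [start_mtu]
    for _ in range(steps):
        mtus.append(mtus[-1] // 2)
    return mtus


def ifconfig_command(iface: str, mtu: int) -> str:
    """Build (but do not run) the FreeBSD ifconfig(8) command to set an MTU.

    The interface name is an assumption; run the result on BOTH hosts.
    """
    return f"ifconfig {iface} mtu {mtu}"


if __name__ == "__main__":
    seq = halving_sequence(65520, 2)
    print(seq)  # [65520, 32760, 16380]
    print(ifconfig_command("ib1", seq[-1]))  # ifconfig ib1 mtu 16380
```

Running it prints the sequence 65520, 32760, 16380 and then the command for the last value.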

> 3. Currently, I have only one cable connecting the ib1 on one machine
>    to ib1 of another. Would I get double the throughput if I connect
>    the two other ports together as well and bundle the connections? If
>    yes, should I bundle them as network-interfaces -- using lagg(4) --
>    or is there something Infiniband-specific?
Good question.  With Mellanox 6036 switches, nothing needs to be
configured to benefit from multiple links.  We ran 6 links from each of
two top-level switches to each of 6 leaf switches.  The switches recognize
the fabric topology automatically.  I don't know whether the same is true
for direct HCA-to-HCA links.  You could try simply adding a second cable
and comparing throughput with iperf or a similar tool.
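One way to make the iperf comparison concrete is sketched below in Python. It assumes iperf3 (rather than classic iperf) is installed on both machines, with `iperf3 -s` already listening on the far end, and it reads the receiver-side total from iperf3's JSON report (`-J`), assuming the usual TCP report layout. If the links end up bundled with lagg(4), which hashes traffic per flow, a single TCP stream may only ever use one cable, so the sketch lets you request several parallel streams with `-P`.

```python
import json
import subprocess


def run_iperf3(server: str, seconds: int = 10, streams: int = 1) -> dict:
    """Run an iperf3 client against `server` and return the parsed JSON report.

    Assumes `iperf3 -s` is already running on the other host.
    """
    result = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-P", str(streams), "-J"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def received_gbps(report: dict) -> float:
    """Receiver-side throughput in Gbit/s from an iperf3 TCP JSON report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def speedup(baseline_gbps: float, candidate_gbps: float) -> float:
    """Ratio of the new throughput to the single-cable baseline."""
    return candidate_gbps / baseline_gbps
```

For example, record `received_gbps(run_iperf3(peer, streams=4))` once with one cable and again after adding the second, where `peer` is the other host's IPoIB address, then compare the two with `speedup()`. A ratio close to 2 would suggest the second link is actually carrying traffic.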
> 4. Mellanox recommends keeping the cards' firmware up-to-date. Does
>    FreeBSD have a tool to do that?
I'd also like to know.

Regards,

     JB

-- 
Earth is a beta site.




More information about the freebsd-infiniband mailing list