Questions about Infiniband on FreeBSD

Jason Bacon bacon4000 at gmail.com
Thu Oct 3 01:33:45 UTC 2019


A subnet manager is required for IPoIB.  It's often run on the switch, 
but since you don't have one...
>
> That's my question -- is that requirement coming from the hardware (or 
> firmware inside it)? What does the manager actually /do/ -- and, 
> whatever it is, does it really need doing constantly in a simple setup 
> like mine, or can opensm come up once (at boot time), do it, and then 
> go away?
>
What's the advantage of not running opensm?  It's just a small daemon on 
one server on the fabric.
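
If it helps, getting opensm going is roughly this; the rc knob depends 
on how your OFED was built, so treat it as a sketch:

    # On exactly one host on the fabric, in rc.conf, if your system
    # ships an opensm rc script:
    opensm_enable="YES"

    # Or just start it by hand; -B puts it in the background:
    opensm -B

    # To verify an SM is serving the fabric (standard OFED diagnostics):
    ibstat     # look for port state "Active" and a nonzero "SM lid"
    sminfo     # queries the master SM; a timeout means no SM is running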
>
>>> 2. Although pings were working and NFS would mount, data transfers
>>>    weren't reliable until I /manually/ lowered the MTU -- on both ends
>>>    -- to 2044 (from the 65520 used by the IB interfaces by default).
>>>    And it only occurred to me to do that when I saw a kernel message
>>>    on one of the two consoles complaining about a packet length of 16k
>>>    being greater than 2044... If that's a known limit, why isn't the
>>>    MTU set to it by default?
>> I saw frequent hangs (self-resolving) with an MTU of 65520.  Cutting 
>> it in half improved reliability by orders of magnitude, but there were 
>> still occasional issues.  Halving it again to 16380 seemed to be the 
>> sweet spot.
>
> Most interesting -- I thought 2044 was a hardware limit of some 
> sort... Isn't it a bug that much larger values are allowed but do 
> not work? I just raised it to 16380 here and things seem to continue 
> working (did a "cvs update" of the entire pkgsrc repo over NFS). But 
> the kernel said:
>
>     ib1: mtu > 2044 will cause multicast packet drops.
>
> I probably don't care about multicast, as long as NFS works...
>
2044 is a hard limit for datagram mode (with the usual 2K IB link MTU: 
2048 bytes minus the 4-byte IPoIB encapsulation header) and suitable for 
MPI traffic, which usually consists of many small messages.  Connected 
mode allows much higher MTUs, which is what you want for many IP services.
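
If you want connected mode and the big MTUs, the rough recipe on FreeBSD 
is below; the kernel option name and the address are from memory / 
placeholders, so check ipoib(4) for your release:

    # Kernel config (connected mode is a build-time option):
    options         OFED
    options         IPOIB_CM
    device          ipoib

    # Then set the MTU on both ends:
    ifconfig ib1 mtu 16380

    # Or persistently in rc.conf (address is a placeholder):
    ifconfig_ib1="inet 10.0.0.1 netmask 255.255.255.0 mtu 16380"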
>
>>> 3. Currently, I have only one cable connecting the ib1 on one machine
>>>    to ib1 of another. Would I get double the throughput if I connect
>>>    the two other ports together as well and bundle the connections? If
>>>    yes, should I bundle them as network-interfaces -- using lagg(4) --
>>>    or is there something Infiniband-specific?
>> Good question.  With Mellanox 6036 switches, nothing needs to be 
>> configured to benefit from multiple links.  We ran 6 links from each 
>> of two top-level switches to each of 6 leaf switches.  The switches 
>> recognize the fabric topology automatically.  I don't know if the 
>> same is true with the HCAs.  You could try just adding a cable and 
>> compare results from iperf, etc.
>
> Sorry, I don't understand how that would "just work" -- if both 
> interfaces (ib1 and ib0) are configured separately, with different 
> IP addresses, etc.?
>
Good point, I hadn't given it enough thought before responding.
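
If you do try adding a second cable, though, a quick before-and-after 
run with iperf3 (benchmarks/iperf3 in ports) would at least show whether 
it buys you anything; the hostname below is a placeholder:

    # On one node:
    iperf3 -s

    # On the other, a few parallel streams for 30 seconds:
    iperf3 -c node1-ib -P 4 -t 30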

-- 
Earth is a beta site.
