NFS + Infiniband problem

Rodney W. Grimes freebsd-rwg at pdx.rh.CN85.dnsmgr.net
Mon Oct 29 15:48:50 UTC 2018


> Rodney W. Grimes wrote:
> Andrew Vylegzhanin wrote:
> >> Hello everyone,
> >>
> >> I have several FreeBSD machines connected via an InfiniBand network (FDR
> >> switch Mellanox SW3036 + ConnectX-3 VPI cards).
> >> One of them is a NAS server with multiple ZFS pools.
> >>
> >> All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
> >> server) are with infiniband connected mode (option IPOIB_CM, option SDM)
> >> and world with OFED stack support. (WITH_OFED='yes').
> >>
> >> File transfers via FTP or SSH between server and clients work almost
> >> flawlessly (~12 Gbit/s).
> >>
> >> But when I try to copy a significant amount of data in or out via an NFS
> >> share mounted on the clients, NFS I/O either hangs completely or becomes
> >> extremely slow (a couple of kB/s) after an unpredictable amount of data has
> >> been copied. For example, on one node I can copy a 1 GB file, and afterwards
> >> NFS hangs on a 30 kB file.
> >>
> >> Some details:
> >> [root at node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
> >                               ^^^^^^^^^^^^
> >I am not sure what the interactions between page sizes, TSO needs,
> >buffer needs and all that are, but I always use a power of 2 for wsize
> >and rsize.
> They should always be a power of 2. I think the code clips the value, but it might
> only clip to a multiple of 512. If it didn't clip this down to 16384, then that
> would definitely be a problem. Also, normally the same size for rsize and wsize
> is used. If you don't do that, you end up with weird sized blocks in the buffer cache.
> I think it still works when this is done, but could cause performance hits.
> Probably doesn't matter for a simple performance test.
> (You can find out what options it is actually using by typing "nfsstat -m" after doing the mount.)
> 
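
A minimal sketch of the power-of-2 advice above, assuming the requested
size is at least 1. pow2_floor is a hypothetical helper, not part of
mount_nfs. The script only prints the resulting mount command, using the
server and paths from the original post, rather than running it.

```shell
#!/bin/sh
# Round a requested transfer size down to the nearest power of two.
# pow2_floor is a hypothetical helper for illustration only.
pow2_floor() {
    n=$1
    p=1
    # Keep doubling while the next power of two still fits in n.
    while [ $((p * 2)) -le "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# 30000 is the wsize from the original post.
size=$(pow2_floor 30000)

# Use the same value for rsize and wsize, as suggested above.
# Printed rather than executed; host and paths come from the original post.
echo "mount_nfs -o rsize=$size,wsize=$size,proto=tcp 10.0.2.1:/zdata2 /mnt"
```

After a real mount, "nfsstat -m" shows whether the kernel kept the
requested values or adjusted them.
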
> >   You might try that.  And as Rick suggested, turn off
> >TSO, if you can.  Is InfiniBand using RDMA to do this? If so, then
> >the page size stuff is probably very important; use multiples of
> >4096 only.
> RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
> on a separate connection with NFSv4.1 or later, but I've never written code
> for that. (Not practical to try to implement without access to hardware that
> does it.)

It would be very easy to arrange for a pair of PCIe 10G IB cards and
a cable to go between them, if that would be of use to you for doing
some of this work some day, or for that matter even for playing with
NFS over IB.

> rick

-- 
Rod Grimes                                                 rgrimes at freebsd.org


More information about the freebsd-infiniband mailing list