NFS + Infiniband problem

Andrew Vylegzhanin avv314 at gmail.com
Tue Oct 30 04:57:35 UTC 2018


> >> Some details:
> >> [root at node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
> >                               ^^^^^^^^^^^^
> >I am not sure what the interaction between page sizes, TSO needs,
> >buffer needs and all that are but I always use a power of 2 wsize
> >and rsize.

Back again after some more tests.
I've tried wsize/rsize values of 4096, 8192 and 16384. Only 4096 gives a
measurable result, and it is extremely slow: ~10-16 MB/s for writing
(depending on the number of threads) and 10-12 MB/s for reading. With the
other values NFS hangs (or almost hangs: a couple of kB/s on average).
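
The sweep over sizes described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names is_pow2 and mount_cmd are made up for illustration, the server path and mount point are copied from the quoted mount_nfs command, and the script prints each command instead of running it. The value 30000 is included only to show the power-of-two check rejecting it.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of any FreeBSD tool.

# Succeeds when $1 is a positive power of two.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Prints the mount_nfs command for one rsize/wsize value.
# Server path and mount point are taken from the quoted command.
mount_cmd() {
    echo "mount_nfs -o rsize=$1,wsize=$1 -o proto=tcp 10.0.2.1:/zdata2 /mnt"
}

# Dry run: print the command for each candidate size.
for size in 4096 8192 16384 30000; do
    if is_pow2 "$size"; then
        mount_cmd "$size"
    else
        echo "skip $size: not a power of two"
    fi
done
```

To actually run a test, each printed command would be executed as root, followed by a read/write benchmark and an umount before the next size.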

Setting sysctl net.inet.tcp.tso=0 (without a reboot) on both sides had no
effect.

AFAIK, the InfiniBand interface has no TSO or LRO options:
ib0: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> metric 0 mtu 65520
         options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
         lladdr 80.0.2.8.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.50.df.51
         inet 10.0.2.1 netmask 0xffffff00 broadcast 10.0.2.255
         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
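
A quick way to check this from a script is to look for TSO4, TSO6 or LRO in the options list that ifconfig prints. The sketch below assumes that format; has_offload is a made-up helper name, and the sample line is the one from the ib0 output above.

```sh
#!/bin/sh
# Hypothetical helper: reads ifconfig output on stdin and succeeds
# if an options=...<...> list names TSO4, TSO6 or LRO.
has_offload() {
    grep -Eq 'options=[0-9a-f]+<([^>]*,)?(TSO4|TSO6|LRO)[,>]'
}

# Example using the options line from the ib0 output above.
if echo 'options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>' | has_offload; then
    echo "offload capabilities listed"
else
    echo "no TSO/LRO capabilities listed"
fi
```

On a live system the input would come from the interface itself, for example: ifconfig ib0 | has_offload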


BTW, here is my sysctl.conf file with optimisations for congestion control
and TCP buffers on 10/40 Gbit/s links (the server also had a 40 Gbit/s Intel
ixl Ethernet interface):

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.cc.algorithm=htcp
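
For context on the buffer maximums, the bandwidth-delay product gives a rough idea of how much data a single TCP connection needs in flight to fill a link. The sketch below is only illustrative: bdp_bytes is a made-up helper, and the 1000 microsecond round-trip time is an assumed figure, not something measured on this setup.

```sh
#!/bin/sh
# Hypothetical helper: bandwidth-delay product in bytes.
# $1 = link speed in Gbit/s, $2 = round-trip time in microseconds.
# bytes = (Gbit/s * 1e9 / 8) * (us / 1e6) = Gbit/s * 125 * us
bdp_bytes() {
    echo $(( $1 * 125 * $2 ))
}

# Assumed RTT of 1000 us on a 40 Gbit/s link.
bdp_bytes 40 1000   # prints 5000000
```

With that assumed RTT the result stays below the 16777216-byte maximum set above.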


> >   You might try that.  And as Rick suggested, turn off
> >TSO, if you can.  Is infiniband using RDMA to do this? If so, then
> >the page size stuff is probably very important; use multiples of
> >4096 only.
> RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
> on a separate connection with NFSv4.1 or later, but I've never written code
> for that. (Not practical to try to implement without access to hardware that
> does it.)
>
>

I hope I can help with testing.

--
Andrew

> rick
> [performance stuff snipped]

