NFS + Infiniband problem
rmacklem at uoguelph.ca
Wed Oct 31 02:53:51 UTC 2018
Andrew Vylegzhanin wrote:
>> >> Some details:
>> >> [root at node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
>> > ^^^^^^^^^^^^
>Again after some tests.
>I've tried 4096, 8192, and 16384 for wsize/rsize. Only the 4096 value gave a
>measurable result, and it was extremely slow: ~10-16 MB/s for writing
>(depending on the number of threads) and 10-12 MB/s for reading. With the
>other values, NFS hangs (or almost hangs - a couple of kB/s on average).
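For reference, a sweep like the one described above could be scripted roughly as follows. This is only a sketch: the server address, export path, and mount point are taken from the quoted mount_nfs command, and the 100 MB write size is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Sketch: remount the NFS export with each rsize/wsize pair and let dd
# report write throughput. Values match the tests described in the post.
SERVER=10.0.2.1
EXPORT=/zdata2
MNT=/mnt

for SZ in 4096 8192 16384 32768; do
    umount "$MNT" 2>/dev/null
    mount_nfs -o proto=tcp -o rsize=$SZ -o wsize=$SZ "$SERVER:$EXPORT" "$MNT"
    echo "rsize/wsize = $SZ:"
    # Write 100 MB; dd prints the achieved bandwidth on completion.
    dd if=/dev/zero of="$MNT/testfile" bs=1m count=100
    rm -f "$MNT/testfile"
done
umount "$MNT"
```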
>Changing sysctl net.inet.tcp.tso=0 (w/o reboot) on both sides had no effect.
>AFAIK, infiniband interface has no option for TSO,LRO:
>ib0: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> metric 0 mtu 65520
> lladdr 22.214.171.124.fe.126.96.36.199.0.0.0.e4.1d.2d.3.0.50.df.51
> inet 10.0.2.1 netmask 0xffffff00 broadcast 10.0.2.255
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
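On interfaces that do expose the offload capabilities, TSO/LRO can also be toggled per interface rather than through the global sysctl. A sketch (the ixl0 name is illustrative, standing in for the 40 Gbit/s Intel Ethernet port mentioned below; as the ifconfig output above shows, ib0 itself advertises no such options):

```shell
# Sketch: disable TSO and LRO on a capable Ethernet interface.
ifconfig ixl0 -tso -lro

# The global TCP TSO knob, as already tried in the post:
sysctl net.inet.tcp.tso=0

# Verify the per-interface flags afterwards:
ifconfig ixl0 | grep options
```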
>BTW, here is my sysctl.conf file with optimisations for congestion control and
>TCP buffers on 10/40 Gbit/s links (the server also has a 40 Gbit/s Intel ixl
>Ethernet interface):
Well, I'm not familiar with the current TCP stack (and, as noted before, I know
nothing about InfiniBand). All I can suggest is testing with the default congestion
control algorithm. (I always test with the default, which appears to be newreno.)
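On FreeBSD the active congestion control algorithm can be checked, and reverted to the default for testing, with sysctl:

```shell
# Show which congestion control modules are loaded and which one is active.
sysctl net.inet.tcp.cc.available
sysctl net.inet.tcp.cc.algorithm

# Revert to the default algorithm for the test.
sysctl net.inet.tcp.cc.algorithm=newreno
```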
NFS traffic looks very different from a typical use of TCP: lots of small TCP
segments in both directions, interspersed with some larger ones (the write
requests or read replies).