NFS + Infiniband problem

Rick Macklem rmacklem at uoguelph.ca
Mon Oct 29 15:25:10 UTC 2018


Rodney W. Grimes wrote:
Andrew Vylegzhanin wrote:
>> Hello everyone,
>>
>> I have several FreeBSD machines connected via an Infiniband network (FDR
>> switch Mellanox SW3036 + ConnectX-3 VPI cards).
>> One of them is a NAS server with multiple ZFS pools.
>>
>> All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
>> server) are built with Infiniband connected mode (option IPOIB_CM, option SDM)
>> and world is built with OFED stack support (WITH_OFED='yes').
>>
>> File transfers via FTP or SSH between server and clients work almost
>> flawlessly (~12 Gbit/s).
>>
>> But when I try to copy a significant amount of data in or out via an NFS
>> share mounted on the clients, NFS I/O either hangs completely or becomes
>> extremely slow (a couple of kB/s) after an unpredictable amount of copied
>> data. For example, on one node I can copy a 1 GB file, and afterwards NFS
>> hangs on a file of 30 kB.
>>
>> Some details:
>> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
>                               ^^^^^^^^^^^^
>I am not sure what the interactions between page sizes, TSO needs,
>buffer needs and all that are, but I always use a power of 2 for wsize
>and rsize.
They should always be a power of 2. I think the code clips the value, but it might
only clip to a multiple of 512. If it didn't clip this down to 16384, then that
would definitely be a problem. Also, normally the same size is used for rsize and
wsize. If you don't do that, you end up with weird sized blocks in the buffer cache.
I think it still works when this is done, but it could cause performance hits.
It probably doesn't matter for a simple performance test.
(You can find out what options the mount is actually using by typing "nfsstat -m"
after doing the mount.)
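
As a rough illustration of the sizing advice above, the sketch below rounds a
requested size down to the nearest power of 2 and prints a mount command that
uses the same value for rsize and wsize. The helper name pow2_floor is made up
for this example and is not part of mount_nfs; the server path and mount point
are the ones from the original command.

```shell
#!/bin/sh
# Hypothetical helper: largest power of 2 that does not exceed $1.
# (For inputs below 1 it simply returns 1.)
pow2_floor() {
    req=$1
    size=1
    while [ $((size * 2)) -le "$req" ]; do
        size=$((size * 2))
    done
    echo "$size"
}

# 30000 is the wsize from the original mount command.
size=$(pow2_floor 30000)

# Print (rather than run) a mount command with matching rsize and wsize.
echo "mount_nfs -o rsize=${size} -o wsize=${size} -o proto=tcp 10.0.2.1:/zdata2 /mnt"

# After mounting, "nfsstat -m" shows the options actually in use.
```

For 30000 this yields 16384, the value mentioned above.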

>   You might try that.  And as Rick suggested, turn off
>TSO, if you can.  Is Infiniband using RDMA to do this? If so, then
>the page size stuff is probably very important; use multiples of
>4096 only.
RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
on a separate connection with NFSv4.1 or later, but I've never written code
for that. (Not practical to try to implement without access to hardware that
does it.)
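
For the TSO suggestion quoted above, a minimal sketch of checking and disabling
TSO with ifconfig might look like the following. The interface name ib0 is an
assumption (the thread does not name the interface), and whether the IPoIB
interface advertises TSO at all depends on the driver.

```shell
#!/bin/sh
# IFACE is an assumed name; substitute the IPoIB interface actually in use.
IFACE=${IFACE:-ib0}

if ifconfig "$IFACE" >/dev/null 2>&1; then
    # Show the options line; TSO4/TSO6 listed there means TSO is enabled.
    ifconfig "$IFACE" | grep -i options
    # Disable TSO on the interface (requires root).
    ifconfig "$IFACE" -tso
else
    echo "interface $IFACE not found" >&2
fi
```

The change made this way does not persist across reboots.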

rick
[performance stuff snipped]

