NFS + Infiniband problem
rmacklem at uoguelph.ca
Mon Oct 29 15:25:10 UTC 2018
Rodney W. Grimes wrote:
Andrew Vylegzhanin wrote:
>> Hello everyone,
>> I have several FreeBSD machines connected via an Infiniband network (FDR
>> switch Mellanox SW3036 + ConnectX-3 VPI cards).
>> One of them is a NAS server with multiple ZFS pools.
>> All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
>> server) are with infiniband connected mode (option IPOIB_CM, option SDM)
>> and world with OFED stack support. (WITH_OFED='yes').
>> File transfers via FTP or SSH between server and clients work almost
>> flawlessly (~12 Gbit/s).
>> But when I try to copy a significant amount of data in or out via an NFS
>> share mounted on the clients, NFS I/O either hangs completely or becomes
>> extremely slow (a couple of kB/s) after an unpredictable amount of data has
>> been copied. For example, on one node I can copy a 1 GB file, and
>> afterwards NFS hangs on a file of size 30 kB.
>> Some details:
>> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
>I am not sure what the interaction between page sizes, TSO needs,
>buffer needs and all that is, but I always use a power of 2 wsize
They should always be a power of 2. I think the code clips the value, but it might
only clip to a multiple of 512. If it didn't clip this down to 16384, then that
would definitely be a problem. Also, normally the same size for rsize and wsize
is used. If you don't do that, you end up with weird sized blocks in the buffer cache.
I think it still works when this is done, but could cause performance hits.
Probably doesn't matter for a simple performance test.
(You can find out what options it is actually using by typing "nfsstat -m" after doing the mount.)
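
As a rough illustration of the advice above, here is a minimal sketch that rounds a requested size down to the nearest power of 2 and prints a mount command using the same value for rsize and wsize. The helper name pow2_floor is made up for this example, the server path and mount point are the ones from the original report, and the command is only echoed, not run.

```shell
#!/bin/sh
# pow2_floor is a hypothetical helper for this example, not part of any tool.
# It prints the largest power of 2 that is <= its argument (argument >= 1).
pow2_floor() {
    n=$1
    p=1
    while [ $((p * 2)) -le "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# 30000 is the wsize from the original mount command.
size=$(pow2_floor 30000)

# Print (rather than execute) a mount command with matching rsize and wsize.
echo "mount_nfs -o rsize=${size},wsize=${size} -o proto=tcp 10.0.2.1:/zdata2 /mnt"
```

After a real mount, "nfsstat -m" would still be the way to confirm what sizes the client actually ended up with.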
> You might try that. And as Rick suggested, turn off
>TSO, if you can. Is infiniband using RDMA to do this, if so then
>the page size stuff is probably very important, use multiples of
RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
on a separate connection with NFSv4.1 or later, but I've never written code
for that. (Not practical to try to implement without access to hardware that
[performance stuff snipped]