Data corruption via IPoIB in connected mode between Linux and FreeBSD

hiroshi matsuo matsuo.hiroshi.39 at gmail.com
Wed Mar 24 04:58:25 UTC 2021


Dear all,

I'm trying IPoIB between Linux and FreeBSD with Mellanox ConnectX-2 cards,
and I have run into a strange problem that is beyond my knowledge.

CentOS-7:
IP address=1.0.1.2/24 (attached to ipoib device),
transport mode=connected
MTU=65520 (following the Red Hat documentation)

FreeBSD-12.2:
IP address=1.0.1.1/24 (attached to ipoib device),
transport mode=connected (I built with IPOIB_CM options)
MTU=4092 (the default?)
I want to set this to 65520, the same as on CentOS, but I cannot.
dmesg shows:
  mlx4_core0: 65520 is invalid IBTA mtu
Why?
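
For reference, the arithmetic behind these two MTU numbers can be sketched
as below. As far as I understand, the InfiniBand link-level (IBTA) MTU can
only be 256, 512, 1024, 2048 or 4096 bytes, the IPoIB encapsulation header
takes 4 bytes, and 65520 is the IP MTU limit for connected mode. So 4092
looks like the datagram-mode value for a 4096-byte link MTU, and 65520 is
not a link MTU at all. The names below are illustrative only; this does not
query the driver, and whether it explains the dmesg line is my assumption.

```python
# Link-level MTU values defined by the InfiniBand specification (bytes).
IBTA_MTUS = (256, 512, 1024, 2048, 4096)

# IPoIB encapsulation header size (bytes).
IPOIB_HEADER_BYTES = 4

# Largest IP MTU usable in IPoIB connected mode (bytes).
IPOIB_CM_MAX_MTU = 65520


def is_ibta_mtu(value):
    """True if value is one of the link-level MTU sizes."""
    return value in IBTA_MTUS


def datagram_ip_mtu(ib_mtu):
    """IP MTU available in datagram mode for a given link-level MTU."""
    if not is_ibta_mtu(ib_mtu):
        raise ValueError(f"{ib_mtu} is not an IBTA MTU")
    return ib_mtu - IPOIB_HEADER_BYTES


if __name__ == "__main__":
    print(datagram_ip_mtu(4096))          # IP MTU over a 4096-byte link MTU
    print(is_ibta_mtu(IPOIB_CM_MAX_MTU))  # 65520 is not a link-level MTU
```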

The FreeBSD box has a 2TB ZFS pool with about 4,000,000 files in it.
A few days ago I copied all the files from FreeBSD to CentOS with rsync,
like this:
  centos$ rsync -av -e ssh matsuo@10.0.1.1:/tank/data/  ~/data

At one point I happened to find a corrupted file, even though rsync had
finished with no error message.
I then checked all the files, comparing the copies against the originals.
In the end I found that:
  1. There are 24 corrupted files (the MD5 value differs from the original),
         i.e.  0.0006% failure, 99.9994%  success.
  2. Every corrupted file has exactly one byte that differs from the original,
     and the position of the bad byte seems random. So it is not a burst error.
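
The comparison can be reproduced roughly as in the sketch below. It is not
the exact procedure I ran: it assumes that both trees are reachable as local
paths on one machine (for example through a mount), and the function names
are made up for illustration. It compares MD5 digests file by file and, for
each mismatch, lists the offset and the two byte values that differ.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_bytes(original, copy, chunk_size=1 << 20):
    """List (offset, original_byte, copy_byte) for two equal-length files."""
    diffs = []
    offset = 0
    with open(original, "rb") as fa, open(copy, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if len(a) != len(b):
                raise ValueError("files differ in length")
            if not a:
                break
            if a != b:
                diffs.extend(
                    (offset + i, x, y)
                    for i, (x, y) in enumerate(zip(a, b))
                    if x != y
                )
            offset += len(a)
    return diffs


def compare_trees(original_root, copy_root):
    """Map relative path -> differing bytes, for files whose MD5 differs."""
    original_root = Path(original_root)
    copy_root = Path(copy_root)
    corrupted = {}
    for src in sorted(original_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_root)
        dst = copy_root / rel
        if dst.is_file() and md5_of(src) != md5_of(dst):
            corrupted[str(rel)] = differing_bytes(src, dst)
    return corrupted
```

For each reported triple (offset, x, y), bin(x ^ y).count("1") gives the
number of flipped bits, which may help to tell a single-bit flip from a
whole-byte substitution.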

I doubt whether connected mode (CM) is really established, but I don't know
how to inspect it in detail.

Please point out to me:
 what the root cause is,
 what is wrong with my setup,
 which documents are worth reading first,
and so on.

In addition, I ran iperf tests:
   16Gbps   (in the CentOS-CentOS case)
    4Gbps   (in the CentOS-FreeBSD case)
So I think my FreeBSD server is not working properly and something is wrong.

Thank you.
Hiroshi Matsuo

