Data corruption via IPoIB in connected mode between Linux and FreeBSD
Hans Petter Selasky
hps at selasky.org
Wed Mar 24 07:53:49 UTC 2021
On 3/24/21 5:57 AM, hiroshi matsuo wrote:
> I'm trying IPoIB between Linux and FreeBSD with Mellanox ConnectX-2 cards,
> and I have run into a strange problem that is beyond my knowledge.
> CentOS:
>   IP address=126.96.36.199/24 (attached to ipoib device),
>   transport mode=connected
>   MTU=65520 (following the Red Hat documentation)
> FreeBSD:
>   IP address=188.8.131.52/24 (attached to ipoib device),
>   transport mode=connected (I built with the IPOIB_CM option)
>   MTU=4092 (the default? I want to set this to 65520 to match CentOS, but
>   I cannot; dmesg shows "mlx4_core0: 65520 is invalid IBTA mtu". Why?)
Did you apply my patch to FreeBSD 12.2 for connected mode?
Try subtracting the size of the InfiniBand address (20 bytes) from the MTU.
Try setting the MTU to 9000 instead. Using such a large MTU with FreeBSD
doesn't make sense.
Are you using an InfiniBand router in between? If so, has it been
configured to handle such a large MTU?
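A minimal sketch of the MTU arithmetic behind this exchange. It assumes the IBTA link MTUs (256 to 4096 bytes), the 4-byte IPoIB encapsulation header from RFC 4391, and the Linux connected-mode ceiling of 65520 (65536 - 16). The helper names are hypothetical, and the driver check behind the "invalid IBTA mtu" message is presumed, not confirmed, to be a comparison against this set of link MTUs.

```sh
#!/bin/sh
# Sketch of IPoIB MTU arithmetic. Helper names are hypothetical.

# Link MTUs defined by the InfiniBand Trade Association (IBTA).
IBTA_MTUS="256 512 1024 2048 4096"

# Return 0 if $1 is one of the IBTA link MTUs, 1 otherwise.
is_ibta_mtu() {
    for m in $IBTA_MTUS; do
        if [ "$1" -eq "$m" ]; then
            return 0
        fi
    done
    return 1
}

# Datagram-mode IPoIB MTU: IB link MTU minus the 4-byte
# IPoIB encapsulation header (RFC 4391), e.g. 4096 -> 4092.
ipoib_datagram_mtu() {
    echo $(( $1 - 4 ))
}

# Linux connected-mode maximum: 64 KiB minus 16 bytes.
IPOIB_CM_MAX_MTU=$(( 65536 - 16 ))

for v in 4096 65520; do
    if is_ibta_mtu "$v"; then
        echo "$v: IBTA link MTU, datagram IPoIB MTU $(ipoib_datagram_mtu "$v")"
    else
        echo "$v: not an IBTA link MTU"
    fi
done
echo "Linux CM max: $IPOIB_CM_MAX_MTU"
```

The 4092 default on FreeBSD matches a 4096-byte link MTU minus the 4-byte header. Taking the 20-byte suggestion literally gives 65520 - 20 = 65500; this sketch does not establish whether the FreeBSD driver accepts that value. The MTU itself is set on FreeBSD with `ifconfig ib0 mtu 9000`, assuming the interface is named ib0.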
> The FreeBSD box has a 2 TB ZFS pool with about 4,000,000 files in it.
> A few days ago I copied all files from FreeBSD to CentOS using rsync
> like this:
> centos$ rsync -av -e ssh matsuo@10.0.1.1:/tank/data/ ~/data
> At one point I found a corrupted file by accident, even though rsync had
> finished with no error message.
> I then checked every file, comparing each copy against its original, and
> found that:
> 1. There are 24 corrupted files (their MD5 differs from the original),
> i.e. a 0.0006% failure rate (99.9994% success).
> 2. Every corrupted file has exactly one byte that differs from the original,
> and the position of that byte seems random, so it is not a burst error.
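One way to reproduce the check behind items 1 and 2, sketched in shell under the assumption that both trees are reachable from the same machine. It uses `cmp` rather than MD5 so that it also counts the differing bytes in each file. `compare_trees` is a hypothetical helper name, and file names containing newlines are not handled.

```sh
#!/bin/sh
# Sketch: list files whose copy differs from the original, with the
# number of differing bytes. compare_trees is a hypothetical name.
# Limitation: file names containing newlines are not handled.
compare_trees() {
    src=$1
    dst=$2
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        # cmp -s is silent and exits non-zero when the files differ
        # or the copy is missing (then the byte count below is 0).
        if ! cmp -s "$src/$f" "$dst/$f"; then
            # cmp -l prints one line per differing byte: the 1-based
            # offset, then both byte values in octal.
            n=$(cmp -l "$src/$f" "$dst/$f" 2>/dev/null | awk 'END { print NR }')
            echo "$f $n"
        fi
    done
}

# Demo: two small trees where f2 differs by exactly one byte.
tmp=$(mktemp -d)
mkdir "$tmp/orig" "$tmp/copy"
printf 'hello world\n' > "$tmp/orig/f1"
printf 'hello world\n' > "$tmp/copy/f1"
printf 'hello world\n' > "$tmp/orig/f2"
printf 'hello wxrld\n' > "$tmp/copy/f2"
compare_trees "$tmp/orig" "$tmp/copy"   # prints: ./f2 1
rm -rf "$tmp"
```

Without a shared mount, re-running rsync as a checksum-only dry run, e.g. `rsync -acni -e ssh matsuo@10.0.1.1:/tank/data/ ~/data`, should itemize the files whose contents differ without transferring anything.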
> I am not sure whether CM is established, but I don't know how to inspect it.
If you have IPOIB_CM set, you should be good.
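A sketch of how the configured mode might be checked from a shell. On Linux, the IPoIB mode of an interface is exposed in /sys/class/net/<interface>/mode as "datagram" or "connected"; the interface name ib0 is an assumption. On FreeBSD, the sketch assumes IPOIB_CM was set in the kernel configuration file and that the kernel embeds that file (options INCLUDE_CONFIG_FILE, as GENERIC does), so it can be read back through the kern.conftxt sysctl. The function names and the SYSFS override are hypothetical, and neither check proves that a CM connection to a particular peer is actually up.

```sh
#!/bin/sh
# Sketch: inspect the IPoIB transport mode. Names are hypothetical.

# Linux: print "datagram" or "connected" for interface $1.
# SYSFS defaults to /sys and can be overridden for testing.
ipoib_mode() {
    f="${SYSFS:-/sys}/class/net/$1/mode"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown"
        return 1
    fi
}

# FreeBSD: succeed if the running kernel's embedded config file
# mentions IPOIB_CM (requires options INCLUDE_CONFIG_FILE).
kernel_has_ipoib_cm() {
    sysctl -n kern.conftxt 2>/dev/null | grep -q IPOIB_CM
}

case "$(uname -s)" in
    Linux)
        echo "ib0 mode: $(ipoib_mode ib0)" ;;
    FreeBSD)
        if kernel_has_ipoib_cm; then
            echo "IPOIB_CM: found in kernel config"
        else
            echo "IPOIB_CM: not found"
        fi ;;
esac
```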
> Please point out the root cause, what is wrong with my setup, which
> documents are worth reading first, and so on.
As I said, there is a bug in IPoIB CM mode that is not fixed unless you
apply the patch I sent, which Mellanox will upstream later.
> In addition, I ran iperf tests:
> 16 Gbps (CentOS-CentOS)
> 4 Gbps (CentOS-FreeBSD)
> So I think my FreeBSD server is not working properly and something is wrong.