FBSD to FBSD NFS Mounts over IB.

Janky Jay, III jankyj at unfs.us
Thu May 15 17:30:24 UTC 2014


Hello Karl,

On 05/15/2014 10:50 AM, Karl Pielorz wrote:
>
>
> --On 15 May 2014 08:50:25 -0600 "Janky Jay, III" <jankyj at unfs.us> wrote:
>
>>     I have set up one of the FBSD systems to run OpenSM and also be an NFS
>> server, which all the systems seem to be able to mount over the IB devices
>> without any issue at all. Small reads and writes to and from the NFS
>> server to all the other nodes also seem to work without any issue.
>> However, if I try to dump large amounts of data using "dd" (in order to
>> test speeds and stability), the FBSD NFS client craps out immediately. I
>> just get the following message(s) over and over:
>>
>> newnfs server 10.11.1.1:/data: not responding
>> newnfs server 10.11.1.1:/data: not responding
>
> Can both sides 'ping' each other when this happens?
>

	I just tested this while node2 was hanging with another NFS transfer 
(just a "cp /home/file /data/file") and both nodes (1 and 2) can ping 
each other without any issues.

> The reason I ask is I've hit a similar issue setting up ZFS over iSCSI
> on IPOIB (I'm not running connected mode).
>
> At my end it looks like an ARP expires or something so the sides 'lose
> sight' of each other. If 'A' can't see 'B' - a ping from 'B' to 'A'
> usually restores the connection.
>

	I've actually seen this a lot with OpenVPN as well. For some reason, 
the ARP entry does seem to expire or something and I can no longer reach 
other systems on the LAN. And, like you, simply pinging the IP/hostname 
resolves the issue and I am able to connect to services on the LAN 
again... Very strange.

> Maybe make sure they can both still see each other outside of nfs - I
> can temporarily 'fix' the issue here by leaving both sides pinging each
> other - I've not really had a chance to look at it much recently...
>

	I'll leave them both pinging each other for a while and see if the 
transfer ends up finishing. Thanks for the info!
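
	For anyone who wants to try the same ping workaround without keeping 
a terminal open on each node, here is a minimal sketch of a keep-alive 
pinger in Python. It is only an illustration and has not been run on these 
nodes. The function names (build_ping_cmd, keep_alive) and the 5 second 
interval are assumptions made for the example, and it only passes the -c 
flag to ping since that flag exists on both FreeBSD and Linux.

```python
import subprocess
import time


def build_ping_cmd(host, count=1):
    """Return the argv for a short ping of `host`.

    Only -c (packet count) is used, to stay portable across ping versions.
    """
    return ["ping", "-c", str(count), host]


def keep_alive(hosts, interval=5.0, rounds=None, runner=subprocess.call):
    """Ping every host once per round and count the failures per host.

    rounds=None loops until interrupted (Ctrl-C); an integer stops after
    that many rounds. `runner` is injectable so the loop can be exercised
    without sending real packets.
    """
    failures = {host: 0 for host in hosts}
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            rc = runner(
                build_ping_cmd(host),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if rc != 0:  # ping exits nonzero when no reply was received
                failures[host] += 1
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return failures


# Example (runs until interrupted), using the NFS server address from above:
# keep_alive(["10.11.1.1"])
```

	Run on the client against the server's IB address (and the other way 
around on the server), this mimics leaving both sides pinging each other. 
If the transfer still hangs while the failure counts stay at zero, the 
problem is probably not just a stale address entry.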

	Also, I found a lot of the entries below in /var/log/messages on both 
FBSD systems. They repeat over and over, but the timing seems random:

May 15 03:14:26 node1 kernel: "received MAD: slid:4 sqpn:1 " "dlid_bits:0 dqpn:1 wc_flags:0x0, cls 7, mtd 3, atr 15\n"
May 15 03:17:43 node1 kernel: "received MAD: slid:3 sqpn:1 " "dlid_bits:0 dqpn:1 wc_flags:0x0, cls 7, mtd 3, atr 15\n"
May 15 03:17:50 node1 kernel: ib0: timing out; 2 sends not completed
May 15 03:18:22 node1 kernel: "received MAD: slid:4 sqpn:1 " "dlid_bits:0 dqpn:1 wc_flags:0x0, cls 7, mtd 3, atr 10\n"<7>"received MAD: slid:4 sqpn:1 " "dlid_bits:0 dqpn:1 wc_flags:0x0, cls 7, mtd 3, atr 14\n"
May 15 03:19:26 node1 kernel: "received MAD: slid:4 sqpn:1 " "dlid_bits:0 dqpn:1 wc_flags:0x0, cls 7, mtd 3, atr 15\n"
May 15 03:26:10 node1 kernel: "received MAD: slid:4 sqpn:1 " "dlid_bits:0 dqpn:1 wc_flags:0x0, cls 7, mtd 3, atr 15\n"
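
	To check whether the "ib0: timing out" entries line up with the NFS 
hangs, it may help to pull just the kernel entries out of the log with their 
timestamps. Below is a rough Python sketch for that. It assumes each entry 
sits on a single line, as it would in /var/log/messages itself (the mail 
client wraps them), and the names classify and summarize are made up for the 
example.

```python
import re
from collections import Counter

# "May 15 03:17:50 node1 kernel: <message>" as written by syslog
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: (.*)$")
# "ib0: timing out; 2 sends not completed"
TIMEOUT_RE = re.compile(r"\w+: timing out; \d+ sends not completed")


def classify(line):
    """Return (timestamp, host, kind) for a kernel log line, or None.

    kind is "timeout" for IPoIB send timeouts and "mad" for lines that
    contain at least one 'received MAD' message.
    """
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    stamp, host, message = match.groups()
    if TIMEOUT_RE.search(message):
        return (stamp, host, "timeout")
    if "received MAD" in message:
        return (stamp, host, "mad")
    return None


def summarize(lines):
    """Count matching log lines per kind."""
    totals = Counter()
    for line in lines:
        event = classify(line)
        if event is not None:
            totals[event[2]] += 1
    return totals


# Example:
# with open("/var/log/messages") as log:
#     print(summarize(log))
```

	Printing the classify() results for the "timeout" lines next to the 
times the "not responding" messages appeared on the client would show 
whether the two are related or just coincidental.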

Regards,
Janky Jay, III




More information about the freebsd-infiniband mailing list