zfs send/recv dies when transferring large-ish dataset

Ivailo Tanusheff Ivailo.Tanusheff at skrill.com
Thu Jun 13 07:57:47 UTC 2013


Hi,

Can you try send/recv with the -v or -vP switches, so you can see more verbose information?
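Something along these lines on the sending side (-P, where your build supports it, adds parsable size/progress output):

machine1# zfs send -vP dataset@snap | nc machine2 9999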

Regards,
Ivailo Tanusheff

-----Original Message-----
From: owner-freebsd-fs at freebsd.org [mailto:owner-freebsd-fs at freebsd.org] On Behalf Of Jona Schuman
Sent: Thursday, June 13, 2013 2:41 AM
To: freebsd-fs at freebsd.org
Subject: zfs send/recv dies when transferring large-ish dataset

Hi,

I'm getting some strange behavior from zfs send/recv and I'm hoping someone may be able to provide some insight. I have two identical machines running 9.0-RELEASE-p3, each having a ZFS pool (zfs 5, zpool 28) for storage. I want to use zfs send/recv for replication between the two machines. For the most part, this has worked as expected. However, send/recv fails when transferring the largest dataset (both in actual size and in terms of number of files) on either machine. With these datasets, issuing:

machine2# nc -d -l 9999 | zfs recv -d storagepool
machine1# zfs send dataset@snap | nc machine2 9999

terminates early on the sending side without any error messages. The receiving end carries on as expected, cleaning up the partial data received so far and reverting to its initial state. (I've tried using mbuffer instead of nc, or just using ssh, with similar results in both cases; the mbuffer pipeline is sketched below.) Oddly, zfs send dies slightly differently depending on how the two machines are connected. When connected through the top-of-rack switch, zfs send dies quietly, with no indication that the transfer has failed. When connected directly with a crossover cable, zfs send dies quietly and machine1 becomes unresponsive (no network, no keyboard; a hard reset is required). In either case, nothing is printed to the console or to anything in /var/log/.
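The mbuffer variant was of roughly this shape (the block and buffer sizes here are illustrative, not the exact values used):

machine2# mbuffer -s 128k -m 1G -I 9999 | zfs recv -d storagepool
machine1# zfs send dataset@snap | mbuffer -s 128k -m 1G -O machine2:9999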


I can transfer the same datasets successfully if I send/recv to/from a file:

machine1# zfs send dataset@snap > /tmp/dump
machine1# scp /tmp/dump machine2:/tmp/dump
machine2# zfs recv -d storagepool < /tmp/dump

so I don't think the datasets themselves are the issue. I've also successfully tried send/recv over the network using different network interfaces (10GbE ixgbe cards instead of the 1GbE igb links), which would suggest the issue is with the 1GbE links.

Might there be some buffering parameter that I'm neglecting to tune, which is essential on the 1GbE links but may be less important on the faster links? Are there any known issues with the igb driver that might be the culprit here? Any other suggestions?
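(For concreteness, the sorts of knobs I have in mind are along these lines; the values are purely illustrative, not settings I'm recommending:

machine1# sysctl kern.ipc.maxsockbuf=16777216
machine1# sysctl net.inet.tcp.sendbuf_max=16777216
machine2# sysctl net.inet.tcp.recvbuf_max=16777216

plus the igb(4) descriptor-ring tunables in /boot/loader.conf, e.g. hw.igb.rxd=4096 and hw.igb.txd=4096.)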

Thanks,
Jona



