zfs send/recv dies when transferring large-ish dataset

Jona Schuman jonaschuman at gmail.com
Wed Jun 12 23:40:51 UTC 2013


Hi,

I'm getting some strange behavior from zfs send/recv and I'm hoping
someone may be able to provide some insight. I have two identical
machines running FreeBSD 9.0-RELEASE-p3, each with a ZFS pool (ZFS
version 5, zpool version 28) for storage. I want to use zfs send/recv
for replication between the two machines. For the most part, this has
worked as expected. However, send/recv fails when transferring the
largest dataset (both in total size and in number of files) on either
machine. With these datasets, issuing:

machine2# nc -d -l 9999 | zfs recv -d storagepool
machine1# zfs send dataset@snap | nc machine2 9999

terminates early on the sending side without any error message. The
receiving end carries on as expected, cleaning up the partial data
received so far and reverting to its initial state. (I've also tried
mbuffer instead of nc, and plain ssh, with similar results; the
invocations were roughly as sketched below.) Oddly, zfs send dies
slightly differently depending on how the two machines are connected.
When they go through the top-of-rack switch, zfs send dies quietly
without any indication that the transfer has failed. When they are
connected directly with a crossover cable, zfs send dies quietly and
machine1 becomes unresponsive (no network, no keyboard, hard reset
required). In either case, nothing is printed to the console or to
anything in /var/log/.
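
The mbuffer and ssh variants looked roughly like this (I'm
reconstructing the options from memory, so the -s/-m sizes are
approximate and probably not exactly what I ran):

machine2# mbuffer -I 9999 -s 128k -m 1G | zfs recv -d storagepool
machine1# zfs send dataset@snap | mbuffer -O machine2:9999 -s 128k -m 1G

machine1# zfs send dataset@snap | ssh machine2 zfs recv -d storagepool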


I can transfer the same datasets successfully if I send/recv to/from a file:

machine1# zfs send dataset@snap > /tmp/dump
machine1# scp /tmp/dump machine2:/tmp/dump
machine2# zfs recv -d storagepool < /tmp/dump

so I don't think the datasets themselves are the issue. I have also
successfully run send/recv over the network using different network
interfaces (10GbE ixgbe cards instead of the 1GbE igb links), which
suggests the problem is specific to the 1GbE links.

Might there be some buffering parameter that I'm neglecting to tune,
which is essential on the 1GbE links but may be less important on the
faster links? Are there any known issues with the igb driver that
might be the culprit here? Any other suggestions?
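
In case it helps frame an answer: the knobs I've been looking at are
the standard socket buffer sysctls and the igb descriptor ring sizes.
The sysctl names below are the stock FreeBSD ones; the loader.conf
values are just guesses on my part, nothing I've confirmed helps:

machine1# sysctl kern.ipc.maxsockbuf
machine1# sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
machine1# sysctl kern.ipc.nmbclusters

and in /boot/loader.conf (ring sizes, numbers arbitrary):

hw.igb.rxd=4096
hw.igb.txd=4096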

Thanks,
Jona

