Re: speeding up zfs send | recv (update)

From: Miroslav Lachman <000.fbsd_at_quip.cz>
Date: Wed, 22 Feb 2023 18:06:06 UTC
On 17/05/2021 17:57, mike tancsa wrote:

[...]

> On the mail spool just via mbuffer (no ssh involved at all)
> 
> zfs send
> summary:  514 GiByte in 1h 09min 35.9sec - average of  126 MiB/s
> zfs send -c
> summary:  418 GiByte in 1h 05min 58.5sec - average of  108 MiB/s
> 
> and the same dataset, sending just through OpenSSH took 1h:06m (zfs
> send) and 1h:01m (zfs send -c)
> 
> 
> On the large dataset (large VMDK files), similar pattern. I did find one
> interesting thing, when I was testing with a smaller dataset of just
> 12G.  As the server has 65G of RAM, 29 allocated to ARC, sending a zfs
> stream with -c made a giant difference. I guess there is some efficiency
> with sending something that's already compressed in ARC? Or maybe it's
> just all cache effect.
> 
> Testing with one dataset with about 1TB of referenced data, using mbuffer
> with and without ssh, and just ssh
> 
> zfs send with mbuffer and ssh
> summary:  772 GiByte in 51min 06.2sec - average of  258 MiB/s
> zfs send -c
> summary:  772 GiByte in 1h 22min 09.3sec - average of  160 MiB/s
> 
> And the same dataset just with ssh -- zfs send 53min and zfs send -c 55min
> 
> and just mbuffer (no ssh)
> 
> summary:  772 GiByte in 56min 45.7sec - average of  232 MiB/s (zfs send -c)
> summary: 1224 GiByte in 53min 20.4sec - average of  392 MiB/s (zfs send)

I am facing a similar problem with low performance of zfs send over the 
network. I have two machines in two different datacenters, both with a 
1Gbps NIC, and I would like to saturate the network, but it seems 
impossible even though "everything" seems to have enough unused resources.
The sending side is a very old HP ML110 G5 with a bge0 NIC; the receiving 
side is a VM with enough CPU, RAM, 22TB of storage and a vtnet0 NIC.
The sender has about 25% idle CPU while sending and the disks are not 
saturated according to iostat -x -w, but I still cannot get more than 
52MiB/s. All of the zfs snapshot / send / receive handling is done by 
syncoid from the sanoid package (ssh + mbuffer + pv, no lzop).
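
For context, the pipeline syncoid builds is roughly equivalent to the 
following sketch (host and dataset names are placeholders; the exact 
options syncoid passes to pv/mbuffer may differ):

zfs send tank/data@snap \
  | pv \
  | mbuffer -q -s 128k -m 128M \
  | ssh backup-host 'mbuffer -q -s 128k -m 128M | zfs receive tank/backup/data'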

I thought it was slow because of the ssh cipher, so I tried to change it 
with --sshcipher, but it was 10MiB/s slower when I switched from the 
default chacha20-poly1305@openssh.com to aes128-ctr.
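
The invocation was along these lines (host and dataset names are 
placeholders, the remaining syncoid options are omitted):

syncoid --sshcipher=aes128-ctr tank/data root@backup-host:tank/backup/data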

The interesting thing was testing scp with different ciphers (I tested 
all of the available ones), sending a 1GB file full of random bytes from 
dd if=/dev/urandom. With every cipher, scp was able to achieve 65MiB/s.
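
The test was roughly the following (backup-host and paths are 
placeholders):

dd if=/dev/urandom of=/tmp/rand.1g bs=1m count=1024
for c in $(ssh -Q cipher); do
    echo "=== $c"
    scp -c "$c" /tmp/rand.1g backup-host:/tmp/rand.1g.test
done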

Even more interesting is that "zfs send -c" is slower than "zfs send" 
(without a compressed stream). I would expect the opposite.

Why is zfs send -c slower?
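
One way to narrow it down would be to measure the stream generation rate 
locally, with the network taken out of the picture, e.g. (snapshot name 
is a placeholder):

zfs send    tank/data@snap | pv > /dev/null
zfs send -c tank/data@snap | pv > /dev/null

Note that the -c stream carries fewer bytes for the same data (as the 
numbers quoted above show), so a lower MiB/s figure does not by itself 
mean a longer wall-clock time; comparing elapsed time for the same 
snapshot is more telling.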

Some datasets are transferred at a much lower speed, about 40MiB/s.

What are the limiting factors for zfs send when there is 25% idle CPU and 
a RAIDZ of 4x SATA disks should give more than 50MB/s? iostat shows the 
disks' busy percentage oscillating between 50% and 80%, and the receiving 
side is under 50% usage (CPU and disk).
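
To rule out the network path itself, raw TCP throughput between the two 
machines could be checked with iperf3 (available as a package), e.g.:

iperf3 -s                      # on the receiving VM
iperf3 -c backup-host -t 30    # on the sender (backup-host is a placeholder)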

Sender stats look like this:

# iostat -x -w 30 ada{0,1,2,3}

                         extended device statistics
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
ada0         227      18  10961.1    260.5     2     2    56     2    0  17
ada1         276      18  12996.9    260.8     2     2    62     2    0  18
ada2         225      17  10567.9    252.1     3     2    68     3    0  19
ada3         275      17  12217.1    252.4     2     2    77     2    0  19
                         extended device statistics
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
ada0         185      11   6623.3    126.5     5     4   110     5    5  24
ada1         204      11   7238.8    124.0     3     4   109     4    3  24
ada2         187      11   6474.7    119.5     5    10   106     5    5  26
ada3         205      10   6825.5    115.7     4     8   108     4    2  26
                         extended device statistics
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
ada0         239       3   7698.6     59.3     5     1    46     4    0  29
ada1         241       3   7507.2     59.3     4     1    61     4    1  29
ada2         240       3   7476.6     59.3     5     1    40     5    0  30
ada3         235       3   7156.9     59.3     4     2    75     4    0  30


# ifstat -i bge0 60
        bge0
  KB/s in  KB/s out
  1245.04  49964.32
  1321.39  53035.46
  1323.03  53108.41
  1319.83  52947.85
  1248.55  50089.84
  1240.00  49706.52
  1278.22  51309.49
  1274.60  51159.64
  1279.98  51366.20
  1157.62  46435.98


Network latency between the machines:
round-trip min/avg/max/stddev = 0.326/0.492/0.897/0.156 ms
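
At this RTT the bandwidth-delay product is tiny (1 Gbit/s x ~0.0005 s is 
roughly 60 KB), comfortably within default TCP window sizes, so the 
latency itself should not be what caps the throughput.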


As I need to transfer about 15TB of data, I would really like to speed it 
up as much as possible. At the current rate it would take about a week.
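
As a back-of-the-envelope check: 15 TB at the observed 40-52 MiB/s is 
roughly 75-100 hours of continuous streaming, while a saturated 1 Gbps 
link (about 110 MiB/s of payload) would need around 36 hours.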

Kind regards
Miroslav Lachman