Re: ZFS on high-latency devices

From: George Michaelson <ggm_at_algebras.org>
Date: Sun, 22 Aug 2021 23:54:39 UTC
I don't think it's sensible to mesh long-delay file constructs into a
pool. Maybe there is a model which permits this, but I think higher
abstractions like Ceph may be a better fit for the kind of distributed
filestore this implies.

I use ZFS send/receive to create "clones" of a zpool/zfs structure, and
those are not bound by the delay problem the way live FS delivery is:
you use mbuffer to make the transport of the entire ZFS data state more
efficient.
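
A minimal sketch of that kind of pipeline, assuming a receiving host
reachable as "receiver" on TCP port 9090 (host name, port, dataset
names and buffer sizes are placeholders, not tuned values):

  # on the receiving host: listen on TCP 9090, buffer, feed zfs receive
  receiver$ mbuffer -I 9090 -s 128k -m 1G | zfs receive -u tank/copy

  # on the sending host: stream the whole hierarchy through mbuffer to the receiver
  sender$ zfs send -R pool/data@snap | mbuffer -s 128k -m 1G -O receiver:9090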

A long time ago, 25+ years back, I ran NFS and SMB over X.25 (IP over
X.25) and it was very painful. Long-haul, long-delay networks are not a
good fit for direct FS read/write semantics if you care about speed.
There isn't enough caching in the world to make the distance fully
transparent. When I have to use FS-over-<thing> now it's typically to
mount virtual ISO images for console-FS recovery of a host, and it's
bad enough over 150ms-delay fibre links that I don't want to depend on
it beyond the bootstrap phase.

-G

On Mon, Aug 23, 2021 at 9:48 AM Alan Somers <asomers@freebsd.org> wrote:
>
> mbuffer is not going to help the OP.  He's trying to create a pool on top of a networked block device.  And if I understand correctly, he's connecting over a WAN, not a LAN.  ZFS will never achieve decent performance in such a setup.  It's designed as a local file system, and assumes it can quickly read metadata off of the disks at any time.  The OP's best option is to go with "a": encrypt each dataset and send them with "zfs send --raw".  I don't know why he thinks that it would be "very difficult".  It's quite easy, if he doesn't care about old snapshots.  Just:
>
> $ zfs create <crypto options> pool/new_dataset
> $ cp -a pool/old_dataset/* pool/new_dataset/
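>
> A hypothetical follow-up for the raw send itself (snapshot, remote host, and target dataset names are placeholders) could then be:
>
> $ zfs snapshot pool/new_dataset@base
> $ zfs send --raw pool/new_dataset@base | ssh remote zfs receive -u backup/new_dataset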
>
> -Alan
>
> On Sun, Aug 22, 2021 at 5:40 PM George Michaelson <ggm@algebras.org> wrote:
>>
>> I don't want to abuse the subject line too much, but I can highly
>> recommend the mbuffer approach; I've used it repeatedly, BSD-to-BSD
>> and BSD-to-Linux. It definitely feels faster than SSH, especially
>> since the 'no cipher' options were removed and amid the confusion
>> around the HPC buffer changes. But it's not encrypted on the wire.
>>
>> Mbuffer tuning is a bit of a black art: it would help enormously if
>> there were some guidance on this. Personally, I've never found the
>> mbuffer -v option to work well: I get no real sense of how full or
>> empty the buffer is, or whether the use of sendmsg/recvmsg-style
>> buffer chains is better or worse.
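>>
>> For what it's worth, a hedged starting point to experiment from (the
>> numbers here are guesses, not recommendations) is simply to raise the
>> block size and the total buffer memory and see whether sustained
>> throughput improves:
>>
>> $ zfs send pool/data@snap | mbuffer -s 1M -m 2G -O receiver:9090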
>>
>> -G
>>
>> On Fri, Aug 20, 2021 at 6:19 PM Ben RUBSON <ben.rubson@gmx.com> wrote:
>> >
>> > > On 19 Aug 2021, at 11:37, Peter Jeremy <peter@rulingia.com> wrote:
>> > >
>> > > (...) or a way to improve throughput doing "zfs recv" to a pool with a high RTT.
>> >
>> > You should use zfs send/receive through mbuffer, which will allow you to sustain better throughput over high-latency links.
>> > Feel free to play with its buffer size parameters to find the best settings for your link characteristics.
>> >
>> > Ben
>> >
>> >
>>