From: Eric Borisch <eborisch@alumni.stanford.edu>
Date: Thu, 17 Nov 2022 18:47:12 -0600
Subject: Re: Odd behaviour of two identical ZFS servers mirroring via rsync
To: andy thomas <andy@time-domain.co.uk>
Cc: Bob Friesenhahn, Freddie Cash, FreeBSD Filesystems, Mark Saad
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Take the time to figure out send/recv; it is a killer app of ZFS. Note that
your initial sync will have to send the entire filesystem; there is no way
to start from an rsync'ed copy due to the nature of send/recv.

Also note you cannot modify the receive side and then update the backup; as
such you should typically set the receiving filesystem to be read-only.
(Otherwise you will have to roll back to the last synchronized snapshot
before updating.) You can still recv into a read-only ZFS filesystem, as
the read-only property is a statement of "read-only through the POSIX
layer".
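A minimal sketch of that workflow, assuming a source dataset on a pool
'tank' and a backup pool reachable over SSH (every pool, dataset and host
name here is just a placeholder, not something from this thread):

    # Initial full replication: snapshot, then send the whole filesystem.
    zfs snapshot tank/data@sync1
    zfs send tank/data@sync1 | ssh backuphost zfs recv backup/data

    # Keep the mirror read-only; recv still works, since the property only
    # restricts access through the POSIX layer.
    ssh backuphost zfs set readonly=on backup/data

    # Every later run sends only the blocks changed since the last snapshot.
    zfs snapshot tank/data@sync2
    zfs send -i tank/data@sync1 tank/data@sync2 | ssh backuphost zfs recv backup/data

Newer OpenZFS (e.g. on FreeBSD 13.x) also supports resumable streams via
'zfs receive -s' on the destination and 'zfs send -t <resume_token>' to
restart an interrupted transfer, which covers the restartable-transfer
point Freddie mentions below.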
 - Eric

On Thu, Nov 17, 2022 at 8:44 AM andy thomas <andy@time-domain.co.uk> wrote:

> On Thu, 17 Nov 2022, Freddie Cash wrote:
>
> > Now that you have it working with rsync, you should look into using ZFS
> > send/recv as an alternative. You should find it finishes a lot quicker
> > than rsync, although it does require a bit more scripting know-how
> > (especially if you want to use restartable/interruptible transfers, or
> > use a transport other than SSH for better throughput).
> >
> > ZFS send/recv works "below" the filesystem layer that rsync works at.
> > ZFS knows which individual blocks on disk have changed between snapshots
> > and only transfers those blocks. There are no file comparisons or hash
> > computations to work out between the hosts.
> >
> > Transferring the initial snapshot takes a long time, though, as it has
> > to transfer the entire filesystem across. Transferring individual
> > snapshots after that takes very little time. It's similar to doing a
> > "full" backup, and then "incrementals".
> >
> > When transferring data between ZFS pools with similar filesystem
> > hierarchies, you really should consider send/recv.
>
> Point taken! Three days ago, one of our HPC users who has ~9TB of data
> stored on our server decided to rename a subdirectory containing ~4TB of
> experimental data stored as many millions of relatively small files within
> a lot of subdirectories. As a result, rsync on the destination (mirror)
> server is still deleting his old folder and its contents and hasn't even
> started mirroring the renamed folder.
>
> Since our servers have been up for 5.5 years and are both well overdue for
> an O/S upgrade from FreeBSD 11.3 to 13.x anyway, I think this would be a
> good opportunity to switch from rsync to ZFS send/recv. I was planning to
> do the O/S update over the upcoming Christmas vacation, when HPC demand
> here traditionally falls to a very low level - I will set up a pair of
> test servers in the next day or two, play around with this and get some
> experience of it before upgrading the 'live' servers.
>
> cheers, Andy
>
> > Typos due to smartphone keyboard.
> >
> > On Thu., Nov. 17, 2022, 12:50 a.m. andy thomas, <andy@time-domain.co.uk>
> > wrote:
> >
> > > I thought I would report back that changing my rsync options from
> > > '-Wav --delete' to '-av --inplace --no-whole-file --delete' has made a
> > > significant difference, with mirrored directory sizes on the slave
> > > server now falling and approaching the original sizes on the master.
> > > The only downside is that since whole-file replication is obviously a
> > > lot faster than updating the changed parts of individual files,
> > > mirroring is now taking longer than 24 hours, so this will be changed
> > > to every few days or even weekly when more is known about user
> > > behaviour on the master server.
> > >
> > > Andy
> > >
> > > On Sun, 13 Nov 2022, Bob Friesenhahn wrote:
> > >
> > > > On Sun, 13 Nov 2022, Mark Saad wrote:
> > > > > Bob are you saying when the target is zfs --inplace
> > > > > --no-whole-file helps, or just in general when you have large
> > > > > files? Also have you tried using --delete-during / --delete-after?
> > > >
> > > > The '--inplace --no-whole-file' updates the file blocks if they have
> > > > changed (comparing the origin blocks with the existing mirror
> > > > blocks) rather than creating a new copy of the file and moving it
> > > > into place when it is complete. ZFS does not check if data content
> > > > has been changed while it is being written, so a write of the same
> > > > data will result in a fresh allocation based on its Copy On Write
> > > > ("COW") design. Writing a whole new file obviously significantly
> > > > increases the number of blocks which are written. Requesting that
> > > > rsync only write to the file for the blocks which have changed
> > > > reduces the total number of blocks which get written.
> > > >
> > > > The above helps quite a lot when using snapshots, since then fewer
> > > > blocks are in the snapshots.
> > > >
> > > > I have never tried --delete-during so I can't comment on that.
> > > >
> > > > Bob
> > > > --
> > > > Bob Friesenhahn
> > > > bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> > > > GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> > > > Public Key,     http://www.simplesystems.org/users/bfriesen/public-key.txt
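As a footnote to the rsync discussion quoted above, a sketch of a full
mirror job using those options (paths and host names are placeholders, not
taken from this thread) would look something like:

    # Update only the changed blocks in place on the ZFS mirror, then
    # snapshot the destination so each snapshot holds only newly written
    # blocks.
    rsync -av --inplace --no-whole-file --delete /tank/data/ backuphost:/backup/data/
    ssh backuphost zfs snapshot backup/data@mirror-$(date +%Y%m%d)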