svn commit: r43697 - projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs

Benjamin Kaduk kaduk@MIT.EDU
Fri Jan 31 19:37:25 UTC 2014


On Thu, 30 Jan 2014, Benedict Reuschling wrote:

> Log:
>  Add a section about basic zfs send and receive.  This is based on an older
>  example and might need updates to represent the current zfs version we have.
>  It shows how to send zfs data streams locally and remote (via SSH) with
>  example commands and output.
>
> Modified:
>  projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml
>
> Modified: projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml
> ==============================================================================
> --- projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml	Thu Jan 30 18:17:31 2014	(r43696)
> +++ projects/zfsupdate-201307/en_US.ISO8859-1/books/handbook/zfs/chapter.xml	Thu Jan 30 19:00:09 2014	(r43697)
> @@ -1250,7 +1250,243 @@ tank    custom:costcenter  -
>     <sect2 xml:id="zfs-zfs-send">
>       <title>ZFS Replication</title>
>
> -      <para></para>
> +      <para>Keeping the data on a single pool in one location exposes
> +	it to risks like theft, natural and human desasters.  Keeping

"disasters".

> +	regular backups of the entire pool is vital when data needs to
> +	be restored.  ZFS provides a built-in serialization feature
> +	that can send a stream representation of the data to standard
> +	output.  Using this technique, it is possible to not only
> +	store the data on another pool connected to the local system,
> +	but also to send it over a network to another system that runs
> +	ZFS.  To achieve this replication, ZFS uses the filesystem

s/the//

> +	snapshots (see the section on <link
> +	  linkend="zfs-zfs-snapshot">ZFS snapshots</link> on how they

s/on/for/

> +	work) to send them from one location to another.  The commands
> +	for this operation are <literal>zfs send</literal> and
> +	<literal>zfs receive</literal>, respectively.</para>
> +
> +      <para>The following examples will demonstrate the functionality
> +	of ZFS replication using these two pools:</para>
> +
> +      <screen>&prompt.root; <userinput>zpool list</userinput>
> +NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> +backup  960M    77K   896M     0%  1.00x  ONLINE  -
> +mypool  984M  43.7M   940M     4%  1.00x  ONLINE  -</screen>
> +
> +      <para>The pool named <replaceable>mypool</replaceable> is the
> +	primary pool where data is written to and read from on a
> +	regular basis.  A second pool,
> +	<replaceable>backup</replaceable> is used as a standby in case
> +	the primary pool becomes offline.  Note that this is not done
> +	automatically by ZFS, but rather done by a system
> +	administrator in case it is needed.  First, a snapshot is
> +	created on <replaceable>mypool</replaceable> to have a backup

I don't really think that "backup" is the best characterization of what is
going on here.  It's really just a snapshot, but we used that word
already.  Maybe 'copy'?

> +	of the current state of the data to send to the pool
> +	<replaceable>backup</replaceable>.</para>
> +
> +      <screen>&prompt.root; <userinput>zfs snapshot <replaceable>mypool</replaceable>@<replaceable>backup1</replaceable></userinput>
> +&prompt.root; <userinput>zfs list -t snapshot</userinput>
> +NAME                    USED  AVAIL  REFER  MOUNTPOINT
> +mypool@backup1             0      -  43.6M  -</screen>
> +
> +      <para>Now that a snapshot exists, <command>zfs send</command>
> +	can be used to create a stream representing the contents of
> +	the snapshot locally or remote to another pool.  The stream

"remotely", I think?

> +	must be written to the standard output, otherwise ZFS will
> +	produce an error like in this example:</para>
> +
> +      <screen>&prompt.root; <userinput>zfs send <replaceable>mypool</replaceable>@<replaceable>backup1</replaceable></userinput>
> +Error: Stream can not be written to a terminal.
> +You must redirect standard output.</screen>
> +
> +      <para>The correct way to use <command>zfs send</command> is to
> +	redirect it to a location like the mounted backup pool.
> +	Afterwards, that pool should have the size of the snapshot
> +	allocated, which means all the data contained in the snapshot
> +	was stored on the backup pool.</para>
> +
> +      <screen>&prompt.root; <userinput>zfs send <replaceable>mypool</replaceable>@<replaceable>backup1</replaceable> > <replaceable>/backup/backup1</replaceable></userinput>
> +&prompt.root; <userinput>zpool list</userinput>
> +NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> +backup  960M  63.7M   896M     6%  1.00x  ONLINE  -
> +mypool  984M  43.7M   940M     4%  1.00x  ONLINE  -</screen>
> +
> +      <para>The <command>zfs send</command> transferred all the data
> +	in the snapshot called <replaceable>backup1</replaceable> to
> +	the pool named <replaceable>backup</replaceable>.  Creating
> +	and sending these snapshots could be done automatically by a
> +	cron job.</para>
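
Since a cron job is mentioned, maybe a concrete example would help
readers; an untested sketch of an /etc/crontab line (snapshot naming
and schedule invented, and note that cron needs the % characters
escaped):

    # nightly full send of mypool to the mounted backup pool
    0 3 * * * root d=`date +\%Y\%m\%d`; /sbin/zfs snapshot mypool@$d && /sbin/zfs send mypool@$d > /backup/$d
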
> +
> +      <sect3 xml:id="zfs-send-incremental">
> +	<title>ZFS Incremental Backups</title>
> +
> +	<para>Another feature of <command>zfs send</command> is that
> +	  it can determine the difference between two snapshots to
> +	  only send what has changed between the two.  This results in
> +	  saving disk space and time for the transfer to another pool.
> +	  The following example demonstrates this:</para>

Would just "For example:" suffice here?  It is more concise.

> +
> +	<screen>&prompt.root; <userinput>zfs snapshot <replaceable>mypool</replaceable>@<replaceable>backup2</replaceable></userinput>
> +&prompt.root; <userinput>zfs list -t snapshot</userinput>
> +NAME                    USED  AVAIL  REFER  MOUNTPOINT
> +mypool@backup1         5.72M      -  43.6M  -
> +mypool@backup2             0      -  44.1M  -
> +&prompt.root; <userinput>zpool list</userinput>
> +NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> +backup  960M  61.7M   898M     6%  1.00x  ONLINE  -
> +mypool  960M  50.2M   910M     5%  1.00x  ONLINE  -</screen>
> +
> +	<para>A second snapshot called
> +	  <replaceable>backup2</replaceable> was created.  This second
> +	  snapshot contains only the changes on the ZFS filesystem
> +	  between now and the last snapshot,
> +	  <replaceable>backup1</replaceable>.  Using the
> +	  <literal>-i</literal> flag to <command>zfs send</command>
> +	  and providing both snapshots, an incremental snapshot can be
> +	  transferred, containing only the data that has
> +	  changed.</para>
> +
> +	<screen>&prompt.root; <userinput>zfs send -i <replaceable>mypool</replaceable>@<replaceable>backup1</replaceable> <replaceable>mypool</replaceable>@<replaceable>backup2</replaceable> > <replaceable>/backup/incremental</replaceable></userinput>
> +&prompt.root; <userinput>zpool list</userinput>
> +NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> +backup  960M  80.8M   879M     8%  1.00x  ONLINE  -
> +mypool  960M  50.2M   910M     5%  1.00x  ONLINE  -
> +&prompt.root; <userinput>ls -lh /backup</userinput>
> +total 82247
> +drwxr-xr-x     1 root   wheel      61M Dec  3 11:36 backup1
> +drwxr-xr-x     1 root   wheel      18M Dec  3 11:36 incremental</screen>
> +
> +	<para>The incremental stream was successfully transferred and
> +	  the file on disk is smaller than any of the two snapshots
> +	  <replaceable>backup1</replaceable> or
> +	  <replaceable>backup2</replaceable>.  This shows that it only
> +	  contains the differences, which is much faster to transfer
> +	  and saves disk space by not copying the complete pool each
> +	  time.  This is useful when having to rely on slow networks
> +	  or when costs per transferred byte have to be
> +	  considered.</para>
> +      </sect3>
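
One thing that might deserve an explicit mention (here or in the
receive section below): an incremental stream is only usable on top of
the full one, so a restore has to replay the stream files in order.
Something like this, with a made-up target dataset name:

    # zfs receive backup/restored < /backup/backup1
    # zfs receive backup/restored < /backup/incremental
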
> +
> +      <sect3 xml:id="zfs-send-recv">
> +	<title>Receiving ZFS Data Streams</title>
> +
> +	<para>Up until now, only the data streams in binary form were
> +	  sent to other pools.  To get to the actual data contained in
> +	  those streams, the reverse operation of <command>zfs
> +	    send</command> has to be used to transform the streams
> +	  back into files and directories.  The command is called
> +	  <command>zfs receive</command> and has also a short version:
> +	  <command>zfs recv</command>.  The example below combines
> +	  <command>zfs send</command> and <command>zfs
> +	    receive</command> using a pipe to copy the data from one
> +	  pool to another.  This way, the data can be used directly on
> +	  the receiving pool after the transfer is complete.</para>
> +
> +	<screen>&prompt.root; <userinput>zfs send <replaceable>mypool</replaceable>@<replaceable>backup1</replaceable> | zfs receive <replaceable>backup/backup1</replaceable></userinput>
> +&prompt.root; <userinput>ls -lh /backup</userinput>
> +total 431
> +drwxr-xr-x     4219 root   wheel      4.1k Dec  3 11:34 backup1</screen>
> +
> +	<para>The directory <replaceable>backup1</replaceable> does
> +	  contain all the data, which were part of the snapshot of the
> +	  same name.  Since this originally was a complete filesystem
> +	  snapshot, the listing of all ZFS filesystems for this pool
> +	  is also updated and shows the
> +	  <replaceable>backup1</replaceable> entry.</para>
> +
> +	<screen>&prompt.root; <userinput>zfs list</userinput>
> +NAME                    USED  AVAIL  REFER  MOUNTPOINT
> +backup                 43.7M   884M    32K  /backup
> +backup/backup1         43.5M   884M  43.5M  /backup/backup1
> +mypool                 50.0M   878M  44.1M  /mypool</screen>
> +
> +	<para>A new filesystem, <replaceable>backup1</replaceable> is
> +	  available and has the same size as the snapshot it was
> +	  created from.  It is up to the user to decide whether the
> +	  streams should be transformed back into filesystems directly
> +	  to have a cold-standby for emergencies or to just keep the
> +	  streams and transform them later when required.  Sending and
> +	  receiving can be automated so that regular backups are
> +	  created on a second pool for backup purposes.</para>
> +      </sect3>
> +
> +      <sect3 xml:id="zfs-send-ssh">
> +	<title>Sending Encrypted Backups over SSH</title>
> +
> +	<para>Although sending streams to another system over the
> +	  network is a good way to keep a remote backup, it does come
> +	  with a drawback.  All the data sent over the network link is
> +	  not encrypted, allowing anyone to intercept and transform
> +	  the streams back into data without the knowledge of the
> +	  sending user.  This is an unacceptable situation, especially
> +	  when sending the streams over the internet to a remote host
> +	  with multiple hops in between where such malicious data
> +	  collection can occur.  Fortunately, there is a solution
> +	  available to the problem that does not require the
> +	  encryption of the data on the pool itself.  To make sure the
> +	  network connection between both systems is securely
> +	  encrypted, <application>SSH</application> can be used.
> +	  Since ZFS only requires the stream to be redirected from
> +	  standard output, it is relatively easy to pipe it through
> +	  SSH.</para>

This paragraph seems rather wordy, though I don't think I have any 
concrete suggestions at the moment.

> +
> +	<para>A few settings and security precautions have to be made
> +	  before this can be done.  Since this chapter is about ZFS
> +	  and not about configuring SSH, it only lists the things
> +	  required to perform the encrypted <command>zfs
> +	  send</command> operation.  The following settings should
> +	  be made:</para>
> +
> +	<itemizedlist>
> +	  <listitem>
> +	    <para>Passwordless SSH access between sending and
> +	      receiving host using SSH keys</para>
> +	  </listitem>
> +
> +	  <listitem>
> +	    <para>The <literal>root</literal> user needs to be able to
> +	      log into the receiving system because only that user can
> +	      send streams from the pool.  SSH should be configured so
> +	      that <literal>root</literal> can only execute
> +	      <command>zfs recv</command> and nothing else to prevent
> +	      users that might have hijacked this account from doing
> +	      any harm on the system.</para>

This paragraph is a little confusing about what happens on the sending and 
receiving systems.  (For example, at first I was confused by the first 
sentence, thinking that it was saying that the receiving system would be 
sending streams from the pool.)  Do both the send and receive have to 
happen as root on the respective machines?  I also think that the 
restriction to 'zfs recv' should apply only to the particular ssh key 
which is doing the automated backups; it would be absurd to prevent root 
login from one server to another just because there is a backup 
relationship in place.
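
Something like the following in root's authorized_keys on the
receiving host would express that restriction for just the backup key
(pool name and key are placeholders, untested):

    command="zfs recv -dvu backuppool",no-pty,no-port-forwarding,no-agent-forwarding ssh-rsa AAAA... backup@sender

With a forced command, whatever the client asks to run is ignored and
only the zfs recv executes, reading the stream from stdin.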

> +	  </listitem>
> +	</itemizedlist>
> +
> +	<para>After these security measures have been put into place
> +	  and <literal>root</literal> can connect passwordless via SSH

I think that "via passwordless SSH" is the more conventional phrasing.
Also, do we want markup around SSH?

-Ben

> +	  to the receiving system, the encrypted stream can be sent
> +	  using the following commands:</para>
> +
> +	<screen>&prompt.root; <userinput>zfs snapshot -r <replaceable>mypool/home</replaceable>@<replaceable>monday</replaceable></userinput>
> +&prompt.root; <userinput>zfs send -R <replaceable>mypool/home</replaceable>@<replaceable>monday</replaceable> | ssh <replaceable>backuphost</replaceable> zfs recv -dvu <replaceable>backuppool</replaceable></userinput></screen>
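
It might also be worth showing the incremental variant of this
pipeline once a second snapshot exists, e.g. (snapshot names
invented):

    # zfs send -R -i mypool/home@sunday mypool/home@monday | ssh backuphost zfs recv -dvu backuppool
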

