zfs q regarding backup strategy

David Christensen dpchrist at holgerdanske.com
Sun Oct 3 08:37:05 UTC 2021


On 10/2/21 22:33, Steve O'Hara-Smith wrote:
> On Sat, 2 Oct 2021 15:09:23 -0700
> David Christensen <dpchrist at holgerdanske.com> wrote:
> 
>> Assuming I can create a ZFS pool from one or more ZFS volume datasets
>> (?), here is an idea:
> 
> 	I don't think you can create a pool on top of zvols, I couldn't get
> it to work last time I tried.
> 
>> 1.  Create a large 'archive' pool.  Say, 10 TB.
>>
>> 2.  Within the archive pool, create many small volumes.  Say, 100
>> volumes of 100 GB each.
> 
> 	Why not just split the drives into 100GB partitions with gpart
> rather than attempting to nest zpools ?


The idea was to handle redundancy (mirror, raidzN), caching, etc., once 
at the bottom level, rather than multiple times (once for each 
archive-source pool).  But if it is not possible to build second-level 
ZFS pools on top of ZFS volumes on top of a first-level ZFS pool, then 
GPT partitions and doing it the hard way should work.  First, though, I 
would want to research GEOM and see if it can do RAID (I suspect the 
answer is yes).
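
For reference, a rough sketch of the gpart + GEOM approach, assuming 
two spare disks ada1 and ada2 (hypothetical names); gmirror(8) 
provides RAID1, and graid(8)/graid3(8) cover other levels:

  # Carve each disk into 100 GB GPT partitions:
  gpart create -s gpt ada1
  gpart add -t freebsd-zfs -s 100G ada1
  # (repeat for the remaining partitions, and for ada2)

  # Mirror a pair of partitions with gmirror(8):
  gmirror load
  gmirror label -v gm0 /dev/ada1p1 /dev/ada2p1

  # Build an archive-source pool on the resulting mirror device:
  zpool create -R /archive/alpha archive-alpha /dev/mirror/gm0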


>> 3.  For each source, create a 'archive-source' pool using the 'zpool
>> create -R' option and one or more volumes as required for capacity.
> 
> 	Also record the root mount for use at boot time.


Yes.  Figuring out where to put this, and the other settings/ data/ 
logs, will be important to usability and to failure survival/ recovery.


>> 4.  From the archive server, replicate datasets from their respective
>> source pools to their corresponding archive-source pools using the 'zfs
>> receive -u' option.
> 
> 	Once you have altroot working then you want the dataset mounted -
> read only though.
> 
>> 5.  Upon receipt of a replica dataset, save the 'canmount' property (for
>> restore).  If it is 'on', set it to 'noauto'.
> 
> 	No need.
> 
>> 6.  Upon receipt of a replica dataset, save the 'readonly' property (for
>> restore).  If it is 'off', set it to 'on'.
> 
> 	Yes.


I suppose the 'zfs receive -u' is overkill if 'altroot' is set properly 
on the pool, but I am not averse to another layer of safety when doing 
sysadmin scripting.  I also prefer having explicit control over whether 
and when the replica is mounted.
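
As a sketch of steps 3 through 6 for one source (pool, dataset, and 
snapshot names are hypothetical):

  # 3.  Create the archive-source pool with an altroot:
  zpool create -R /archive/alpha archive-alpha /dev/mirror/gm0

  # 4.  Replicate without mounting on receipt:
  zfs send alpha/home@2021-10-03 | zfs receive -u archive-alpha/home

  # 5. and 6.  Save the original properties for restore, then
  # neutralize them:
  zfs get -H -o name,property,value canmount,readonly archive-alpha/home
  zfs set canmount=noauto archive-alpha/home
  zfs set readonly=on archive-alpha/home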


Most of the prior ideas are for the first full replication job of each 
dataset.  More research/ testing/ thinking is needed for ongoing 
incremental replication jobs.
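
For the incremental jobs, the obvious candidate is an incremental send 
between the last common snapshot and a new one, again received with 
'-u' (snapshot names hypothetical):

  zfs snapshot alpha/home@2021-10-10
  zfs send -i alpha/home@2021-10-03 alpha/home@2021-10-10 \
      | zfs receive -u archive-alpha/home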


> 	It is also necessary to check to see whether the target pool has
> enough space and if not throw a few more logs on the fire.


Yes -- that and probably a dozen more use-cases/ features to get to a 
minimal, fully-automatic implementation.
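
The space check might reduce to comparing the estimated stream size 
against the free space on the archive pool; a sketch, assuming the 
parsable output of 'zfs send -nvP' (names hypothetical):

  need=$(zfs send -nvP -i alpha/home@2021-10-03 alpha/home@2021-10-10 \
      | awk '/^size/ { print $2 }')
  have=$(zpool list -Hp -o free archive)
  [ "$need" -lt "$have" ] || echo "throw more logs on the fire" >&2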


>> The most obvious problem is if the system crashes between #4 and #5.  On
>> subsequent boot, AIUI all previously active pools will be automatically
>> imported (e.g. without 'altroot') and all datasets with 'canmount=on'
>> will be mounted (according to 'mountpoint').  If two or more datasets
>> are mounted at the same mount point, the results could be bad.
>> 'bootpool' and 'zroot' are likely cases.
> 
> 	This is where the boot script to restore the altroot settings comes
> in - but it has to run before zfs attempts the mounts.


Do you have any idea what hooks, if any, are available during system 
boot and ZFS setup?


STFW, I see the following, but I will need more information on how to 
affect ZFS during boot:

https://openzfs.readthedocs.io/en/latest/boot-process.html

https://www.unix.com/man-page/freebsd/8/zfsloader/
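
If rc(8) runs early enough, one hypothetical hook would be a custom 
rc.d script ordered after pool import but before the stock zfs mount 
script -- script name and pool layout invented, and whether this wins 
the race against the first mount attempt would need testing:

  #!/bin/sh
  # PROVIDE: archive_guard
  # REQUIRE: zpool
  # BEFORE: zfs

  . /etc/rc.subr

  name="archive_guard"
  rcvar="archive_guard_enable"
  start_cmd="${name}_start"

  archive_guard_start()
  {
      # Keep replica datasets from mounting at boot:
      zfs list -H -o name -r archive-alpha | while read ds; do
          zfs set canmount=noauto "$ds"
      done
  }

  load_rc_config $name
  run_rc_command "$1"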


David

