ZFS issues on 13-current snapshot
Matt Churchyard
matt.churchyard at userve.net
Wed Jan 27 09:56:01 UTC 2021
Hello,
I'm testing a 13-current machine for future use as an encrypted offsite backup store. As it's near release I was kind of hoping to get away with using this 13 snapshot for a few months then switch to a RELEASE bootenv when it comes out.
However, I seem to be having a few issues.
First of all, I started noticing that the USED & REFER columns weren't equal for individual datasets. So far this system has simply received a single snapshot of a few datasets, and had readonly set immediately afterwards. Some of them are showing several hundred MB linked to snapshots on datasets that haven't been touched. I'm unable to send further snapshots without forcing a rollback first. Not the end of the world, but this isn't right and has never happened on previous ZFS systems. The most I've ever seen is a few KB, because I forgot to set readonly and went into a few directories on a dataset with atime=on.
offsite 446G 6.36T 140K /offsite
[...]
offsite/secure/cms 359M 6.36T 341M /offsite/secure/cms
offsite/secure/cms@26-01-2021 17.6M - 341M -
offsite/secure/company 225G 6.36T 224G /offsite/secure/company
offsite/secure/company@25-01-2021 673M - 224G -
offsite/secure is an encrypted dataset using default options.
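For what it's worth, the snapshot space does seem to fully account for the USED/REFER gap; a quick back-of-envelope check using the cms figures above (values hand-copied from the zfs list output, awk used just for the float arithmetic):

```shell
# Sanity check: USED should roughly equal REFER plus space held by snapshots.
# Figures copied from the zfs list output above for offsite/secure/cms:
#   USED = 359M, REFER = 341M, snapshot USED = 17.6M
# (zfs get written@<snapshot> should show what was written after the snapshot,
# though on this box I'd expect it to be zero given readonly was set.)
awk 'BEGIN { printf "%.1f MB unexplained\n", 359 - (341 + 17.6) }'
```

So the "missing" space is all attributed to the snapshot itself, which is what makes it so odd on a readonly dataset that has never been written to locally.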
zfs diff sits for a while even on small datasets (I gave up trying to run it on anything over a few GB) and eventually outputs nothing.
root@offsite:/etc # uname -a
FreeBSD offsite.backup 13.0-CURRENT FreeBSD 13.0-CURRENT #0 main-c255641-gf2b794e1e90: Thu Jan 7 06:25:26 UTC 2021 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
root@offsite:/etc # zpool version
zfs-0.8.0-1
zfs-kmod-0.8.0-1
I then thought I would run a scrub just to see if it found any obvious problems.
It started off running fine, estimating about 45-60 minutes for the whole process of scanning 446GB. (The pool is four basic SATA IronWolf 4TB disks in raidz2.)
However, it appeared to stall at 19.7%. Eventually it hit 19.71%, and does appear to be creeping up, but at this point it looks like it may take days to complete (it currently says 3 hours, but that estimate is skewed by the initial fast progress and goes up every time I check).
gstat shows the disks at 100% busy doing anywhere between 10-50MB/s. (They were hitting up to 170MB/s to start off with. Obviously this varies when having to seek, but even at the rates currently seen, I suspect it should be progressing faster than the zpool output shows.)
root@offsite:/etc # zpool status
pool: offsite
state: ONLINE
scan: scrub in progress since Wed Jan 27 09:29:50 2021
555G scanned at 201M/s, 182G issued at 65.8M/s, 921G total
0B repaired, 19.71% done, 03:11:51 to go
config:
NAME STATE READ WRITE CKSUM
offsite ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gpt/data-ZGY85VKX ONLINE 0 0 0
gpt/data-ZGY88MRY ONLINE 0 0 0
gpt/data-ZGY88NZJ ONLINE 0 0 0
gpt/data-ZGY88QKF ONLINE 0 0 0
errors: No known data errors
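If I understand the sequential-scrub output correctly (this pool version reports 0.8.0, which has the sorted-scrub rework), "scanned" is the metadata traversal running ahead of the actual data reads, and the "done" percentage seems to track the issued column, not the scanned one. The figures above are at least consistent with that reading:

```shell
# Assumption: "done" percentage = issued bytes / total bytes, not scanned/total.
# From the status output above: 182G issued of 921G total.
awk 'BEGIN { printf "%.2f%%\n", 182 / 921 * 100 }'
```

That comes out close to the 19.71% reported, so the percentage is apparently being driven entirely by the slow issue rate.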
Update: I've probably spent 30+ minutes writing this email, and it's now reporting a few more GB scanned but no change at all in the progress percentage.
scan: scrub in progress since Wed Jan 27 09:29:50 2021
559G scanned at 142M/s, 182G issued at 46.2M/s, 921G total
0B repaired, 19.71% done, 04:33:08 to go
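To be fair, the ETA at least looks internally consistent with the issue rate rather than the scan rate; assuming the figures from the status output above:

```shell
# Back-of-envelope: remaining time at the current issue rate.
# From the status above: (921G total - 182G issued) at 46.2M/s.
awk 'BEGIN { printf "%.2f hours\n", (921 - 182) * 1024 / 46.2 / 3600 }'
```

That lines up with the "04:33:08 to go" shown, so the estimate isn't wrong per se; the issue rate itself is just far below what the disks are physically doing.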
It doesn't inspire a lot of confidence. ZFS has become pretty rock solid in FreeBSD in recent years, and I have many systems running it. This release should have the most efficient scrub code to date, and yet it's currently taking about an hour to progress 0.01% on a new system holding a fraction of the data it eventually will, at 0% fragmentation.
As it stands at the moment, I will likely scrap this attempt and retry with FreeBSD 12.
Regards,
Matt Churchyard