ZFS issues on 13-current snapshot
Matt Churchyard
matt.churchyard at userve.net
Wed Jan 27 09:56:01 UTC 2021
Hello,
I'm testing a 13-current machine for future use as an encrypted offsite backup store. As it's near release I was kind of hoping to get away with using this 13 snapshot for a few months then switch to a RELEASE bootenv when it comes out.
However, I seem to be having a few issues.
First of all, I started noticing that the USED & REFER columns weren't equal for individual datasets. So far this system has simply received a single snapshot of a few datasets, and had readonly set immediately afterwards. Some of them are showing several hundred MB linked to snapshots on datasets that haven't been touched. I'm unable to send further snapshots without forcing a rollback first. Not the end of the world, but this isn't right and has never happened on previous ZFS systems. The most I've ever seen is a few KB, because I forgot to set readonly and went into a few directories on a dataset with atime=on.
offsite 446G 6.36T 140K /offsite
[...]
offsite/secure/cms 359M 6.36T 341M /offsite/secure/cms
offsite/secure/cms@26-01-2021 17.6M - 341M -
offsite/secure/company 225G 6.36T 224G /offsite/secure/company
offsite/secure/company@25-01-2021 673M - 224G -
offsite/secure is an encrypted dataset using default options.
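For what it's worth, the snapshot space does seem to fully account for the USED/REFER gap; a quick back-of-envelope check using the cms figures above (values hand-copied from the zfs list output, awk used just for the float arithmetic):

```shell
# Sanity check: USED should roughly equal REFER plus space held by snapshots.
# Figures copied from the zfs list output above for offsite/secure/cms:
#   USED = 359M, REFER = 341M, snapshot USED = 17.6M
# (zfs get written@<snapshot> should show what was written after the snapshot,
# though on this box I'd expect it to be zero given readonly was set.)
awk 'BEGIN { printf "%.1f MB unexplained\n", 359 - (341 + 17.6) }'
```

So the "missing" space is all attributed to the snapshot itself, which is what makes it so odd on a readonly dataset that has never been written to locally.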
zfs diff sits for a while even on small datasets (I gave up trying to run it on anything over a few GB) and eventually outputs nothing.
root@offsite:/etc # uname -a
FreeBSD offsite.backup 13.0-CURRENT FreeBSD 13.0-CURRENT #0 main-c255641-gf2b794e1e90: Thu Jan 7 06:25:26 UTC 2021 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
root@offsite:/etc # zpool version
zfs-0.8.0-1
zfs-kmod-0.8.0-1
I then thought I would run a scrub just to see if it found any obvious problems.
It started off running fine, estimating about 45-60 minutes for the whole process of scanning 446GB. (The pool is four basic SATA IronWolf 4TB disks in raidz2.)
However, it appeared to stall at 19.7%. Eventually it hit 19.71%, and does appear to be creeping up, but at this point it looks like it may take days to complete (it currently says 3 hours, but that estimate is skewed by the initial fast progress and goes up every time I check).
gstat shows the disks at 100% busy doing anywhere between 10-50MB/s. (They were hitting up to 170MB/s to start off with. Obviously this varies when having to seek, but even at the rates currently seen, I suspect it should be progressing faster than the zpool output shows.)
root@offsite:/etc # zpool status
pool: offsite
state: ONLINE
scan: scrub in progress since Wed Jan 27 09:29:50 2021
555G scanned at 201M/s, 182G issued at 65.8M/s, 921G total
0B repaired, 19.71% done, 03:11:51 to go
config:
NAME STATE READ WRITE CKSUM
offsite ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gpt/data-ZGY85VKX ONLINE 0 0 0
gpt/data-ZGY88MRY ONLINE 0 0 0
gpt/data-ZGY88NZJ ONLINE 0 0 0
gpt/data-ZGY88QKF ONLINE 0 0 0
errors: No known data errors
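If I understand the sequential-scrub output correctly (this pool version reports 0.8.0, which has the sorted-scrub rework), "scanned" is the metadata traversal running ahead of the actual data reads, and the "done" percentage seems to track the issued column, not the scanned one. The figures above are at least consistent with that reading:

```shell
# Assumption: "done" percentage = issued bytes / total bytes, not scanned/total.
# From the status output above: 182G issued of 921G total.
awk 'BEGIN { printf "%.2f%%\n", 182 / 921 * 100 }'
```

That comes out close to the 19.71% reported, so the percentage is apparently being driven entirely by the slow issue rate.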
Update: I've probably spent 30+ minutes writing this email, and it's now reporting a few more GB scanned but no change at all in the progress percentage.
scan: scrub in progress since Wed Jan 27 09:29:50 2021
559G scanned at 142M/s, 182G issued at 46.2M/s, 921G total
0B repaired, 19.71% done, 04:33:08 to go
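To be fair, the ETA at least looks internally consistent with the issue rate rather than the scan rate; assuming the figures from the status output above:

```shell
# Back-of-envelope: remaining time at the current issue rate.
# From the status above: (921G total - 182G issued) at 46.2M/s.
awk 'BEGIN { printf "%.2f hours\n", (921 - 182) * 1024 / 46.2 / 3600 }'
```

That lines up with the "04:33:08 to go" shown, so the estimate isn't wrong per se; the issue rate itself is just far below what the disks are physically doing.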
It doesn't inspire a lot of confidence. ZFS has become pretty rock solid in FreeBSD in recent years, and I have many systems running it. This release should have the most efficient scrub code to date, and yet it's currently taking about an hour to progress 0.01% on a new system holding a fraction of the data it eventually will, at 0% fragmentation.
As it stands at the moment, I will likely scrap this attempt and retry with FreeBSD 12.
Regards,
Matt Churchyard