Re: zfs snapshot corruption when using encryption

From: Peter Eriksson <pen_at_lysator.liu.se>
Date: Wed, 20 Nov 2024 08:15:55 UTC
I’m seeing something similar on one of our systems - the one system where I’ve just now started trying to use ZFS native encryption.

Setup:
FreeBSD 13.4-RELEASE-p1, 512GB RAM

3 zpools:
  zroot - mirror of two SSD drives
  ENCRYPTED - ZFS over GELI-encrypted SAS 10TB drives
  SEKUR01D1 - ZFS over SAS 18TB drives with ZFS encryption enabled for individual filesystems

- ZFS snapshots are taken every hour of the ENCRYPTED zpool.
- zfs send is being done on some filesystem on the ENCRYPTED zpool
- A big “cp -a” of data (about 70TB of files) from ZFS filesystems in ENCRYPTED to SEKUR01D1 filesystems is running.

CKSUM errors pop up in zroot!

Fixed some errors yesterday, ran ‘zpool scrub’ & ‘zpool clear’ and got a clean bill of health:

# zpool status -v zroot
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:07:15 with 0 errors on Tue Nov 19 21:46:36 2024
config:

	NAME                                STATE     READ WRITE CKSUM
	zroot                               ONLINE       0     0     0
	  mirror-0                          ONLINE       0     0     0
	    ada0p4                          ONLINE       0     0     0
	    diskid/DISK-PHDW817002MK150Ap4  ONLINE       0     0     0

errors: No known data errors

This morning:

# zpool scrub zroot

# zpool status -v zroot
  pool: zroot
 state: ONLINE
  scan: scrub in progress since Wed Nov 20 08:11:31 2024
	19.4G scanned at 6.48G/s, 772K issued at 257K/s, 49.3G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                STATE     READ WRITE CKSUM
	zroot                               ONLINE       0     0     0
	  mirror-0                          ONLINE       0     0     0
	    ada0p4                          ONLINE       0     0     0
	    diskid/DISK-PHDW817002MK150Ap4  ONLINE       0     0     0

errors: No known data errors

# zpool status -v zroot
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:07:20 with 1 errors on Wed Nov 20 08:18:51 2024
config:

	NAME                                STATE     READ WRITE CKSUM
	zroot                               ONLINE       0     0     0
	  mirror-0                          ONLINE       0     0     0
	    ada0p4                          ONLINE       0     0     2
	    diskid/DISK-PHDW817002MK150Ap4  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        /var/audit/20241119235400.20241120000543


Snapshots & zfs send are only being done on the “ENCRYPTED” zpool, not on “zroot” or “SEKUR01D1”, i.e. not on the zpool with the
ZFS-native-encrypted filesystems.

Not 100% sure it is related but something is fishy. This is a server that has been running fine with GELI-encrypted disks for many years now… 
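
For anyone who wants to watch for this kind of thing without rereading the status output by hand, here is a rough sketch that pulls the non-zero CKSUM counters out of “zpool status” text. The function name cksum_errors and the SAMPLE text are made up for illustration (the sample rows are copied from the output above), and it assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout. Counters are kept as strings since zpool may abbreviate large values.

```python
# Hypothetical helper, not part of any ZFS tooling: scan "zpool status"
# text and report config rows whose CKSUM column is not "0".

SAMPLE = """\
  pool: zroot
 state: ONLINE
config:

\tNAME                                STATE     READ WRITE CKSUM
\tzroot                               ONLINE       0     0     0
\t  mirror-0                          ONLINE       0     0     0
\t    ada0p4                          ONLINE       0     0     2
\t    diskid/DISK-PHDW817002MK150Ap4  ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:
"""


def cksum_errors(status_text):
    """Return {device_name: cksum_counter} for rows with a non-zero CKSUM."""
    errors = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:1] == ["NAME"]:
            # Header row of the config table; device rows follow.
            in_config = True
            continue
        if in_config and not fields:
            # A blank line ends the config table.
            in_config = False
            continue
        if in_config and len(fields) >= 5 and fields[4] != "0":
            # Columns are NAME STATE READ WRITE CKSUM.
            errors[fields[0]] = fields[4]
    return errors


if __name__ == "__main__":
    for device, count in cksum_errors(SAMPLE).items():
        print(f"{device}: CKSUM {count}")
```

On a live system the text could be fed in from subprocess.run(["zpool", "status", "-v", "zroot"], capture_output=True, text=True).stdout instead of SAMPLE.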

- Peter

> On 9 Nov 2024, at 15:53, Palle Girgensohn <girgen@FreeBSD.org> wrote:
> 
> 
> 
>> 9 nov. 2024 kl. 02:59 skrev void <void@f-m.fm>:
>> 
>> % zfs version
> 
> Ah, of course.
> 
> $ zfs version 
> zfs-2.2.4-FreeBSD_g256659204
> zfs-kmod-2.2.4-FreeBSD_g256659204
> 
> Palle
>