unusual ZFS issue

From: Lexi Winter <lexi_at_le-fay.org>
Date: Thu, 14 Dec 2023 21:17:06 UTC

hi list,

i’ve just hit this ZFS error:

# zfs list -rt snapshot data/vm/media/disk1
cannot iterate filesystems: I/O error
NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
data/vm/media/disk1@autosnap_2023-12-13_12:00:00_hourly      0B      -  6.42G  -
data/vm/media/disk1@autosnap_2023-12-14_10:16:00_hourly      0B      -  6.46G  -
data/vm/media/disk1@autosnap_2023-12-14_11:17:00_hourly      0B      -  6.46G  -
data/vm/media/disk1@autosnap_2023-12-14_12:04:00_monthly     0B      -  6.46G  -
data/vm/media/disk1@autosnap_2023-12-14_12:15:00_hourly      0B      -  6.46G  -
data/vm/media/disk1@autosnap_2023-12-14_13:14:00_hourly      0B      -  6.46G  -
data/vm/media/disk1@autosnap_2023-12-14_14:38:00_hourly      0B      -  6.46G  -
data/vm/media/disk1@autosnap_2023-12-14_15:11:00_hourly      0B      -  6.46G  -
data/vm/media/disk1@autosnap_2023-12-14_17:12:00_hourly    316K      -  6.47G  -
data/vm/media/disk1@autosnap_2023-12-14_17:29:00_daily    2.70M      -  6.47G  -

the pool itself also reports an error:

# zpool status -v
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Thu Dec 14 18:58:21 2023
	11.5T / 18.8T scanned at 1.46G/s, 6.25T / 18.8T issued at 809M/s
	0B repaired, 33.29% done, 04:30:20 to go
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    da4p1   ONLINE       0     0     0
	    da6p1   ONLINE       0     0     0
	    da5p1   ONLINE       0     0     0
	    da7p1   ONLINE       0     0     0
	    da1p1   ONLINE       0     0     0
	    da0p1   ONLINE       0     0     0
	    da3p1   ONLINE       0     0     0
	    da2p1   ONLINE       0     0     0
	logs
	  mirror-2  ONLINE       0     0     0
	    ada0p4  ONLINE       0     0     0
	    ada1p4  ONLINE       0     0     0
	cache
	  ada1p5    ONLINE       0     0     0
	  ada0p5    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

(it doesn’t list any files; the output ends there.)

my assumption is that this indicates some sort of metadata corruption issue, but i can’t find anything that might have caused it.  none of the disks report any errors, and while all the disks are on the same SAS controller, i would have expected controller errors to be flagged as CKSUM errors.
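
(for reference, a rough sketch of how the "no device errors" observation can be checked mechanically rather than by eye. it assumes the NAME/STATE/READ/WRITE/CKSUM column layout shown in the `zpool status` output above; the function name `nonzero_counters` is made up for illustration and is not part of ZFS.)

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print every
# device row whose READ, WRITE or CKSUM counter is not zero.
# Assumes the column layout NAME STATE READ WRITE CKSUM.
nonzero_counters() {
    awk '
        # Device rows have at least five fields, and fields 3-5 start
        # with a digit (counters can be abbreviated, e.g. "1.2K").
        NF >= 5 &&
        $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ &&
        ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}

# Usage (the pool name "data" comes from the output above):
#   zpool status data | nonzero_counters
```

for the output quoted above this prints nothing, which matches what i see: every vdev shows 0 in all three columns.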

my best guess is that this might be caused by a CPU or memory issue, but the system has ECC memory and hasn’t reported any issues.

- has anyone else encountered anything like this?

- i’m a bit worried that if i reboot, the system won’t be able to re-import the pool due to the “cannot iterate filesystems” errors; is that a concern?

the system is running:

FreeBSD hemlock.eden.le-fay.org 14.0-RELEASE-p2 FreeBSD 14.0-RELEASE-p2 #4 releng/14.0-n265396-06497fbd52e2: Fri Dec  8 06:14:12 GMT 2023     root@hemlock.eden.le-fay.org:/data/src/obj/data/src/releng/14.0/amd64.amd64/sys/HEMLOCK amd64

	thanks.