GELI + Zpool Scrub Results in GELI Device Destruction (and Later a Corrupt Pool)

Michael B. Eichorn ike at michaeleichorn.com
Mon Apr 25 04:59:27 UTC 2016


I just ran into something rather unexpected. I have a pool consisting
of a mirrored pair of geli encrypted partitions on WD Red 3TB disks.

The machine is running 10.3-RELEASE. The root zpool was set up with
GELI encryption from the installer; the pool that is acting up was set
up per the handbook.

See the timeline below for what happened. tl;dr: zpool scrub destroyed
the eli devices, and my attempt to recreate the eli device earned me a
ZFS-8000-8A critical error (corrupted data).

All of the errors reported by zpool status -v are in metadata, not
regular files, but as I now have permanent metadata errors I am
looking for guidance as to:

1) Is it safe to keep running the pool as-is for a day or two or am I
risking data corruption?

2) It would be much, much faster to copy the data to another pool,
then recreate this pool and copy the data back, than to restore from
backups. Am I looking at any potential data loss if I do this?

3) What information would be useful to generate for the PR? The error
is reproducible, so what should be tried before I nuke the pool?

Thanks,
Ike

-- TIMELINE --

I had just noticed that I had failed to enable the zpool scrub periodic
on this machine. So I began to run zpool scrub by hand. It succeeded
for the root pool which is also geli encrypted, but when I ran it
against my primary data pool I encountered:

Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada3p1.eli on last close.
Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.

And the scrub failed to initialize (command never returned to the
shell).

I then performed a reboot, which succeeded and brought everything up as
normal. I then attempted to scrub the pool again. This time I only lost
one of the partitions:

Apr 24 23:37:34 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
Apr 24 23:37:34 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.

I then performed a geli attach and zpool online, which onlined the disk
that was offline and offlined the disk that was online (EEEK!):

Apr 24 23:38:28 terra kernel: GEOM_ELI: Device ada2p1.eli created.
Apr 24 23:38:28 terra kernel: GEOM_ELI: Encryption: AES-XTS 256
Apr 24 23:38:28 terra kernel: GEOM_ELI:     Crypto: hardware
Apr 24 23:41:05 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
Apr 24 23:41:05 terra kernel: GEOM_ELI: Detached ada3p1.eli on last close.
Apr 24 23:41:05 terra devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=5890893416839487107 vdev_guid=17504861086892353515''
Apr 24 23:41:05 terra ZFS: vdev state changed, pool_guid=5890893416839487107 vdev_guid=17504861086892353515
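
For anyone trying to reproduce this, the re-attach step described above
amounts to roughly the sequence sketched below. This is an illustration
only: the pool name "tank" and the key file path are placeholders, not
the real ones, and the use of a key file (-k) is an assumption based on
the handbook-style setup. The helper name reattach_commands is made up
for this sketch. It only builds and prints the commands and never runs
them, since this is the step after which the second disk dropped out.

```python
import shlex


def reattach_commands(pool, partition, keyfile):
    """Build (but do not run) the commands for re-attaching one mirror half.

    pool, partition and keyfile are supplied by the caller; nothing here
    is taken from the affected machine.
    """
    provider = partition + ".eli"  # name of the decrypted GELI provider
    return [
        # Recreate the .eli device from the encrypted partition.
        ["geli", "attach", "-k", keyfile, partition],
        # Ask ZFS to bring the re-created provider back into the mirror.
        ["zpool", "online", pool, provider],
        # Extra check, not part of the step described above.
        ["zpool", "status", "-v", pool],
    ]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    for argv in reattach_commands("tank", "ada2p1", "/root/ada2p1.key"):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe
to run on any machine; each line can be reviewed before being typed in
by hand.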

I immediately rebooted and both disks came back and resilvered, with
permanent metadata errors.

-- END TIMELINE --
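
For the PR, the GEOM_ELI create/destroy events in the timeline could be
pulled out of the system log with something like the following. It is
only a sketch: the regular expression assumes the syslog line format
shown in the timeline (one event per unwrapped line), and the names
EVENT_RE, SAMPLE and geli_events are made up for illustration. The
sample lines are copied from the timeline.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
EVENT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"GEOM_ELI: Device (?P<dev>\S+) (?P<event>created|destroyed)\.$"
)

# Lines taken from the timeline above.
SAMPLE = [
    "Apr 24 23:37:34 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.",
    "Apr 24 23:37:34 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.",
    "Apr 24 23:38:28 terra kernel: GEOM_ELI: Device ada2p1.eli created.",
    "Apr 24 23:38:28 terra kernel: GEOM_ELI: Encryption: AES-XTS 256",
    "Apr 24 23:41:05 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.",
]


def geli_events(lines):
    """Return {device: [(timestamp, event), ...]} in log order.

    Lines that are not GEOM_ELI create/destroy messages are ignored.
    """
    events = defaultdict(list)
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            events[match.group("dev")].append(
                (match.group("stamp"), match.group("event"))
            )
    return dict(events)


if __name__ == "__main__":
    for dev, history in geli_events(SAMPLE).items():
        for stamp, event in history:
            print(f"{stamp}  {dev}  {event}")
```

To use it on a real log, pass an open file instead of SAMPLE, for
example geli_events(open("/var/log/messages")), assuming that is where
kernel messages are logged on the machine in question.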