GELI + Zpool Scrub Results in GELI Device Destruction (and Later a Corrupt Pool)

Maciej Suszko maciej at suszko.eu
Mon Apr 25 08:08:05 UTC 2016


On Mon, 25 Apr 2016 01:00:45 -0400
"Michael B. Eichorn" <ike at michaeleichorn.com> wrote:

> I just ran into something rather unexpected. I have a pool consisting
> of a mirrored pair of geli encrypted partitions on WD Red 3TB disks.
> 
> The machine is running 10.3-RELEASE. The root zpool was set up with
> GELI encryption from the installer; the pool that is acting up was
> set up per the handbook.
> 
> See the timeline below for what happened. tl;dr: zpool scrub destroyed
> the eli devices, and my attempt to recreate the eli device earned me a
> ZFS-8000-8A critical error (corrupted data).
> 
> All of the errors reported with zpool status -v are metadata and not
> regular files, but as I now have permanent metadata errors I am
> looking for guidance as to:
> 
> 1) Is it safe to keep running the pool as-is for a day or two or am I
> risking data corruption?
> 
> 2) It would be much faster to copy the data to another pool, recreate
> the pool, and copy the data back than to restore from backups. Am I
> looking at any potential data loss if I do this?
> 
> 3) What information would be useful to generate for the PR? The error
> is reproducible, so what should be tried before I nuke the pool?
> 
> Thanks,
> Ike
> 
> -- TIMELINE --
> 
> I had just noticed that I had failed to enable the zpool scrub
> periodic on this machine. So I began to run zpool scrub by hand. It
> succeeded for the root pool which is also geli encrypted, but when I
> ran it against my primary data pool I encountered:
> 
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada3p1.eli on last
> close.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada2p1.eli on last
> close.
> 
> And the scrub failed to initialize (command never returned to the
> shell).
> 
> I then performed a reboot, which succeeded and brought everything up as
> normal. I then attempted to scrub the pool again. This time I only
> lost one of the partitions:
> 
> Apr 24 23:37:34 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
> Apr 24 23:37:34 terra kernel: GEOM_ELI: Detached ada2p1.eli on last
> close.
> 
> I then performed a geli attach and zpool online, which onlined the
> disk that was offline and offlined the disk that was online (EEEK!):
> 
> Apr 24 23:38:28 terra kernel: GEOM_ELI: Device ada2p1.eli created.
> Apr 24 23:38:28 terra kernel: GEOM_ELI: Encryption: AES-XTS 256
> Apr 24 23:38:28 terra kernel: GEOM_ELI:     Crypto: hardware
> Apr 24 23:41:05 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
> Apr 24 23:41:05 terra kernel: GEOM_ELI: Detached ada3p1.eli on last
> close.
> Apr 24 23:41:05 terra devd: Executing 'logger -p kern.notice -t ZFS
> 'vdev state changed, pool_guid=5890893416839487107
> vdev_guid=17504861086892353515''
> Apr 24 23:41:05 terra ZFS: vdev state changed,
> pool_guid=5890893416839487107 vdev_guid=17504861086892353515
> 
> I immediately rebooted and both disks came back and resilvered, with
> permanent metadata errors.
> 
> -- END TIMELINE --

Hi,

Configure your geli devices not to autodetach on last close...
something like this in your rc.conf should work:

geli_ada2p1_autodetach="NO"
geli_ada3p1_autodetach="NO"
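
If there are more providers than these two, the lines can be generated
rather than typed by hand. Below is a minimal sh sketch, not a tested
recipe: the function name autodetach_lines is made up for illustration,
and replacing '/' and '-' in a provider name with '_' is an assumption
about how the rc script derives the variable name, so check the output
against your own rc scripts before relying on it.

```sh
#!/bin/sh
# Print one rc.conf line per GELI provider that disables autodetach.
# autodetach_lines is a hypothetical helper, not part of FreeBSD.
autodetach_lines() {
    for provider in "$@"; do
        # Shell variable names cannot contain '/' or '-'. Mapping both
        # to '_' is an assumption about the rc script's naming scheme.
        name=$(printf '%s' "$provider" | tr '/-' '__')
        printf 'geli_%s_autodetach="NO"\n' "$name"
    done
}

# Example: the two providers from the log above.
autodetach_lines ada2p1 ada3p1
```

Review the output, then add it to /etc/rc.conf. Setting the global
geli_autodetach="NO" instead should cover every provider at once, if
you do not need per-device control.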
-- 
regards, Maciej Suszko.