GELI reliability

Terje Elde terje at elde.net
Fri Feb 25 11:07:14 UTC 2011


Hi,

I'm curious about GELI's theoretical behavior when faced with errors, and also about any experience anyone might have.

As an example, if I run ZFS with raidz over X drives, then the zpool should have no issue surviving the complete loss of a full disk.  Also, the familiar "FAILURE - READ_DMA" or READ_DMA48 errors from a disk having a bad day should be no issue, certainly not one that would result in a crash, or worse still, loss of data.

What I'm wondering is how much worse off the data would be if I were to slide in a GELI layer between the physical drives and the zpool.

That is, I would use GELI directly on the individual drives (/dev/ada0, /dev/ada1, etc.), then make a raidz pool on top of the .eli devices (/dev/ada0.eli, /dev/ada1.eli, etc.).

For READ_DMA type errors, I suppose GELI could just forward the same errors up the stack, and that would be that: the errors wouldn't be any more severe than what I'd have anyway?

One exception could be if the GELI sector size is larger than the disk sector size.  Not being too familiar with GELI's internal workings, I'm not sure that has to be the case though.  GELI sectors get a new IV per sector, but the crypto itself is still done in much smaller cipher blocks (128 bits for AES), so given a single faulty disk sector, could the rest of the GELI sector still be read and decrypted?  Or are entire GELI sectors faulted if a (smaller) underlying sector is unreadable?
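
To make the sector-size question concrete, here is a minimal Python sketch of the arithmetic only. It assumes the encrypted data starts at offset 0 of the drive, that GELI sectors map one-to-one onto runs of disk sectors (no authentication mode), and, purely as an assumption, that a GELI sector is faulted as a whole when any disk sector under it is unreadable. The function name and the 512/4096 byte sizes are made up for illustration.

```python
def affected_geli_sector(bad_disk_sector, disk_sector_size=512, geli_sector_size=4096):
    """Map one unreadable disk sector to the GELI sector that covers it.

    Returns (geli_sector_index, first_disk_sector, last_disk_sector).
    Assumes data starts at offset 0 and a plain 1:1 layout (hypothetical model).
    """
    if geli_sector_size % disk_sector_size != 0:
        raise ValueError("GELI sector size must be a multiple of the disk sector size")
    ratio = geli_sector_size // disk_sector_size   # disk sectors per GELI sector
    geli_index = bad_disk_sector // ratio          # GELI sector containing the bad one
    first = geli_index * ratio                     # first disk sector in that GELI sector
    last = first + ratio - 1                       # last disk sector in that GELI sector
    return geli_index, first, last


# One bad 512-byte sector under 4096-byte GELI sectors.
index, first, last = affected_geli_sector(1003)
lost_bytes = (last - first + 1) * 512
print(index, first, last, lost_bytes)
```

Under those assumptions, one bad 512-byte disk sector would cost a full 4096-byte GELI sector, while with equal sector sizes the damage would be no larger than without GELI.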

And finally, what if an entire drive dies a cruel and horrible death, all of its data returning to the large bit bucket in the sky?  Would GELI simply relay the same errors upstack to ZFS, so ZFS would be able to handle it as well as it would have without GELI?
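
The behavior I'm hoping for can be sketched as a toy model in Python. This is not how GELI or ZFS are implemented. It only illustrates the idea that if the encryption layer passes read errors up unchanged, a redundancy layer above it can still reconstruct the data. All class and function names are made up, the "cipher" is a trivial XOR, and the redundancy is plain single-parity rather than real raidz.

```python
from functools import reduce


class Disk:
    """Toy disk holding a list of equally sized blocks; can be marked dead."""
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dead = False

    def read(self, i):
        if self.dead:
            raise OSError("disk read failed")
        return self.blocks[i]


class ToyCryptoLayer:
    """Stand-in for an encryption layer: decrypts on read, never hides errors."""
    def __init__(self, disk, key=0x5A):
        self.disk = disk
        self.key = key

    def read(self, i):
        data = self.disk.read(i)  # an OSError from below propagates unchanged
        return bytes(b ^ self.key for b in data)


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_stripe(devices, i):
    """Read block i from every device; the last device holds parity.

    Rebuilds at most one missing block from the survivors.
    """
    results, failed = [], []
    for n, dev in enumerate(devices):
        try:
            results.append(dev.read(i))
        except OSError:
            results.append(None)
            failed.append(n)
    if len(failed) > 1:
        raise OSError("too many failed devices")
    if failed:
        survivors = [b for b in results if b is not None]
        results[failed[0]] = xor_blocks(survivors)
    return results[:-1]  # data blocks only, parity dropped


def encrypt(block, key=0x5A):
    """Toy encryption matching ToyCryptoLayer."""
    return bytes(b ^ key for b in block)


d0, d1 = b"hello world!", b"zfs on geli."
parity = xor_blocks([d0, d1])  # parity is computed above the crypto layer
disks = [Disk([encrypt(d0)]), Disk([encrypt(d1)]), Disk([encrypt(parity)])]
pool = [ToyCryptoLayer(d) for d in disks]

disks[1].dead = True           # one whole drive dies
recovered = read_stripe(pool, 0)
print(recovered)
```

In this model the loss of one drive is invisible to the reader of the pool, because the crypto layer neither swallows the error nor turns it into something worse. Whether GELI really behaves like that is exactly what I'm asking.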


I've used ZFS over GELI in the past, but never had any hardware issues to see how it plays out.

I'm considering deploying it for more stuff now, and reliability-wise, from what I know, I could lose very little by using GELI, or it could be time to buy napkins, because the risk of a grown man crying while trying to mop up spilled bits from the floor is increased significantly.

(Backups should avoid the need for tears, though. Besides, I don't have a mop.)

Terje



More information about the freebsd-questions mailing list