ZFS License and Future

Roland Smith rsmith at xs4all.nl
Mon Nov 8 18:38:29 UTC 2010

On Mon, Nov 08, 2010 at 05:08:33PM +0100, Svein Skogen (Listmail account) wrote:
> >>> The GEOM_ELI class provides optional authentication/checksumming. See
> >>> geli(8),
> >>> especially the -a option.
> >> I'm not sure whether that would be a viable replacement, as it has to
> >> be a fairly good checksum to avoid clashes, whilst also being quick so
> >> it doesn't adversely affect disk performance. Also, what does it do if
> >> it detects that the checksum doesn't match, etc.?

Personally I've never enabled the checksumming because, and I quote from
geli(8), "This will reduce size of available storage and also reduce
speed."

> > Good point. Geli uses a crypto standard hash (HMAC/SHA256 is
> > recommended) as it's all about authentication in the face of potentially
> > malicious attack, and that's fairly expensive. ZFS by default uses the
> > fletcher2 (= fletcher32) hash, which is simple and fast, as it's used to
> > make sure that hardware hasn't accidentally mangled your data.

But with geli(8) one can choose between HMAC/MD5, HMAC/SHA1, HMAC/RIPEMD160,
HMAC/SHA256, HMAC/SHA384 and HMAC/SHA512. With the recommended HMAC/SHA256
you'll lose 11% of the provider's capacity. Presumably MD5 is the fastest and
SHA512 the slowest, but MD5 has a higher chance of collisions.
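
As a rough illustration of where the 11% comes from, here is a small sketch.
It rests on an assumption that is not in the man page text quoted above: that
with -a enabled each 512-byte on-disk sector holds one full-length HMAC plus
payload, and that the provider exposes 4096-byte sectors. The function name
usable_fraction is made up for this example. Under that assumption
HMAC/SHA256 needs 9 on-disk sectors (4608 bytes) per 4096 bytes of data,
which is about 89% usable.

```python
import math

# ASSUMPTION: each 512-byte on-disk sector stores one HMAC plus payload.
DISK_SECTOR = 512

# Digest sizes in bytes (properties of the hash functions themselves).
DIGEST_BYTES = {
    "HMAC/MD5": 16,
    "HMAC/SHA1": 20,
    "HMAC/RIPEMD160": 20,
    "HMAC/SHA256": 32,
    "HMAC/SHA384": 48,
    "HMAC/SHA512": 64,
}


def usable_fraction(algo, logical_sector=4096):
    """Fraction of raw capacity left for data under the assumption above."""
    payload = DISK_SECTOR - DIGEST_BYTES[algo]          # data bytes per disk sector
    disk_sectors = math.ceil(logical_sector / payload)  # sectors per logical sector
    return logical_sector / (disk_sectors * DISK_SECTOR)


for algo in DIGEST_BYTES:
    print(f"{algo:15s} {usable_fraction(algo):.1%} usable")
```

Note that this only models capacity. It says nothing about the relative speed
of the algorithms, which would have to be measured.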

> But it's still not capable of true forward-error-correction. If we are
> to embark upon creating a new solution, using something that is cheap
> for "normal cases" but can still be used (albeit more expensively) for
> error recovery would (imho) be better. Even if that means we get less
> net storage out of the gross pool (it could perhaps be configurable?)

I'm not sure what you mean by "true forward-error-correction". But if you want
to make _really sure_ that a spinning disk hasn't mangled the data you should:

- Calculate a checksum of a data block in memory.
- Write the data block to disk (with write caching disabled to make as sure as
  possible that the data is on disk when the write finishes; that is a _huge_
  performance penalty).
- Read the data back from disk (and not from the cache!), calculate its
  checksum and compare that with the original checksum.
- If the checksum comparison fails, mark the block as bad and repeat at
  another location.
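
The steps above could be sketched in userland roughly as follows. This is
only an illustration, not how any filesystem does it: the function names
write_verified and write_with_relocation are made up, SHA-256 is an arbitrary
choice of checksum, and an ordinary file stands in for the disk. Note that
the read-back here may well be served from the OS buffer cache or the drive's
cache, which is exactly the weakness of the approach.

```python
import hashlib
import os


def write_verified(path, offset, block):
    """Write block at offset, flush it, read it back and compare checksums."""
    expected = hashlib.sha256(block).digest()  # checksum of the in-memory data

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = os.pwrite(fd, block, offset)
        if written != len(block):
            return False        # short write: treat as a failure
        os.fsync(fd)            # ask the OS to push the data to the device
        # CAVEAT: this read can be satisfied from a cache, so a match does
        # not prove that the data is really on the medium.
        readback = os.pread(fd, len(block), offset)
    finally:
        os.close(fd)

    return hashlib.sha256(readback).digest() == expected


def write_with_relocation(path, candidate_offsets, block):
    """Try each candidate offset in turn; return the first one that verifies."""
    for offset in candidate_offsets:
        if write_verified(path, offset, block):
            return offset
    return None                 # every candidate location failed
```

A caller would pass a list of spare locations, e.g.
write_with_relocation("/tmp/blk.bin", [0, 4096], data), and record the
offset that was returned.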

Personally I don't see how this is going to be fast without compromising on
correctness. If you keep the disk write cache enabled then, to the best of my
knowledge, there is no way for the OS to know for sure that the data is
actually on the platters, so the read-back and comparison stage might not mean
much.

And for SSDs we might need another type of filesystem entirely. Some concepts
in UFS2 (e.g. cylinder groups) are pretty much useless on SSDs.

R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)

More information about the freebsd-questions mailing list