ZFS...

Alan Somers asomers at freebsd.org
Tue Apr 30 14:27:20 UTC 2019


On Tue, Apr 30, 2019 at 8:13 AM Michelle Sullivan <michelle at sorbs.net> wrote:
>
> > On 01 May 2019, at 00:01, Alan Somers <asomers at freebsd.org> wrote:
> >
> >>>
> >>> Unfortunately however there is also cache memory on most modern hard
> >>> drives, most of the time (unless you explicitly shut it off) it's on for
> >>> write caching, and it'll nail you too.  Oh, and it's never, in my
> >>> experience, ECC.
> >
> > Fortunately, ZFS never sends non-checksummed data to the hard drive.
> > So an error in the hard drive's cache RAM will usually get detected by
> > the ZFS checksum.
>
> True, but a drive losing power mid-write will ensure the checksum doesn’t match the data (even if it is written before the data). You need to ensure all the data and the checksum are written before the drive powers down, and in the event of an unexpected hard power failure, you can’t guarantee this. Battery backup in the controller that has a write cache and rewrites the last few writes on power restore, on the other hand, will save you, which is why the other machine at my disposal hasn’t failed to date.

No, ZFS was designed from the start to handle exactly that problem.
It's solved by the transaction system.  It works like this:

Syncing a transaction group
===========================
1) Write everything except the labels
2) Write labels 1 and 3
3) Send FLUSH_CACHE_EXT, which synchronously flushes the HDD's write cache
4) Write labels 2 and 4
5) Send FLUSH_CACHE_EXT again
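To make the ordering concrete, here is a minimal Python sketch that models a drive as stable media plus a volatile write cache and walks through the five steps exactly as listed. `Disk`, `write()`, `flush()` and `sync_txg()` are hypothetical names, not ZFS or FreeBSD interfaces, and the model leaves out anything the list does not mention.

```python
# Toy model of the five-step txg sync listed above.
# Hypothetical: Disk, write(), flush() and sync_txg() are illustrative
# names, not ZFS or FreeBSD APIs. Labels are numbered 1-4 as in the post.


class Disk:
    """A drive with a volatile write cache in front of stable media."""

    def __init__(self):
        self.stable = {}  # what survives a power loss
        self.cache = {}   # acknowledged writes not yet on the media

    def write(self, key, value):
        # The drive may acknowledge a write while it still sits in cache.
        self.cache[key] = value

    def flush(self):
        # Models FLUSH_CACHE_EXT: returns only once the cache is on media.
        self.stable.update(self.cache)
        self.cache.clear()


def sync_txg(disk, txg, data):
    """Run the five steps in order and return a log of what was done."""
    log = []
    # 1) Write everything except the labels.
    for key, value in data.items():
        disk.write(key, value)
    log.append("data")
    # 2) Write labels 1 and 3.
    for n in (1, 3):
        disk.write(f"label{n}", txg)
    log.append("labels 1,3")
    # 3) Flush the drive's write cache.
    disk.flush()
    log.append("flush")
    # 4) Write labels 2 and 4.
    for n in (2, 4):
        disk.write(f"label{n}", txg)
    log.append("labels 2,4")
    # 5) Flush again.
    disk.flush()
    log.append("flush")
    return log


disk = Disk()
print(sync_txg(disk, txg=42, data={"block:a": b"hello"}))
print(disk.stable)
```

Keeping the cache separate from the media is what makes the two flushes matter: until `flush()` runs, a label write exists only in memory that a power loss would wipe.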

Importing a pool
================
1) Interrogate all labels.  Use the most recent label found whose
embedded checksum is correct.
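A minimal Python sketch of the import rule follows. It is an illustration, not ZFS code: `Label`, `make_label()` and `pick_label()` are hypothetical, and a SHA-256 digest over a made-up payload stands in for the checksum embedded in each label.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Label:
    """One on-disk copy of the label (hypothetical layout)."""
    number: int      # 1-4, as numbered in the post
    txg: int         # transaction group this label describes
    payload: bytes   # label contents as read back from disk
    checksum: bytes  # checksum stored inside the label itself

    def is_valid(self) -> bool:
        # Recompute the checksum and compare it with the embedded one.
        return hashlib.sha256(self.payload).digest() == self.checksum


def make_label(number: int, txg: int) -> Label:
    payload = f"txg={txg}".encode()
    return Label(number, txg, payload, hashlib.sha256(payload).digest())


def pick_label(labels: list[Label]) -> Optional[Label]:
    """Return the most recent label whose checksum verifies, or None."""
    valid = [lbl for lbl in labels if lbl.is_valid()]
    return max(valid, key=lambda lbl: lbl.txg, default=None)


# Label 1 was torn while being rewritten for txg 101; the rest hold txg 100.
torn = make_label(1, 101)
torn.payload = b"txg=1"  # only part of the write reached the platter
labels = [torn, make_label(2, 100), make_label(3, 100), make_label(4, 100)]
print(pick_label(labels).txg)
```

Because the checksum lives inside the label, a torn label simply fails verification and is skipped; no separate journal is needed to tell a good copy from a bad one.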

Power loss possibilities
========================
1) If power is lost during step 1 above, then on import ZFS will roll
back to the last fully synced txg, usually less than 5 seconds earlier.
2, 3) If power is lost during step 2 or 3 and labels 1 and 3 are
corrupted, then ZFS will roll back to the last txg as stored in labels
2 and 4.
4, 5) If power is lost during step 4 or 5, then labels 1 and 3 will be
correct, and ZFS will use them.
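The three cases can be checked with a small simulation. It is a sketch under a pessimistic assumption, stated in the code as well: when power fails, any label write still in the drive's volatile cache is torn on the media, so its checksum no longer verifies. Data writes in step 1 are not modelled, since only the labels decide which txg is imported, and all names here are hypothetical.

```python
import hashlib

# Hypothetical toy model; none of these names are ZFS or FreeBSD APIs.
# Pessimistic assumption: when power fails, every label write still
# sitting in the drive's volatile cache is torn on the media.

OLD_TXG, NEW_TXG = 100, 101


def make_label(txg, torn=False):
    """A label is its txg plus an embedded checksum over that txg."""
    good = hashlib.sha256(str(txg).encode()).hexdigest()
    return {"txg": txg, "cksum": "torn" if torn else good}


def is_valid(lbl):
    return lbl["cksum"] == hashlib.sha256(str(lbl["txg"]).encode()).hexdigest()


def sync_with_power_loss(crash_step):
    """Run the five-step sync, losing power during crash_step (1-5).

    Pass crash_step=None to let the sync finish. Returns the labels
    (numbered 1-4) as read back from the media after power returns.
    """
    media = {n: make_label(OLD_TXG) for n in (1, 2, 3, 4)}
    cache = {}
    # Label writes are issued in steps 2 and 4; steps 3 and 5 flush.
    # Step 1 (data) is not modelled.
    label_writes = {2: (1, 3), 4: (2, 4)}
    for step in range(1, 6):
        if step in label_writes:
            for n in label_writes[step]:
                cache[n] = make_label(NEW_TXG)  # issued, maybe only cached
        if step == crash_step:
            break  # power lost: this step never completed
        if step in (3, 5):
            media.update(cache)  # FLUSH_CACHE_EXT returned: now stable
            cache.clear()
    for n in cache:  # unflushed label writes end up torn
        media[n] = make_label(NEW_TXG, torn=True)
    return media


def import_txg(media):
    """Import: use the most recent label whose checksum is correct."""
    return max(lbl["txg"] for lbl in media.values() if is_valid(lbl))


for crash in (1, 2, 3, 4, 5, None):
    print(crash, import_txg(sync_with_power_loss(crash)))
```

Under that assumption, losing power in steps 1 to 3 imports txg 100 (the previous txg) and losing power in step 4 or 5 imports txg 101, matching the cases listed above. At every crash point at least one pair of labels is both intact and flushed, which is why the import never finds zero valid labels.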

-Alan

>
> Michelle


More information about the freebsd-stable mailing list