ZFS regimen: scrub, scrub, scrub and scrub again.

Wojciech Puchar wojtek at wojtek.tensor.gdynia.pl
Mon Jan 21 11:12:56 UTC 2013


> Please don't misinterpret this post: ZFS's ability to recover from fairly
> catastrophic failures is pretty stellar, but I'm wondering if there can be

From my testing it is exactly the opposite. You have to see the 
difference between marketing and reality.

> a little room for improvement.
>
> I use RAID pretty much everywhere.  I don't like to loose data and disks
> are cheap.  I have a fair amount of experience with all flavors ... and ZFS

Just like me. And because I want performance and, as you described, 
disks are cheap, I use RAID-1 (gmirror).

> has become a go-to filesystem for most of my applications.

My applications don't tolerate low performance, overcomplexity, or a 
high risk of data loss.

That's why I use properly tuned UFS and gmirror, and prefer multiple 
filesystems over gstripe.
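A minimal sketch of that layout, printed as a dry run (the device names, mirror label, and mount point are examples, not the author's actual configuration): each disk pair becomes one gmirror, each mirror carries one UFS filesystem, and no gstripe is involved.

```shell
#!/bin/sh
# Sketch of the layout described above; names are illustrative only.
# One RAID-1 pair -> one UFS2 filesystem, no striping across mirrors.
SETUP="gmirror label -v data0 ada0 ada1
newfs -U -O 2 /dev/mirror/data0
mount /dev/mirror/data0 /data0"
# Dry run: print the commands instead of executing them.
echo "$SETUP"
```

With several independent mirrors, each filesystem's load stays on its own spindle pair instead of every I/O touching every disk.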

> One of the best recommendations I can give for ZFS is it's
> crash-recoverability.

Which is marketing, not truth. If you want bullet-proof recoverability, 
UFS beats everything I've ever seen.

If you want FAST crash recovery, use soft updates + journaling (SU+J), 
available in FreeBSD 9.
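Enabling SU+J is a one-liner with tunefs(8) on an unmounted filesystem. The sketch below only prints the command (a dry run); /dev/ada0p2 is a placeholder device, not a real path from this thread.

```shell
#!/bin/sh
# Enable soft updates journaling on an unmounted UFS filesystem.
# /dev/ada0p2 is a placeholder; substitute your own partition.
DEV=/dev/ada0p2
CMD="tunefs -j enable $DEV"
echo "$CMD"   # dry run: print instead of execute
```

After enabling, fsck after a crash replays the journal instead of walking the whole filesystem, which is where the fast recovery comes from.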

>  As a counter example, if you have most hardware RAID
> going or a software whole-disk raid, after a crash it will generally
> declare one disk as good and the other disk as "to be repaired" ... after
> which a full surface scan of the affected disks --- reading one and writing
> the other --- ensues.

True. gmirror does it too, but you can defer the mirror rebuild, which I 
do. I have a script that sends me mail when a gmirror is degraded; 
after finding the cause of the problem, and possibly replacing the disk, 
I run the rebuild after work hours, so no slowdown is experienced.
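The author's actual script is not shown; a hypothetical version of the idea might look like this. The sample status text stands in for real `gmirror status` output so the sketch is self-contained, and the mail(1) call is left as a comment.

```shell
#!/bin/sh
# Hypothetical cron job: alert the admin when any gmirror is DEGRADED,
# but do NOT start the rebuild -- that is deferred to after work hours.
is_degraded() {
    # $1 is the output of `gmirror status`
    echo "$1" | grep -q 'DEGRADED'
}

# Sample output stands in for `gmirror status` in this sketch.
STATUS='      Name    Status    Components
mirror/gm0  DEGRADED  ada0 (ACTIVE)'

if is_degraded "$STATUS"; then
    # Real script: echo "$STATUS" | mail -s "gmirror degraded" root
    echo "alert: gmirror degraded"
fi
# After hours, resynchronize manually, e.g.: gmirror rebuild gm0 ada1
```

Because the rebuild is an explicit command rather than automatic, the full-surface resync can be scheduled when nobody notices the I/O load.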

> ZFS is smart on this point: it will recover on reboot with a minimum amount
> of fuss.  Even if you dislodge a drive ... so that it's missing the last
> 'n' transactions, ZFS seems to figure this out (which I thought was extra
> cudos).

Yes, this is marketing. Practice is somewhat different, as you 
discovered yourself.

>
> MY PROBLEM comes from problems that scrub can fix.
>
> Let's talk, in specific, about my home array.  It has 9x 1.5T and 8x 2T in
> a RAID-Z configuration (2 sets, obviously).

While RAID-Z is already the king of bad performance, I assume you mean 
two POOLS, not two RAID-Z sets in one pool. If you mixed two different 
RAID-Z sets in one pool, you would spread the load unevenly and make 
performance even worse.
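To illustrate the distinction (as a dry run; the pool and disk names are invented, though the disk counts match the setup described): two separate pools keep each RAID-Z set's load independent, whereas two unequal raidz vdevs in one pool would be striped across unevenly.

```shell
#!/bin/sh
# Two independent pools, one per disk size (names are examples only).
POOL_A="zpool create tank15 raidz da0 da1 da2 da3 da4 da5 da6 da7 da8"
POOL_B="zpool create tank20 raidz da9 da10 da11 da12 da13 da14 da15 da16"
echo "$POOL_A"   # 9x 1.5T set
echo "$POOL_B"   # 8x 2T set
# Listing both raidz groups in ONE `zpool create` would instead make
# them vdevs of a single pool, striping writes across unequal groups.
```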

>
> A full scrub of my drives weighs in at 36 hours or so.

Which is funny, as ZFS is marketed as doing this efficiently (e.g. 
checking only used space).

dd if=/dev/disk of=/dev/null bs=2m would take no more than a few hours, 
and you can run all the disks in parallel.
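A back-of-the-envelope check of that "few hours" claim, assuming a sustained sequential read rate of about 140 MB/s (an assumption, not a measured figure from this thread):

```shell
#!/bin/sh
# Rough time for one full sequential pass over a 2 TB disk.
SIZE_MB=$((2 * 1000 * 1000))        # 2 TB expressed in MB
RATE=140                            # MB/s, assumed sustained rate
HOURS=$(( SIZE_MB / RATE / 3600 ))
echo "one full pass: about ${HOURS} hours"
# All disks can be read at once, so wall-clock time stays the same:
#   for d in ada0 ada1 ada2; do dd if=/dev/$d of=/dev/null bs=2m & done; wait
```

At that rate a whole-surface read finishes in roughly 4 hours regardless of how many disks there are, since each disk is read independently.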

>        vr2/cvs:<0x1c1>
>
> Now ... this is just an example: after each scrub, the hex number was

Seems like scrub simply doesn't do its work right.

> before the old error was cleared.  Then this new error gets similarly
> cleared by the next scrub.  It seems that if the scrub returned to this new
> found error after fixing the "known" errors, this could save whole new
> scrub runs from being required.

Even better: use UFS, for both bullet-proof recoverability and 
performance. If you need help with tuning, you may ask me privately.


More information about the freebsd-fs mailing list