ZFS regimen: scrub, scrub, scrub and scrub again.

Zaphod Beeblebrox zbeeble at gmail.com
Sun Jan 20 22:26:57 UTC 2013


Please don't misinterpret this post: ZFS's ability to recover from fairly
catastrophic failures is pretty stellar, but I'm wondering if there's a
little room for improvement.

I use RAID pretty much everywhere.  I don't like to lose data and disks
are cheap.  I have a fair amount of experience with all flavors ... and ZFS
has become a go-to filesystem for most of my applications.

One of the best recommendations I can give for ZFS is its
crash-recoverability.  As a counter-example, with most hardware RAID or
software whole-disk RAID, after a crash the controller will generally
declare one disk good and the other "to be repaired" ... after which a
full surface scan of the affected disks ensues, reading one and writing
the other.  On my Windows desktop, a pair of 2T's takes 3 or 4 hours to do
this.  A pair of green 2T's can take over 6.  You don't lose any data, but
you have severely reduced performance until the repair finishes.

The rub is that you know only one or two blocks could possibly differ ...
and that a full resync is a highly unoptimized way of going about the
problem.

ZFS is smart on this point: it recovers on reboot with a minimum of fuss.
Even if you dislodge a drive ... so that it's missing the last 'n'
transactions, ZFS seems to figure this out (which I thought deserved extra
kudos).

MY PROBLEM comes from the kind of problems that scrub can fix.

Let's talk, in specific, about my home array.  It has 9x 1.5T and 8x 2T in
a RAID-Z configuration (2 sets, obviously).  The drives themselves are
housed (4 each) in external drive bays with a single SATA connection for
each.  I think I have spoken of this here before.
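
For concreteness, the shape of the thing is roughly what you'd build with
something like the following (assuming both raidz sets live in the one
pool; the device names here are placeholders, not my actual disks):

    # hypothetical layout: two raidz vdevs in one pool
    # (device names are made up; substitute the real disks)
    zpool create vr2 \
        raidz da0 da1 da2 da3 da4 da5 da6 da7 da8 \
        raidz da9 da10 da11 da12 da13 da14 da15 da16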

A full scrub of my drives weighs in at 36 hours or so.
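
(Starting one and watching it go is just the usual:)

    # kick off a scrub and check its progress
    zpool scrub vr2
    zpool status -v vr2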

Now, around Christmas, while moving some things, I managed to pull the plug
on one cabinet of 4 drives.  The only active use of the filesystem at the
time was likely an automated CVS check-in (a backup), given that the errors
appeared only on the cvs filesystem.

IN THE END, no data was lost, but I had to scrub 4 times to clear the
complaints, which showed up like this in "zpool status -v":

errors: Permanent errors have been detected in the following files:

        vr2/cvs:<0x1c1>

Now ... this is just an example: after each scrub, the hex number was
different.  As a side note, I also couldn't actually find the affected file
on the cvs filesystem.  Not many files are stored there, and they all
seemed to be present.
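
As far as I can tell, that hex number is a dataset object id that ZFS could
no longer map back to a path (presumably because the object had already
been deleted), which would explain why nothing on the cvs filesystem looked
wrong.  I believe zdb can dump an object by number if you want to dig,
though I haven't verified how much it tells you once the object is gone:

    # 0x1c1 is object 449 in decimal; zdb takes the object number
    zdb -ddddd vr2/cvs 449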

MY TAKEAWAY from this is that two major improvements could be made to ZFS:

1) a pause for scrub ... such that long scrubs could be paused during
working hours (a crude throttling stop-gap is sketched after this list,
though it isn't the same thing).

2) going back over errors ... during each scrub, the "new" error was found
before the old error was cleared, and that new error then got cleared by
the next scrub.  It seems that if the scrub circled back to the newly found
errors after fixing the "known" ones, it could save whole new scrub runs
from being required.
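
For (1), the closest stop-gap I can think of is throttling rather than
pausing: crank the scrub delay tunable up from cron during working hours
and drop it back at night.  Untested, and it assumes the
vfs.zfs.scrub_delay sysctl is writable at runtime, but an /etc/crontab
fragment along these lines is the idea:

    # slow the scrub down during business hours, let it run flat out at night
    # (vfs.zfs.scrub_delay: ticks to wait between scrub I/Os; higher = gentler)
    0  8  *  *  1-5  root  /sbin/sysctl vfs.zfs.scrub_delay=20
    0 18  *  *  1-5  root  /sbin/sysctl vfs.zfs.scrub_delay=4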

