ZFS...

Karl Denninger karl at denninger.net
Tue Apr 30 14:44:23 UTC 2019


On 4/30/2019 08:38, Michelle Sullivan wrote:
> Karl Denninger wrote:
>> On 4/30/2019 03:09, Michelle Sullivan wrote:
>>> Consider..
>>>
>>> If one triggers such a fault on a production server, how can one
>>> justify transferring from backup multiple terabytes (or even
>>> petabytes now) of data to repair an unmountable/faulted array....
>>> because all backup solutions I know currently would take days if not
>>> weeks to restore the sort of storage ZFS is touted as supporting.
>> Had it happen on a production server a few years back with ZFS.  The
>> *hardware* went insane (disk adapter) and scribbled on *all* of the
>> vdevs.
>>
>> ....
>> Time to recover essential functions was ~8 hours (and over 24
>> hours for everything to be restored.)
>>
> How big was the storage area?
>
In that specific instance approximately 10TB was in use.  The working set
that allowed critical functions to come online (and which got restored
first, obviously) was ~3TB.

BTW, my personal "primary server" working set is approximately 20TB.
There is data on that server dating back to 1982 -- yes, data all the
way back to systems I owned that ran on a Z-80 processor with 64KB (not
MB) of RAM.  I started running ZFS a fairly long time ago on FreeBSD --
9.0, I believe, and have reorganized and upgraded drives over time.  If
my storage fails "hard" in a way that leaves me with no local backups
available (e.g. building fire, adapter scribbles on drives including
not-mounted ones, etc.), critical functionality (e.g. email receipt, etc.)
can be back
online in roughly 3-4 hours, assuming the bank is open and I can get to
the backup volumes.  A full restore will require more than a day.  I've
tested restore of each individual piece of the backup structure but do
not have the hardware available in the building to restore a complete
clone.  With the segregated structure of it, however, I'm 100% certain
it is all restorable.  That's tested regularly -- just to be sure.
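
For what it's worth, that piece-by-piece restore test is easy to script.
Below is a minimal sketch of the idea in Python driving the ordinary
zfs(8) send/receive commands; the pool names ("backup", "tank/restoretest")
and the list of "pieces" are invented for illustration and are not my
actual layout.

#!/usr/bin/env python3
# A sketch of a piece-by-piece restore test: replay the newest snapshot of
# each backup dataset into a scratch area, then throw the copy away.  The
# pool names and dataset list here are invented for illustration.
import subprocess

BACKUP_POOL = "backup"          # assumed name of the attached backup pool
SCRATCH = "tank/restoretest"    # assumed scratch dataset on the live pool
PIECES = ["mail", "home", "archive"]   # hypothetical backup "pieces"

def latest_snapshot(piece):
    """Newest snapshot of BACKUP_POOL/<piece>, or None if there isn't one."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-d", "1", "-t", "snapshot",
         "-o", "name", "-s", "creation", f"{BACKUP_POOL}/{piece}"],
        capture_output=True, text=True).stdout.split()
    return out[-1] if out else None

subprocess.run(["zfs", "create", "-p", SCRATCH], check=True)

for piece in PIECES:
    snap = latest_snapshot(piece)
    if snap is None:
        print(f"{piece}: NO SNAPSHOT in the backup pool!")
        continue
    target = f"{SCRATCH}/{piece}"
    # Replay the full stream; corruption in the backup shows up here rather
    # than during a real disaster.
    send = subprocess.Popen(["zfs", "send", snap], stdout=subprocess.PIPE)
    recv = subprocess.run(["zfs", "receive", "-F", target], stdin=send.stdout)
    send.stdout.close()
    send.wait()
    ok = send.returncode == 0 and recv.returncode == 0
    print(f"{piece}: {'restorable' if ok else 'RESTORE FAILED'}")
    subprocess.run(["zfs", "destroy", "-r", target])

Replaying into a scratch dataset and destroying it afterward keeps the
test cheap while still proving each stream is actually receivable.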

Now if we get nuked and the bank vault is also destroyed then it's over,
but then again I'm probably a burnt piece of charcoal in such a
circumstance so that's a risk I accept.

When I ran my ISP in the 1990s we had both local copies and vault copies,
because a "scribbles on disk" failure on a Saturday could not be left
waiting until Monday morning for the vault to open.  Being down that long
would have put us out of business instantly in any situation short of the
office housing our primary data center burning down.  Incidentally, one of
my adapter failures was in exactly the worst possible place for it to
occur while running that company -- the adapter on the machine that held
our primary authentication and billing database.

At the time the best option for "large" working sets was DLT.  Now for
most purposes it's another disk.  Disks, however, must be re-verified
more frequently than DLT -- MUCH more frequently.  Further, a backup
cannot be singular: there must be two or more copies (whether via
mirroring or some other mechanism) on ANY media.  While DLT, for example,
has a typical expected 30-year archival life, that doesn't mean the ONE
tape you have will be readable 30 years later.
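
To put the "re-verify" point in concrete terms: when the backup copy is
itself a ZFS pool on disk, a periodic scrub is the verification pass.  A
minimal sketch of that routine -- the pool name "backup" is made up and
the polling interval is arbitrary:

#!/usr/bin/env python3
# Periodic re-verification of a disk-based backup copy.  A sketch only:
# it assumes the backup disks form their own ZFS pool, here called
# "backup", which gets attached, scrubbed end-to-end, and checked before
# being rotated back out.
import subprocess, sys, time

POOL = "backup"   # hypothetical pool name

# A scrub re-reads every allocated block and verifies its checksum, which
# is exactly the "re-verify the media" step disks need far more often
# than DLT.
subprocess.run(["zpool", "scrub", POOL], check=True)

# Scrubs run in the background; poll zpool status until this one finishes.
while "scrub in progress" in subprocess.run(
        ["zpool", "status", POOL], capture_output=True, text=True).stdout:
    time.sleep(60)

# With -x, a clean pool reports "pool 'backup' is healthy".
result = subprocess.run(["zpool", "status", "-x", POOL],
                        capture_output=True, text=True).stdout
print(result.strip())
sys.exit(0 if "is healthy" in result else 1)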

As data size expands, noodling on how you segregate data into read-only,
write-very-occasionally and read/write categories -- along with how you
handle backups of each component and how, or if, you subdivide those
categories for backup purposes -- becomes quite important.  If
performance matters (and it usually does) then what goes where in which
pool (and across pools of similar base storage types) matters too; in my
personal working set there is both SSD (all "power-pull safe" drives,
which cost more and tend to be somewhat slower than "consumer" SSDs) and
spinning rust storage for that exact reason.
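
As a purely illustrative sketch of that segregation (all pool and dataset
names invented), the categories can simply be separate datasets on
separate pools, with the static material marked read-only:

#!/usr/bin/env python3
# Purely illustrative: the read-only / write-occasionally / read-write
# split expressed as separate ZFS datasets on separate pools, so each
# class can be snapshotted and backed up on its own schedule.
import subprocess

def zfs(*args):
    """Run a zfs(8) subcommand and fail loudly if it errors."""
    subprocess.run(["zfs", *args], check=True)

# Fast pool (power-loss-safe SSDs) for the hot, truly read/write set.
zfs("create", "-p", "-o", "compression=lz4", "fast/db")
zfs("create", "-p", "-o", "compression=lz4", "fast/mail")

# Spinning-rust pool for the write-very-occasionally and archival material.
zfs("create", "-p", "-o", "compression=lz4", "rust/archive")
zfs("create", "-p", "rust/archive/1980s")

# The static pieces get marked read-only so nothing can scribble on them
# between their (infrequent) backup passes.
zfs("set", "readonly=on", "rust/archive/1980s")

Keeping each class in its own dataset is what makes the per-piece backup
schedules (and the per-piece restore tests above) practical.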

Note that on this list right now I'm chasing a potential "gotcha"
interaction between geli and ZFS that thus far has eluded isolation. 
While it has yet to corrupt data, the potential is there and the hair on
the back of my neck is standing up a bit as a consequence.  It appears
to date to either 11.2 or 12.0 and *definitely* is present in 12-STABLE;
it was *not* present on 11.1.

The price of keeping your data intact is always eternal vigilance.

-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

