ZFS...

Michelle Sullivan michelle at sorbs.net
Wed May 1 03:17:43 UTC 2019



Michelle Sullivan
http://www.mhix.org/
Sent from my iPad

> On 01 May 2019, at 12:37, Karl Denninger <karl at denninger.net> wrote:
> 
> On 4/30/2019 20:59, Michelle Sullivan wrote:
>>> On 01 May 2019, at 11:33, Karl Denninger <karl at denninger.net> wrote:
>>> 
>>>> On 4/30/2019 19:14, Michelle Sullivan wrote:
>>>> 
>>>> Michelle Sullivan
>>>> http://www.mhix.org/
>>>> Sent from my iPad
>>>> 
>>> Nope.  I'd much rather *know* the data is corrupt and be forced to
>>> restore from backups than to have SILENT corruption occur and perhaps
>>> screw me 10 years down the road when the odds are my backups have
>>> long-since been recycled.
>> Ahh yes, the be-all and end-all of ZFS: it stops silent corruption of data. But don’t install it on anything unless it’s server grade with backups and ECC RAM... yet it’s good on laptops because it protects you from silent corruption when, 10 years later, the backups have long since been recycled... umm, is that not a circular argument?
>> 
>> Don’t get me wrong here... I know you (and some others) are running ZFS in the DC with tens of thousands of dollars in redundant servers and/or backups to keep your critical data corruption-free = good thing.
>> 
>> “ZFS on everything” is what some say (because it prevents silent corruption), but then you have default policies that install it everywhere... including on hardware not equipped to run it safely (by your own arguments), and yet it’s still good because it will still prevent silent corruption, even though it relies on hardware you can’t trust... umm, say what?
>> 
>> Anyhow veered way way off (the original) topic...
>> 
>> A modest (part consumer-grade, part commercial) system suffered irreversible data loss because of a (very unusual, but not impossible) double power outage... and there are no tools to recover the data (or part of it) unless you have some form of backup, because the file system deems the corruption too dangerous to let you access any of it (even the known-good bits)...
>> 
>> Michelle
> 
> IMHO you're dead wrong Michelle.  I respect your opinion but disagree
> vehemently.

I guess we’ll have to agree to disagree then, but I think pronouncing me “dead wrong” is short-sighted, because it smacks of “I’m right because ZFS is the answer to all problems.” I’ve been around in the industry long enough to see a variety of issues... some disasters, some not so much...

I also should know better than to run without backups, but financial constraints precluded them... as they will for many non-commercial people.

> 
> I run ZFS on both of my laptops under FreeBSD.  Both have
> non-power-protected SSDs in them.  Neither is mirrored or Raidz-anything.
> 
> So why run ZFS instead of UFS?
> 
> Because a scrub will detect data corruption that UFS cannot detect *at all.*

I get it, I really do, but that has to be balanced against this: if you can’t rebuild it, make sure you have (tested and working) backups, and be prepared for downtime when such corruption does occur.
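To make the point being argued over concrete: the reason a scrub can detect what a plain read cannot is that every block is stored alongside a checksum taken at write time. The following is a toy model in Python, not ZFS’s actual on-disk format or code, just an illustration of why a non-checksumming filesystem returns rotted data silently while a checksumming one can at least raise an error:

```python
import hashlib

class Block:
    """Toy model of a checksummed block. Illustrative only -- not ZFS's
    real layout -- but it shows why a scrub detects bit rot that a
    plain read on UFS/ext2 cannot."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        # Checksum recorded at write time, kept separate from the data.
        self.checksum = hashlib.sha256(data).digest()

    def read_unchecked(self) -> bytes:
        # A filesystem without block checksums hands back whatever the
        # disk returned -- silently, even if it rotted.
        return bytes(self.data)

    def read_verified(self) -> bytes:
        # A checksumming filesystem recomputes and compares on every
        # read (and on every scrub).
        if hashlib.sha256(bytes(self.data)).digest() != self.checksum:
            raise IOError("checksum mismatch: corruption detected")
        return bytes(self.data)

blk = Block(b"important payroll records")
blk.data[3] ^= 0x01              # simulate a single flipped bit ("bit rot")

print(blk.read_unchecked())      # no error, wrong data: silent corruption
try:
    blk.read_verified()
except IOError as e:
    print(e)                     # the "scrub" path catches it
```

Note that, exactly as Karl says below, detection is all you get here: with no redundant copy there is nothing to repair from, so the next step is the backup.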

> 
> It is a balance-of-harms test and you choose.  I can make a very clean
> argument that *greater information always wins*; that is, I prefer in
> every case to *know* I'm screwed rather than not.  I can defend against
> being screwed with some amount of diligence but in order for that
> diligence to be reasonable I have to know about the screwing in a
> reasonable amount of time after it happens.

Not disagreeing (and have not been.)

> 
> You may have never had silent corruption bite you.

I have... but not with data on disks. Most of my silent corruption issues have been a layer or two above the hardware... like Subversion commits overwriting previous commits without notification (damn, I wish I could reliably replicate it!)


>   I have had it happen
> several times over my IT career.  If that happens to you the odds are
> that it's absolutely unrecoverable and whatever gets corrupted is
> *gone.*

Every drive corruption I have suffered in my career I have been able to recover from, in whole or in part, except where the hardware itself was totally hosed (i.e. clean-room options only)... even with btrfs... yuk... oh, what a mess that was... I still get nightmares about that one... but I still managed to get most of the data off. In fact, I put it onto the machine I currently have problems with... so after the btrfs nightmare, it looks like ZFS eventually nailed me.


>   The defensive measures against silent corruption require
> retention of backup data *literally forever* for the entire useful life
> of the information because from the point of corruption forward *the
> backups are typically going to be complete and correct copies of the
> corrupt data and thus equally worthless to what's on the disk itself.* 
> With non-ZFS filesystems quite a lot of thought and care has to go into
> defending against that, and said defense usually requires the active
> cooperation of whatever software wrote said file in the first place

Say what?  

> (e.g. a database, etc.)

So DBs (any?) talk actively to the file system (any?) to actively prevent silent corruption?

Lol...

I’m guessing you are actually talking about internal checks and balances within the DB to ensure that data retrieved from disk is not corrupt/altered... you know, like publishing sha256 checksums of files you might download from the internet, so you can verify you got what you asked for and it wasn’t changed/altered in transit.

>   If said software has no tools to "walk" said
> data or if it's impractical to have it do so you're at severe risk of
> being hosed.

Umm, what? I’m talking about a userland (libzfs) tool, i.e. one that doesn’t need the pool imported, as opposed to something like zfs send (which requires the pool to be imported, hence me not calling it a userland tool). Something to allow sending whatever data can be found to other places, where it can either be blindly recovered (corruption might be present) or used to locate files/paths etc. that are known to be good (checksums match etc.). Walk the structures, feed the data elsewhere where it can be examined/recovered, and don’t alter anything... a last-resort tool for when you don’t have working backups.
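For what it’s worth, OpenZFS does ship zdb(8), which can examine an exported (and sometimes damaged) pool read-only via `zdb -e` without importing it, though it is a debugger rather than the extraction tool described here. As a toy illustration of the “walk the structures and copy out only what still verifies” idea, here is a plain-Python sketch over an ordinary directory tree; the `salvage()` helper and its manifest format are invented for the example and bear no relation to any real ZFS internals:

```python
import hashlib
import os
import shutil

def salvage(src_root: str, manifest: dict, dest_root: str):
    """Walk src_root, copying out only files whose contents match the
    sha256 recorded in `manifest` (relative path -> hex digest).
    Mismatches are reported, never silently copied, and the source is
    never modified -- the 'last resort, read-only' property Michelle
    describes. Purely illustrative."""
    good, bad = [], []
    for dirpath, _, names in os.walk(src_root):
        for name in names:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, src_root)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if manifest.get(rel) == digest:
                dest = os.path.join(dest_root, rel)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.copy2(path, dest)       # known-good: extract it
                good.append(rel)
            else:
                bad.append(rel)                # known-bad or unknown: flag it
    return good, bad
```

A real pool-walking equivalent would of course have to traverse ZFS’s own block pointers and use its stored checksums rather than an external manifest, which is precisely the tooling gap being complained about.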

>   Prior to ZFS there really wasn't any comprehensive defense
> against this sort of event.  There are a whole host of applications that
> manipulate data that are absolutely reliant on that sort of thing not
> happening (e.g. anything using a btree data structure) and recovery if
> it *does* happen is a five-alarm nightmare if it's possible at all.  In
> the worst-case scenario you don't detect the corruption and the data
> that has the pointer to it that gets corrupted is overwritten and 
> destroyed.
> 
> A ZFS scrub on a volume that has no redundancy cannot *fix* that
> corruption but it can and will detect it.

So you’re advocating restore from backup for every corruption ... ok...


>   This puts a boundary on the
> backups that I must keep in order to *not* have that happen.  This is of
> very high value to me and is why, even on systems without ECC memory and
> without redundant disks, provided there is enough RAM to make it
> reasonable (e.g. not on embedded systems I do development on with are
> severely RAM-constrained) I run ZFS.
> 
> BTW if you've never had a UFS volume unlink all the blocks within a file
> on an fsck and then recover them back into the free list after a crash
> you're a rare bird indeed.  If you think a corrupt ZFS volume is fun try
> to get your data back from said file after that happens.

Been there, done that, though with ext2 rather than UFS... still got all my data back, even though it was a nightmare.


> 
> -- 
> Karl Denninger
> karl at denninger.net <mailto:karl at denninger.net>
> /The Market Ticker/
> /[S/MIME encrypted email preferred]/


More information about the freebsd-stable mailing list