ZFS...

Walter Parker walterp at gmail.com
Wed May 8 15:55:53 UTC 2019


>
>
> ZDB (unless I'm misreading it) is able to find all 34m+ files and
> verifies the checksums.  The problem is in the zfs data structures (one
> definitely, two maybe, metaslabs fail checksums preventing the mounting
> (even read-only) of the volumes.)
>
> >   Especially, how do you know
> > before you recovered the data from the drive.
> See above.
>
> > As ZFS meta data is stored
> > redundantly on the drive and never in an inconsistent form (that is what
> > fsck does, it fixes the inconsistent data that most other filesystems store
> > when they crash/have disk issues).
> The problem - unless I'm reading zdb incorrectly - is limited to the
> structure rather than the data.  This fits with the fact the drive was
> isolated from user changes when the drive was being resilvered so the
> data itself was not being altered .. that said, I am no expert so I
> could easily be completely wrong.
>
What it sounds like you need is a meta data fixer, not a file recovery
tool. Assuming the meta data can be fixed, that would be the easy route.
That should not be hard to write if everything else on the disk has no
issues. Didn't you say in another message that the system is now returning
hundreds of drive errors? How does that relate to the statement "Everything on
the disk is fine except for a little bit of corruption in the freespace map"?


>
> >
> > I have a friend/business partner that doesn't want to move to ZFS because
> > his recovery method is to wait for a single drive (no redundancy, sometimes no
> > backup) to fail and then use ddrescue to image the broken drive to a new
> > drive (ignoring any file corruption because you can't really tell without
> > ZFS). He's been using disk rescue programs for so long that he will not
> > move to ZFS, because it doesn't have a disk rescue program.
>
> The first part is rather cavalier .. the second part I kinda
> understand... it's why I'm now looking at alternatives ... particularly
> being bitten as badly as I have with an unmountable volume.
>
We had a ZFS system that I managed for him crap out. I
restored it from a backup. I continue to believe that people running
systems without backups are living on borrowed time. The idea of relying on
a disk recovery tool is too risky for my taste.


> >   He has systems
> > on Linux with ext3 and no mirroring or backups. I've asked about moving
> > them to a mirrored ZFS system and he has told me that the customer doesn't
> > want to pay for a second drive (but will pay for hours of his time to fix
> > the problem when it happens). You kind of sound like him.
> Yeah..no!  I'd be having that on a second (mirrored) drive... like most
> of my production servers.
>
> > ZFS is risky
> > because there isn't a good drive rescue program.
> ZFS is good for some applications.  ZFS is good to prevent cosmic ray
> issues.  ZFS is not good when things go wrong.  ZFS doesn't usually go
> wrong.  Think that about sums it up.
>
When it does go wrong, I restore from backups. Therefore my systems don't
have problems. I'm sorry you had the perfect trifecta that caused you to lose
multiple drives and all your backups at the same time.


> >   Sun's design was that the
> > system should be redundant by default and checksum everything. If the
> > drives fail, replace them. If they fail too much or too fast, restore from
> > backup. Once the system has too much corruption, you can't recover/check
> > for all the damage without a second off disk copy. If you have that off
> > disk, then you have backup. They didn't build for the standard use case as
> > found in PCs because the disk recovery programs rarely get everything back,
> > therefore they can't be relied on to get your data back when your data is
> > important. Many PC owners have brought PC mindset ideas to the "UNIX"
> > world. Sun's history predates Windows and Mac and comes from a
> > Mini/Mainframe mindset (where people tried not to guess about data
> > integrity).
> I came from the days of Sun.
>
Good, then you should understand Sun's point of view.


> >
> > Would a disk rescue program for ZFS be a good idea? Sure. Should the lack
> > of a disk recovery program stop you from using ZFS? No. If you think so, I
> > suggest that you have your data integrity priorities in the wrong order
> > (focusing on small, rare events rather than the common base case).
> Common case in your assessment in the email would suggest backups are
> not needed unless you have a rare event of a multi-drive failure.  Which
> I know you're not advocating, but it is this same circular argument...
> ZFS is so good it's never wrong we don't need no stinking recovery
> tools, oh but take backups if it does fail, but it won't because it's so
> good and you have to be running consumer hardware or doing something
> wrong or be very unlucky with failures... etc.. round and round we go,
> where ever she'll stop no-one knows.
>
I advocate 2-3 backups of any important system (at least one different
than the others, offsite if one can afford it).
I never said ZFS is so good we don't need backups (that would be a stupid
comment). As far as a recovery tool goes, those sound risky. I'd prefer
something without so much risk.

Make your own judgement; it is your time and data. I think ZFS is a great
filesystem that anyone using FreeBSD or illumos should be using.


-- 
The greatest dangers to liberty lurk in insidious encroachment by men of
zeal, well-meaning but without understanding.   -- Justice Louis D. Brandeis


More information about the freebsd-stable mailing list