Help recovering damaged drive - fsck segfaults, read-only mount looks ok

Matt Emmerton matt at gsicomp.on.ca
Mon Feb 1 23:42:13 UTC 2021


On Sun, 31 Jan 2021 17:12:40 -0500, Matt Emmerton wrote:

> > If I force a mount in readonly mode, I can inspect the drive and at 
> > first glance, everything seems valid.  Since this machine is used for 
> > backups, I have lots of other metadata (e.g., checksums) and I'm slowly 
> > working through to see if anything important is damaged.
>
> At this point: STOP.
> If your data is important to you, get a copy of it NOW.

Yep, I've been down this road before, which is why I have backups in the
first place.  Still working through which is the "best" place to seed my new
backups from - the backup system (maybe damaged) or the source system (still
healthy).
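
For reference, a minimal sketch of the kind of checksum pass described
above, run against the read-only mount.  It assumes the existing checksums
are SHA-256 and have already been loaded into a dict of
{relative path: hex digest}; the names verify_tree and sha256_of and the
manifest layout are hypothetical, not anything from the actual backup
system.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tree(root, manifest):
    """Check files under root against manifest {relative_path: hex_digest}.

    Returns (missing, mismatched, unreadable), each a sorted list of
    relative paths.  Only reads files, so it is safe on a read-only mount.
    """
    missing, mismatched, unreadable = [], [], []
    for rel_path, expected in manifest.items():
        full = os.path.join(root, rel_path)
        if not os.path.isfile(full):
            missing.append(rel_path)
            continue
        try:
            actual = sha256_of(full)
        except OSError:
            # I/O errors are likely on a damaged filesystem; record and go on.
            unreadable.append(rel_path)
            continue
        if actual != expected.lower():
            mismatched.append(rel_path)
    return sorted(missing), sorted(mismatched), sorted(unreadable)
```

Anything that lands in one of the three lists is a candidate for re-seeding
from the source system rather than from the damaged backup drive.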

> > From some of the stuff that fsck is finding, it's clear that the 
> > corruption is in a rather large-and-deep directory tree that was
> > recently deleted.
> > It's possible that the 'rm -rf' for this was running in the background 
> > when the system lost power.
>
> Therefore deleted files (or "scheduled for deletion") can still be present
> in the r/o mount.

The thing is - I don't care about the deleted files.  What I care about is
the fact that trying to "fix" the partially deleted directory tree is
sending fsck down paths which might not be right, which has me worried.

> > Is there any way to have fsck be more "selective" in what it
> > checks/repairs?
> > It's been a long time since I've done low-level filesystem surgery, 
> > but it seems to me that if I can prevent it from going off into the 
> > weeds (and trying to repair inode entries that are no longer 
> > relevant), all will be well.
>
> Yes. There is a "preen mode" (fsck -p) and a forced mode (fsck -f).
> Be careful with specifying -y, it does not always do what you want it to
> do. Data loss might happen.
>
> See "man fsck" for details.

Preen mode doesn't do anything useful (basically exits immediately).
Force mode sounds scary.  And I agree that fsck -y might not do the right
thing.

> > Any advice?  I have thought about doing some inspection with "ls -i" 
> > and then being very selective in the inodes I get fsck to repair, but 
> > that seems challenging to get right.
> 
> And _that_ is how I finally got my files back (the initial "severe data
> loss" problem, more than 10 years ago): With ls -i, I determined the
> inode of an offending directory, then used fsdb (which I found out
> about reading a reference manual about a GDR UNIX system) to
> remove it, and _then_ (!) fsck was able, after two runs, to bring
> the filesystem back to a consistent state.

Ahah!  fsdb!  That's the tool I remember poking around with a long time ago.
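
The "ls -i" inspection step can also be scripted.  Below is a rough
sketch that lists the inode number of every entry in one directory of the
read-only mount, so a suspect directory's inode can be noted before going
anywhere near fsdb.  The function name list_inodes is hypothetical, and
this only reads directory entries; it does not modify the filesystem.

```python
import os


def list_inodes(directory):
    """Return sorted (inode, kind, name) tuples for entries in directory.

    Symlinks are reported as links and never followed.  Entries whose type
    cannot be determined (possible on a damaged filesystem) get kind "?".
    """
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                if entry.is_symlink():
                    kind = "link"
                elif entry.is_dir(follow_symlinks=False):
                    kind = "dir"
                elif entry.is_file(follow_symlinks=False):
                    kind = "file"
                else:
                    kind = "other"
            except OSError:
                kind = "?"
            rows.append((entry.inode(), kind, entry.name))
    return sorted(rows)


def print_inodes(directory):
    """Print the listing in a layout similar to 'ls -i'."""
    for inode, kind, name in list_inodes(directory):
        print(f"{inode:>12}  {kind:<5}  {name}")
```

Calling print_inodes() on the parent of the half-deleted tree would show
which inode number belongs to the offending directory.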

> All the best, and I hope you can solve that problem. It's one of the very
> few cases that can happen, and which teach you a lot about how the UFS
> filesystem works. :-)

Thank you for your guidance.  Hopefully I can get to the bottom of this --
safely :)

Matt



More information about the freebsd-questions mailing list