hard disk failure - now what?

Kelly Martin kellymartin at gmail.com
Wed Aug 26 05:46:51 UTC 2009


First, thanks to everyone for the really great replies. Many
suggestions were quite helpful and have kept me on track. I'll quote a
couple of people and then add some comments below.

On Mon, Aug 24, 2009 at 4:32 PM, Roland Smith<rsmith at xs4all.nl> wrote:
> It _could_ just be a bad or improperly connected SATA cable. Try changing or
> re-seating the cable.

I thought of that too, but no luck.

> Read errors cannot damage your data, but write errors can! Immediately stop
> all writing to the disk. Re-mount the partitions on that disk as read-only, or
> unmount them.

That was a consensus among everyone who replied, so I made that step
#1. I mounted the partitions read-only and crossed my fingers. Trying
to check the integrity of the data, or even get directory listings,
was another matter: I got various strange errors, which told me I
quite likely had some data loss.

> To see if a disk really is broken, install sysutils/smartmontools, and run
> 'smartctl -a' on the disk. If you see errors in its report (e.g. reallocated
> sectors), the disk is dying and should be unplugged to prevent it from getting
> worse.

That's a good idea and I'll try to use it in the future. After
plugging the drive in and accessing it, I heard those tell-tale signs
of hard drive failure: clicks and pops and other unusual noises, so I
know that it has some damage. I hate those sounds, having heard them
on failing drives too many times before.

>
>> My question: what kind of checks and/or repair tools should I run on
>> the damaged drive after it's mounted?
>
> As others have mentioned, first make a copy (with the disk unmounted) of the
> partitions on that disk with dd, saving them to another drive. That way you
> can experiment with the data without further deterioration of the
> original.

I ran dd and it took over 20 hours to complete. In fact it just
finished this evening, after running all day. Lots of FAILURE errors
were reported along the way, enough to fill two console screens or
more. And of course to complicate things I didn't have a spare drive
as an output device that was the *same size*, so I used a smaller
drive thinking that it wouldn't matter since the source drive wasn't
full anyway. I have no idea if data is scattered around on the FFS
filesystem such that cloning a mostly empty, larger drive onto
something smaller might lose data... I searched Google and couldn't
find the answer, so I proceeded anyway. It doesn't matter now, though,
as I have a new drive and another plan.
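
A minimal sketch of the kind of dd invocation usually recommended for a
failing disk follows. The exact command used above is not recorded
here; the function name, the 64k block size and the example paths are
placeholders chosen for illustration.

```sh
#!/bin/sh
# image_partition SRC DST (hypothetical helper name):
# copy SRC to DST block by block, carrying on past read errors.
image_partition() {
    src=$1
    dst=$2
    # conv=noerror tells dd to continue after a read error.
    # conv=sync pads every short or failed input block with zeros,
    # so the data that follows keeps its original offset in the copy.
    dd if="$src" of="$dst" bs=64k conv=noerror,sync
}

# Example call (device and output path are assumptions):
#   image_partition /dev/ad0s1d /mnt/spare/var.img
```

Because of conv=sync, the copy is padded up to a whole number of
blocks, and any unreadable block shows up as zeros rather than being
dropped.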

> You can use this disk image e.g. as a vnode-backed memory disk, see
> mdconfig(8). If you cannot get a good copy of the disk partitions it might be
> a good idea to get a quote from a professional hard drive data recovery
> company to do that for you. I've never had occasion to try this (hooray for
> backups) but I've heard it can be quite expensive. :-/

I'm going to try the copy a second time, but this time I'll use
ddrescue, as some people suggested, and the target will be an
identical-sized 500 GB drive, which I purchased today. I imagine it
will take a long time to create this cloned disk... hopefully with
fewer errors than dd gave me, though we'll see.
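
A common way to run GNU ddrescue is in two passes that share a map
file, so the second pass only revisits what the first could not read.
The sketch below prints the two commands instead of running them, so
the device names can be double-checked first. The helper name, the
target device /dev/ad1 and the map file name are assumptions, not
details from this thread.

```sh
#!/bin/sh
# rescue_plan SRC DST MAP (hypothetical helper name):
# print a two-pass GNU ddrescue plan for review before running it.
rescue_plan() {
    src=$1
    dst=$2
    map=$3
    # Pass 1: copy everything that reads cleanly and skip the slow
    # pass over damaged areas (-n). -f is needed to write to a device.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas listed in the map file and
    # make up to three retry passes over them (-r3).
    echo "ddrescue -f -r3 $src $dst $map"
}

# Example call (target device and map file are assumptions):
#   rescue_plan /dev/ad0 /dev/ad1 rescue.map
```

Keeping the map file also means an interrupted run can be resumed
rather than started over.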

> Try using fsck_ffs on (copies of) the disk image to see if that can restore
> the damage. If the damage is beyond repair for fsck_ffs, you have a real
> problem. Of course if you have a good disk image, your data is still
> there, but you might have to use a forensics program like sysutils/sleuthkit
> or hexdump to try and piece files together. And even then you cannot be sure
> that there is no corrupted data in the files themselves. Good luck with that. :-(

Indeed some of the partitions seem to be beyond repair. In particular
my /var partition is totally fubar'ed. When using fsck_ffs I got all
sorts of errors when trying to repair the partition, things like:

BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE

So I used the -b option suggested in the man page, "fsck_ffs -y -b 160
/dev/ad0s1d", and it ran and fixed a few things, but then stopped with
the following error:

fsck_ufs: cannot alloc 4294967292 bytes for inoinfo

The worst part of all is that the /var partition would normally be
okay to lose if it didn't have my MySQL database on it - the most
important data on the server. I just about choked down a golf ball
when I discovered my /var partition was in such rough shape and I
might be forced to use real recovery tools, or hire a professional for
$$$, or be out-of-luck.

MySQL databases are normally stored in /var/db/mysql. But then I
remembered my MySQL server was actually running in a Jail environment,
and therefore it was located at /usr/jails/myjail/var/db/mysql instead
of /var/db/mysql, and therefore the jailed MySQL database was on a
totally different partition. Lucky! And I was also very lucky that I
could mount the large /usr partition in read-only mode and copy off
the most critical files I needed, starting with the database. No
errors on that part of the disk so far, at least with the few critical
files I've copied over. Whew!

Until just a few minutes ago I didn't think there'd be a happy ending.
But I've got the most critical data copied over now; the rest can
wait. I'm going to go run dd a second time (well, ddrescue) now and
then start work on the copy once it finishes, in a day or two.

One last thing...

On Tue, Aug 25, 2009 at 11:45 AM, Polytropon<freebsd at edvax.de> wrote:
>
> As it has been suggested, there are interesting tools in the
> ports collection. I'll post my "famous list" again. Among them,
> note ddrescue and dd_rescue. But base system tools such as the
> fetch program can help.
>
>
> System:
>        dd
>        fsck_ffs
>        clri
>        fsdb
>        fetch -rR <device>
>        recoverdisk (!)
>
> Ports:
>        ddrescue
>        dd_rescue
>        ffs2recov
>        magicrescue
>        testdisk
>        The Sleuth Kit:
>                fls
>                dls
>                ils
>                autopsy
>        scan_ffs
>        recoverjpeg
>        foremost
>        photorec

I just wanted to say: this is a great list.  Once the ddrescue copy is
complete, I'll start using some of the other tools and see what I can
recover.

Thanks again to everyone for the help!

kelly

