hard disk failure - now what?

Roland Smith rsmith at xs4all.nl
Mon Aug 24 22:32:49 UTC 2009


On Mon, Aug 24, 2009 at 12:29:19PM -0600, Kelly Martin wrote:
> I just experienced a hard drive failure on one of my FreeBSD 7.2
> production servers with no backup! I am so mad at myself for not
> backing up!!

Welcome to the club. :-)

> Now it's a salvage operation. Here are the type of errors
> I was getting on the console, over-and-over:
> 
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
> completing request directly
> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
> completing request directly
> ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
> g_vfs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5

It _could_ just be a bad or improperly connected SATA cable. Try changing or
re-seating the cable.

Read errors cannot damage your data, but write errors can! Immediately stop
all writing to the disk. Re-mount the partitions on that disk as read-only, or
unmount them.
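
A rough sketch of the read-only remount, assuming the disk is ad4 as in the
console messages above. The function names and the DRY_RUN switch are my own
invention for illustration; mount's -p, -u and -o ro are described in mount(8).
By default it only prints what it would do.

```shell
#!/bin/sh
# Hypothetical helpers: remount everything on a failing disk read-only.
DISK="${DISK:-ad4}"        # disk name taken from the console messages
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = run them

# Read `mount -p` style lines (device, mount point, ...) on stdin and
# print the mount points whose device name starts with /dev/<disk>.
mountpoints_on_disk() {
    awk -v dev="/dev/$1" 'index($1, dev) == 1 { print $2 }'
}

# Read mount points on stdin and remount each one read-only.
remount_ro() {
    while read -r mp; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "mount -u -o ro $mp"
        else
            mount -u -o ro "$mp"
        fi
    done
}

# Usage (as root):
#   mount -p | mountpoints_on_disk "$DISK" | remount_ro
```

The remount can be refused while files are still open for writing, so stop
whatever is writing to those filesystems first, or simply unmount them.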

To see if a disk really is broken, install sysutils/smartmontools, and run
'smartctl -a' on the disk. If you see errors in its report (e.g. reallocated
sectors), the disk is dying and should be unplugged to prevent it from getting
worse.
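
For a quick look at the usual suspects, something like the sketch below can
filter the attribute table that 'smartctl -A' prints. The function name is
made up, and the attribute names are the ones smartmontools commonly reports;
your drive may label them differently, so read the full 'smartctl -a' report
as well.

```shell
#!/bin/sh
# Hypothetical filter: read `smartctl -A` output on stdin and print the
# sector-health attributes whose raw value (column 10) is non-zero.
suspect_attributes() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
         $10 + 0 > 0 { print $2 "=" $10 }'
}

# Usage (as root):
#   smartctl -A /dev/ad4 | suspect_attributes
```

No output means those counters are all zero, which is not proof that the disk
is healthy.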

> My question: what kind of checks and/or repair tools should I run on
> the damaged drive after it's mounted?

As others have mentioned, first make a copy (with the disk unmounted) of the
partitions on that disk with dd, saving them to another drive. That way you
can experiment with the data without further deterioration of the
original. You can use this disk image e.g. as a vnode-backed memory disk, see
mdconfig(8). If you cannot get a good copy of the disk partitions it might be
a good idea to get a quote from a professional hard drive data recovery
company to do that for you. I've never had occasion to try this (hooray for
backups) but I've heard it can be quite expensive. :-/
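
A minimal sketch of the imaging step. The helper name and the /backup
directory are assumptions; substitute a location on a healthy drive with
enough free space. With conv=noerror,sync dd carries on after a read error and
pads the failed block with zeros, so the rest of the image stays at the right
offsets.

```shell
#!/bin/sh
# Hypothetical helper: copy a partition (or any file) to an image file,
# continuing past read errors and zero-padding unreadable or short blocks.
image_partition() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=64k conv=noerror,sync
}

# Usage on the failing system (as root, partition unmounted):
#   image_partition /dev/ad4s1f /backup/ad4s1f.img
#   mdconfig -a -t vnode -o readonly -f /backup/ad4s1f.img   # prints e.g. md0
#   mount -o ro /dev/md0 /mnt
```

A smaller block size loses less data around each bad sector, at the cost of a
slower copy.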

Try using fsck_ffs on (copies of) the disk image to see if that can repair
the damage. If the damage is beyond repair for fsck_ffs, you have a real
problem. Of course, if you have a good disk image, your data is still
there, but you might have to use a forensics program like sysutils/sleuthkit
or hexdump to try and piece files together. And even then you cannot be sure
that there is no corrupted data in the files themselves. Good luck with that. :-(
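
To keep the pristine image out of harm's way, let fsck_ffs loose on a scratch
copy only. A sketch, assuming the image name ends in .img; the helper name and
the paths are made up, and the md unit number will be whatever mdconfig
prints.

```shell
#!/bin/sh
# Hypothetical helper: copy an image to <name>.work.img and print the
# path of the copy, so repairs never touch the original image.
work_copy() {
    img=$1
    work="${img%.img}.work.img"
    cp "$img" "$work" && echo "$work"
}

# Usage (as root):
#   work=$(work_copy /backup/ad4s1f.img)
#   mdconfig -a -t vnode -f "$work"     # prints e.g. md1
#   fsck_ffs -n /dev/md1                # report problems, change nothing
#   fsck_ffs -y /dev/md1                # attempt all repairs
#   mount -o ro /dev/md1 /mnt           # look at what survived
#   umount /mnt && mdconfig -d -u md1   # detach when done
```

If a repair attempt makes things worse, throw the scratch copy away and start
again from the original image.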


Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)

More information about the freebsd-questions mailing list