How to recover data from a dead hard drive.

Fri Oct 20 08:41:13 UTC 2017

On 19 October 2017 16:02:01 BST, Valeri Galtsev <galtsev at kicp.uchicago.edu> wrote:

>>>Personally, I fail to understand why anyone with any "mission critical"
>>>system would not be using some form of RAID. It doesn't make any sense
>>>to me.
>>>Even my laptop is configured to automatically back up data to a cloud
>>>service.
>>>Even if the drive went south, I could restore all of my data.
>>
>> I can explain why people aren't using RAID... IME, it's because they
>> think they are. But they do it wrong, and only find out when things go
>> wrong.
>>
>> Most of the disasters I deal with involve "hardware" RAID cards. I won't
>> single out PERC or MegaRAID because that wouldn't be fair.
>
>Hm... My mileage is different. I use hardware RAIDs a lot, with great
>success, and not a single disaster has happened to me. The statistics for
>my case are: between a dozen and two dozen hardware RAIDs over at least a
>decade and a half. Some that are still in production are over 10 years
>old. My favorite, 3ware, alas, was eradicated by competitors; my second
>favorite is Areca; next would be LSI, and it is not my favorite because
>it has a horrible (confusing!) command-line client interface.
>
>Sometimes people come from different places and tell "hardware RAID
>horror" stories. After detailed review, all of them boil down to one or
>more of:
>
>1. RAID was not set up correctly. Namely: no surface scan (scrub, or
>similar) was scheduled. Monthly would be enough; I usually schedule it
>weekly. I will not go into detail about how this leads to problems; it's
>been described many times;
>
>2. notification to the sysadmin about a failed drive or lost RAID
>redundancy is not arranged (which is also an incorrectly configured RAID);
>
>3. inappropriate drives are used. The worst for RAID are "green" drives
>that spin down to conserve power. They do spin up when a request from the
>RAID card comes, but they just time out first...
>
>4. enabling the cache without a battery backup that keeps the cache RAM,
>with all its data, alive in case of a power outage.
>

Hi Valeri,

My rant wasn't referring to people like you who know what they're doing. I know the type I was referring to exists because I get called in to try to recover the mess, and the problem is very often that they believe that because they spent a lot on RAID hardware, it is indestructible. It's not a substitute for an administrator with brains!

I put "hardware RAID" in quotes, because it's all really software. As you point out, the difference is where the software is run.

Fifteen years ago ZFS wasn't an option, so the choice was moot.

The environment has also changed a lot. Fifteen years ago it was still reasonable to keep tape backups, and if you didn't keep some form of offline backup you were a fool. I struggle to justify a tape backup now. But if you keep your data online on only one array, you're still a fool. Replication to another array (at another site) seems to be the way forward.

I know exactly what you mean about early OS software RAID. I'd have done the same as you. Now, ZFS is robust and has a lot of advantages.

Especially with RAID5, it's amazingly common for a second disk to fail (or be found defective) shortly after the first. The setups I get called in to fix don't allow a suspect drive to be taken offline at the first sign of trouble, so they just swap the failed drive, and the rebuild thrashes the hell out of the remaining disks as it tries to finish fast.
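One reason a "second failure" turns up during a RAID5 rebuild is that the rebuild must read every sector of every surviving disk. A rough back-of-envelope sketch, assuming each bit fails independently at a datasheet unrecoverable read error (URE) rate of 1 in 10^14 bits (a figure commonly quoted for consumer drives; 10^15 is common for enterprise drives), estimates the chance of getting through that full read cleanly. The array size and the function name `rebuild_success_probability` are illustrative assumptions, not figures from this thread.

```python
import math


def rebuild_success_probability(surviving_drives: int, drive_tb: float,
                                ure_per_bit: float = 1e-14) -> float:
    """Probability of reading every surviving drive end to end with no URE.

    Assumes every bit fails independently with probability ure_per_bit,
    which is a simplification of real drive behaviour.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8  # decimal TB -> bits
    # (1 - p)^n, computed via log1p to stay accurate when p is tiny
    return math.exp(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical 4-drive RAID5 of 4 TB disks: 3 survivors are read in full.
    for rate in (1e-14, 1e-15):
        p = rebuild_success_probability(3, 4.0, rate)
        print(f"URE rate {rate:g}/bit: P(clean rebuild) ~ {p:.2f}")
```

Under these assumptions the script prints about 0.38 for the 10^14 rate and 0.91 for 10^15. The independence assumption and datasheet rates are simplifications, so treat the numbers as an illustration of why hammering the survivors during a rebuild is risky, not as a prediction for any particular array.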

I liked your list of common mistakes, and you're quite correct.

I don't get many people calling up to say their hardware RAID is working fine, only people saying it's broken. Our mileage is probably the same; I was ranting about the people who DO lose critical data through bad practice.

Another example I didn't mention was a small company with a Windoze server running a three-way mirror. What could possibly go wrong? Three identical copies of a trashed NTFS root directory, of course...

Regards, Frank

-- 
Sent from my Cray X/MP with small fiddling keyboard.