Multiple hard disk failures - coincidence ?
Peter Jeremy
PeterJeremy at optushome.com.au
Sat Dec 18 01:17:44 PST 2004

On Sat, 2004-Dec-18 02:03:09 -0500, Gary Corcoran wrote:
>I've just had *THREE* Maxtor 250GB hard disk failures on my
>FreeBSD 4.10 server within a matter of days.  One I could
>attribute to actual failure.  Two made me suspicious.  Three
>has me wondering if this is some software problem...   (or
>a conspiracy (just kidding) ;-) )
It seems unlikely that faulty software on the server could cause a disk
failure.  One possibility is that your power supply is a bit stressed and
the supply rails are out of tolerance.  Another possibility is that the
drives are overheating.  Higher-density drives tend to be more sensitive
to both heat and dirty power.
>  I suppose it
>is possible these errors may have shown up more than a week or
>two ago, because my windows machines, reaching them via samba,
>haven't shown any problems until today, and of course with almost
>750GB of data, it's not all accessed over a short time span.
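My approach to this is to add a line similar to
  dd if=/dev/ad0 of=/dev/null bs=32k
for each disk to /etc/daily.local (or /etc/weekly.local or whatever).
This reads every sector of each disk on a regular basis, so read errors
show up soon after they develop rather than whenever the affected data
next happens to be accessed.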
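As a rough sketch, the per-disk dd lines could be wrapped in a small
function that reports which device failed.  The name check_readable and
the device names in the usage comment are purely illustrative, so
substitute your own disks:

```shell
#!/bin/sh
# Read each named device (or file) from start to end and report the result.
# check_readable is a hypothetical helper name, not something FreeBSD ships.
check_readable() {
    status=0
    for dev in "$@"; do
        # dd exits non-zero if any read fails; its transfer statistics
        # (and error text) go to stderr, which is discarded here.
        if dd if="$dev" of=/dev/null bs=32k 2>/dev/null; then
            echo "OK: $dev"
        else
            echo "READ ERROR: $dev"
            status=1
        fi
    done
    return $status
}

# Example use in /etc/daily.local (device names are assumptions):
#   check_readable /dev/ad0 /dev/ad1
```

Discarding stderr keeps the report short; drop the redirection if you
want dd's own error message in the output as well.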
>P.S. I *can't* be the first person to run into this problem:
>When one gets a "hard error" reported for a certain block number,
>how does one find out exactly *which* file or directory is now
>unreadable?  With hundreds of thousands of megabytes on one disk,
>a manual search is not practical - somebody must have written a
>program to 'backtrack' a block number to a particular file name
>- no?
I know I've done this in the past, but I don't recall exactly how.
About all you can do is search through the inode list for inodes that
reference the relevant blocks and then map those inode numbers to file
names.
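The second half of that, mapping an inode number to file names, can be
sketched with find(1).  The function name inode_to_paths is made up for
illustration, and this assumes you have already worked out which inode
owns the bad block, which is a separate step it does not cover:

```shell
#!/bin/sh
# Print every path on one filesystem that refers to the given inode number.
# inode_to_paths is a hypothetical helper name used only for this sketch.
# $1 = mount point of the affected filesystem, $2 = inode number
inode_to_paths() {
    # -xdev keeps find on the one filesystem, since inode numbers are
    # only unique within a single filesystem.
    find "$1" -xdev -inum "$2" -print
}

# Example (mount point and inode number are assumptions):
#   inode_to_paths /data 123456
```

A file with several hard links will show up once per link, since all
the links share one inode.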
-- 
Peter Jeremy

More information about the freebsd-hackers mailing list