Ominous smartd messages ....

Wed Aug 3 16:16:57 UTC 2016

On 03/08/2016 15:09, Jon Radel wrote:
> On 8/3/16 10:00 AM, Brandon J. Wandersee wrote:
>>
>> Jon Radel writes:
>>
>>> I've read reasonable sounding commentary from people running very, very
>>> large collections of hard drives that there is a high enough correlation
>>> between this error and the drive going to heck sooner rather than later
>>> that they take this as a sign to replace.  [Can't find reference right now.]
>>
>> While there's no way to know from the error message alone just what will
>> happen to the disk in the coming days, the general reasoning is this:
>> sectors are not physically segregated. They all sit on the same
>> platter. Several bad sectors occurring in a short period might be a sign
>> of physical fault in the platter, and if that fault is real then stress
>> from the platter spinning will likely cause that fault to spread. So
>> some people conclude that the appearance of several bad sectors in a
>> short period should just be a signal to replace the disk immediately.
>>
> 
> If I remember the discussion well enough (sad that I can't find it) my
> use of "correlation" was precise.  They actually manage enough drives
> (thousands) and kept enough records to allow for statistical analysis
> which indicates that this smartd error correlates very well with failure
> within [I wish I could remember] timeframe.
> 
> Do please excuse the utter lack of footnotes.  :-(
> 

I think everyone is probably thinking of Backblaze. This is their latest
summary of drive statistics:

https://www.backblaze.com/blog/hard-drive-failure-rates-q2-2016/

And this is their take on which SMART metrics matter:

https://www.backblaze.com/blog/hard-drive-smart-stats/
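
If you want to keep an eye on those metrics yourself, here is a rough
Python sketch. It assumes the attributes of interest are SMART 5, 187,
188, 197 and 198 (that is my recollection of the Backblaze post, so check
it against the link above), and that the input is the attribute table
printed by `smartctl -A`. The function name and the sample table are made
up for illustration; the sample is not from a real drive.

```python
# Attribute IDs to watch. Assumption: these are the five named in the
# Backblaze post linked above; adjust the set if your reading differs.
WATCHED = {5, 187, 188, 197, 198}


def flag_attributes(smartctl_output, watched=WATCHED):
    """Return {id: (name, raw_value)} for watched attributes with raw > 0.

    Expects the table printed by `smartctl -A`, where each attribute row
    starts with a numeric ID and RAW_VALUE is the tenth column.
    """
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id not in watched:
            continue
        raw = fields[9]
        if raw.isdigit() and int(raw) > 0:
            flagged[attr_id] = (fields[1], int(raw))
    return flagged


# Illustrative sample only, laid out like `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(flag_attributes(SAMPLE).items()):
        print(f"SMART {attr_id} {name}: raw value {raw}")
```

A non-zero raw value here is only a prompt to look closer, not a verdict
on the drive; vendors encode some raw values differently.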

-- 
Moore's Law of Mad Science: Every eighteen months, the minimum IQ
necessary to destroy the world drops by one point.