gmirror slice insertion, "FAILURE - READ_DMA status=51<READY, DSC, ERROR>"

Wed Oct 29 02:00:23 PDT 2008

Jeremy Chadwick wrote:
> Seagate chooses to encode some raw data for some SMART attributes in a
> custom format.  The format is not publicly documented.  This is why you
> have to go off of the adjusted values shown in VALUE/WORST/THRESH.
> "How am I supposed to know all of this?!"  You aren't -- it comes with
> experience.

And yet my failing drive's VALUE numbers are still all above their 
THRESH values, despite it being bad enough to cripple the system. One 
might argue those threshold values leave something to be desired.
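
To make that concrete, here is a rough Python sketch of the check I have in mind, run against the ad6 rows quoted further down. The helper names and the WATCH_RAW list are my own invention, not anything smartctl provides, and it assumes the raw column is a plain integer, which may not hold for every attribute or vendor.

```python
# Rows copied from the ad6 "smartctl -a" output quoted later in this message.
ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always         -       2
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail Always         -       1
187 Reported_Uncorrect      0x0032   098   098   000    Old_age  Always         -       2
197 Current_Pending_Sector  0x0012   100   100   000    Old_age  Always         -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age  Offline        -       2
"""

# Attribute IDs whose raw count I would treat as a warning when nonzero.
# This selection is my own assumption, not an official list.
WATCH_RAW = {5, 187, 197, 198}


def parse(text):
    """Turn attribute-table rows into dicts; skip anything that isn't a row."""
    attrs = []
    for line in text.splitlines():
        f = line.split()
        if len(f) < 10 or not f[0].isdigit():
            continue
        attrs.append({
            "id": int(f[0]),
            "name": f[1],
            "value": int(f[3]),
            "worst": int(f[4]),
            "thresh": int(f[5]),
            "raw": int(f[9]),  # assumes a plain integer raw value
        })
    return attrs


attrs = parse(ROWS)

# The official test: normalized VALUE at or below a nonzero THRESH.
tripped = [a for a in attrs if a["thresh"] > 0 and a["value"] <= a["thresh"]]

# The unofficial test: nonzero raw counts on the attributes listed above.
suspicious = [a for a in attrs if a["id"] in WATCH_RAW and a["raw"] > 0]

print("at or below threshold:", [a["name"] for a in tripped])
print("nonzero raw counts:   ", [(a["name"], a["raw"]) for a in suspicious])
```

By the VALUE/THRESH test nothing on ad6 has tripped, while the raw counts for attributes 5, 187, 197 and 198 are all nonzero.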

>> Is there anything I should know about this model of hard disk with  
>> regards to being known for problems? Also, is there a good test I can  
>> perform to hopefully flush out any problems before I put this thing into  
>> service?
> 
> I'm confused: what gives you the impression there's a problem with
> *this model* of hard disk?  I've seen no evidence presented that
> indicates such.  What makes you ask that question?

I don't have such an impression, thus far. In fact, Seagate drives have 
always been good to me prior to this. It's only a precautionary question 
because it's better to ask now than after I've committed a lot of real 
data and time to it and put it all into service.

> Let's take a look at the SMART data.
> 
>> # smartctl -a /dev/ad4
>>
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED      WHEN_FAILED   RAW_VALUE
...
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age  Offline        -       0
...
> 
> To get an update on Attribute 198, you'd need to run a short offline
> test ("smartctl -t short /dev/ad4").  You can safely do this while
> the disk is in use; don't let the word "offline" make you think the
> disk disappears.  You can watch the status using smartctl -a, and
> once it's finished, you can compare the old value to the new.  I'm
> willing to bet it remains zero.

I ran that test on both drives. ad6 failed immediately at 90% with a 
"read failure", which is not surprising. ad4 completed without error and 
with no change in its values, just as you predicted.

>> # smartctl -a /dev/ad6
>>
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE UPDATED      WHEN_FAILED   RAW_VALUE
...
>>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always         -       2
...
>>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail Always         -       1
...
>> 187 Reported_Uncorrect      0x0032   098   098   000    Old_age  Always         -       2
...
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age  Always         -       2
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age  Offline        -       2
...
>
> And here we see the core of the problem.  :-)

> Advice is simple: replace this hard disk.

> Hope this helps.

It definitely did, Jeremy. Your explanations were most helpful. Thanks!

Carl                                             / K0802647