gmirror slice insertion, "FAILURE - READ_DMA status=51<READY, DSC, ERROR>"

Carl k0802647 at telus.net
Tue Oct 28 20:41:34 PDT 2008


Jeremy Chadwick said:
>> ad6: FAILURE - READ_DMA status=51<READY,DSC,ERROR>  
>> error=40<UNCORRECTABLE> LBA=134802751
> 
> Are you sure you don't have a bad hard disk?  This looks like a
> classic block/sector failure.

I hadn't realized that a bad block would manifest itself with a message 
about DMA. Seems like such semantics would be a little obscure to most 
users, apparently including me.
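
For reference, the two hex values in the kernel message are ATA register
bitmasks. The sketch below decodes them under the assumption that the
standard ATA status and error register layout applies. READY, DSC, ERROR
and UNCORRECTABLE are the labels from the message itself; the remaining
names are ATA-spec abbreviations picked for this illustration, and the
decode() helper is hypothetical.

```python
# Standard ATA status/error register bits. READY, DSC, ERROR and
# UNCORRECTABLE are the labels used in the kernel message; the other
# labels are ATA-spec abbreviations chosen for this sketch.
STATUS_BITS = {
    0x80: "BSY",
    0x40: "READY",
    0x20: "DF",
    0x10: "DSC",
    0x08: "DRQ",
    0x01: "ERROR",
}
ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNCORRECTABLE",
    0x10: "IDNF",
    0x04: "ABRT",
}


def decode(value, names):
    """Return the names of the bits set in an 8-bit register value."""
    return [
        names.get(1 << bit, "bit%d" % bit)
        for bit in range(7, -1, -1)
        if value & (1 << bit)
    ]


print(decode(0x51, STATUS_BITS))  # ['READY', 'DSC', 'ERROR']
print(decode(0x40, ERROR_BITS))   # ['UNCORRECTABLE']
```

Read this way, "DMA" only names the read command that was in flight; the
UNCORRECTABLE bit in the error register is what points at the media.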

> So you're saying that the *exact* same READ_DMA error, at the *exact*
> same LBA, is reported on ad4?  If so, that's very bizarre.

No, perhaps I wasn't clear enough. Both instances were on ad6, so far.

> Can you please provide the output from the following commands?

See end of message. Let me know if you then want more (in- or out-of-band).

I have now installed smartmontools and, as you can see below, ran it
against both ad4 and ad6. Sure enough, ad6 has logged 2 READ DMA errors.
Does that make this definitively a bad disk?
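
As a cross-check, the kernel message (LBA=134802751) and the SMART error
log for ad6 further down (LBA 134802845) appear to describe the same
failed read. The sketch below assumes 28-bit LBA addressing, where the
low four bits of DH carry LBA bits 24-27 and a sector count of 00 means
256 sectors. The lba28() helper is hypothetical; the register values are
copied from the SMART error log.

```python
def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the ATA task-file registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command: SN=9d CL=ed CH=08 DH=08
failed = lba28(0x9D, 0xED, 0x08, 0x08)

# Last READ DMA command in the log: SN=3f CL=ed CH=08 DH=48, SC=00
start = lba28(0x3F, 0xED, 0x08, 0x48)
count = 256  # SC=00 means 256 sectors for a 28-bit READ DMA

print(start, failed, failed - start)  # 134802751 134802845 94
print(start <= failed < start + count)  # True
```

Under those assumptions the kernel reports the first sector of the
request, while the drive reports the sector it could not read, 94
sectors into the same 256-sector transfer.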

Should I not be worried about ad4 too? Those Raw_Read_Error_Rate and 
Seek_Error_Rate numbers should be zero or very close to it, shouldn't 
they? I don't know how to interpret what I'm seeing in that output, so 
I'd appreciate any insight. Should I be returning both disks for 
warranty claims (they're both very recently purchased)?
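
One hedged way to read the attribute tables below: an attribute is only
flagged as failing when its normalized VALUE drops to or below THRESH,
and large RAW_VALUE numbers for the rate attributes are vendor-specific
and do not by themselves mean errors. The sketch applies that rule to a
few rows copied from the output in this message. The review() helper and
the choice of which raw counters to watch (5, 187, 197, 198) are
assumptions made for illustration.

```python
# (id, name, normalized value, threshold, raw value), copied from the
# smartctl output for each disk in this message.
AD4 = [
    (1, "Raw_Read_Error_Rate", 117, 6, 158643744),
    (5, "Reallocated_Sector_Ct", 100, 36, 0),
    (7, "Seek_Error_Rate", 64, 30, 2921473),
    (187, "Reported_Uncorrect", 100, 0, 0),
    (197, "Current_Pending_Sector", 100, 0, 0),
    (198, "Offline_Uncorrectable", 100, 0, 0),
]
AD6 = [
    (1, "Raw_Read_Error_Rate", 116, 6, 106947042),
    (5, "Reallocated_Sector_Ct", 100, 36, 2),
    (7, "Seek_Error_Rate", 61, 30, 1376532),
    (187, "Reported_Uncorrect", 98, 0, 2),
    (197, "Current_Pending_Sector", 100, 0, 2),
    (198, "Offline_Uncorrectable", 100, 0, 2),
]

# Assumption: these raw values are plain sector/error counts, so any
# non-zero value is worth a look.
WATCH_RAW = {5, 187, 197, 198}


def review(rows):
    """Return human-readable notes for attributes that look suspicious."""
    notes = []
    for attr_id, name, value, thresh, raw in rows:
        if thresh > 0 and value <= thresh:
            notes.append("%s: value %d at or below threshold %d"
                         % (name, value, thresh))
        elif attr_id in WATCH_RAW and raw > 0:
            notes.append("%s: raw count %d" % (name, raw))
    return notes


print(review(AD4))  # []
for note in review(AD6):
    print(note)
```

With these inputs nothing is flagged for ad4, while ad6 is flagged for
its reallocated, reported-uncorrectable, pending, and
offline-uncorrectable counts.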

Wojciech Puchar said:
> Boot from some kind of live CD, then create another mirror (single disk
> for now) on the other drive, then run:
> 
> dd if=/dev/ad6s1 of=/dev/mirror/newmirror bs=2k conv=noerror,sync
> 
> I intentionally used bs=2k instead of a larger block size to minimize
> the amount of lost data.
> 
> Then change your system to boot from newmirror, take out /dev/ad6 and
> have it replaced under warranty (or buy a new one), install the new ad6,
> and insert it into the mirror.

I think you're describing a method to help me save as much data from ad6 
as possible. Fortunately, this is all about constructing a new system, 
so there's no data yet to lose.
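
To illustrate why the quoted dd command uses a small block size, here is
a minimal sketch of what conv=noerror,sync amounts to: keep going after
a read error and write a zero-filled block in place of whatever could
not be read. It is an illustration only, not a replacement for dd. The
rescue_copy() function and the FlakySource class (a stand-in for a disk
with one unreadable block) are hypothetical names.

```python
import io


def rescue_copy(src, dst, block_size=2048):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the list of block numbers that could not be read.
    """
    bad_blocks = []
    offset = 0
    while True:
        try:
            src.seek(offset)
            data = src.read(block_size)
        except OSError:
            data = None  # like noerror: note the failure and carry on
        if data is None:
            bad_blocks.append(offset // block_size)
            dst.write(b"\0" * block_size)
        elif not data:
            break  # end of input
        else:
            # like sync: pad a short read with NULs to a full block
            dst.write(data.ljust(block_size, b"\0"))
        offset += block_size
    return bad_blocks


class FlakySource(io.BytesIO):
    """Hypothetical disk image whose block at bad_offset cannot be read."""

    def __init__(self, data, bad_offset):
        super().__init__(data)
        self.bad_offset = bad_offset

    def read(self, size=-1):
        if self.tell() == self.bad_offset:
            raise OSError(5, "Input/output error")
        return super().read(size)


source = FlakySource(b"A" * 2048 + b"B" * 2048 + b"C" * 1000, bad_offset=2048)
target = io.BytesIO()
print(rescue_copy(source, target))  # [1]
print(len(target.getvalue()))       # 8192
```

Each failed read costs one whole block of output, so a smaller block
size loses less data around each bad sector at the price of a slower
copy.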

Is this model of hard disk known for any particular problems? Also, is
there a good test I can run to flush out any remaining problems before I
put this thing into service?

Carl                                             / K0802647

######## Additional Information ########

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                           4          0
irq4: sio0                        125724         16
irq19: uhci3                           5          0
irq21: uhci1+                     478364         63
irq23: uhci2 ehci1                     1          0
cpu0: timer                     14517071       1923
irq256: em0                       109568         14
cpu1: timer                     14514956       1922
Total                           29745693       3940

# atacontrol list | grep -v "no device present"
ATA channel 0:
ATA channel 1:
ATA channel 2:
     Master:  ad4 <ST31000340AS/SD15> Serial ATA II
ATA channel 3:
     Master:  ad6 <ST31000340AS/SD15> Serial ATA II
ATA channel 4:
     Master: acd0 <HL-DT-ST DVDRAM GH20NS10/EL00> Serial ATA v1.0
ATA channel 5:
ATA channel 6:
ATA channel 7:

# atacontrol cap ad4

Protocol              Serial ATA II
device model          ST31000340AS
serial number         xxxxxxxH
firmware revision     SD15
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       1953525168 sectors
dma supported
overlap not supported

Feature                      Support  Enable    Value           Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes       -      31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00  254/0xFE

# atacontrol cap ad6

Protocol              Serial ATA II
device model          ST31000340AS
serial number         xxxxxxxA
firmware revision     SD15
cylinders             16383
heads                 16
sectors/track         63
lba supported         268435455 sectors
lba48 supported       1953525168 sectors
dma supported
overlap not supported

Feature                      Support  Enable    Value           Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes       -      31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/0xFEFE
automatic acoustic management  no       no      0/0x00  254/0xFE

# smartctl -a /dev/ad4
smartctl version 5.38 [i386-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31000340AS
Serial Number:    xxxxxxxH
Firmware Version: SD15
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Oct 28 18:07:25 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                         was completed without error.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
Total time to complete Offline
data collection:                 ( 650) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 230) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                         SCT Feature Control supported.
                                         SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       158643744
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       108
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   064   060   030    Pre-fail  Always       -       2921473
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       499
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       108
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always       -       65540
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   069   045    Old_age   Always       -       29 (Lifetime Min/Max 23/31)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   039   019   000    Old_age   Always       -       158643744
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/ad6
smartctl version 5.38 [i386-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31000340AS
Serial Number:    xxxxxxxA
Firmware Version: SD15
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Oct 28 18:08:22 2008 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                         was completed without error.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
Total time to complete Offline
data collection:                 ( 642) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 227) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103b) SCT Status supported.
                                         SCT Feature Control supported.
                                         SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   100   006    Pre-fail  Always       -       106947042
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       108
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   061   060   030    Pre-fail  Always       -       1376532
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       499
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       108
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   069   045    Old_age   Always       -       29 (Lifetime Min/Max 23/31)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   038   018   000    Old_age   Always       -       106947042
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 2
         CR = Command Register [HEX]
         FR = Features Register [HEX]
         SC = Sector Count Register [HEX]
         SN = Sector Number Register [HEX]
         CL = Cylinder Low Register [HEX]
         CH = Cylinder High Register [HEX]
         DH = Device/Head Register [HEX]
         DC = Device Command Register [HEX]
         ER = Error register [HEX]
         ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 475 hours (19 days + 19 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 00 9d ed 08 08  Error: UNC at LBA = 0x0808ed9d = 134802845

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 00 3f ed 08 48 00  13d+00:32:54.564  READ DMA
   c8 00 00 3f ec 08 48 00  13d+00:32:54.563  READ DMA
   c8 00 00 3f eb 08 48 00  13d+00:32:54.562  READ DMA
   c8 00 00 3f ea 08 48 00  13d+00:32:54.561  READ DMA
   c8 00 00 3f e9 08 48 00  13d+00:32:54.560  READ DMA

Error 1 occurred at disk power-on lifetime: 474 hours (19 days + 18 hours)
   When the command that caused the error occurred, the device was active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   40 51 00 9d ed 08 08  Error: UNC at LBA = 0x0808ed9d = 134802845

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 00 3f e9 08 48 00  12d+23:04:28.359  READ DMA
   c8 00 00 3f 53 06 48 00  12d+23:04:27.202  READ DMA
   c8 00 00 3f 52 06 48 00  12d+23:04:27.193  READ DMA
   c8 00 00 3f 51 06 48 00  12d+23:04:27.191  READ DMA
   c8 00 00 3f 50 06 48 00  12d+23:04:27.191  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

######## END ########

