WRITE_DMA errors on SATA drive under 5.3-RELEASE

Anthony Atkielski atkielski.anthony at wanadoo.fr
Sun Feb 27 22:09:53 GMT 2005


Mike Tancsa writes:

> Could be a bad sector on the drive, or bad cable. Hard to say.  Try
> /usr/ports/sysutils/smartmontools/
>
> It can read all sorts of info off the drive and help you narrow down
> what the problem might be.

Wow!  That is a very cool tool.  There's even a Windows port so I can
use it on my XP machine.

The two SATA drives show no errors.  The older IDE drive (which holds
the root filesystem) produces the output below.  It has logged over 1000
read errors in its lifetime (Raw_Read_Error_Rate raw value 1050), but
the disk had a hard time back in December when it was in my overheated
old server, so that might account for part of the count.  The most
recent entries in the error log look like they might correlate with what
I saw today (unfortunately, I'm not sure how to interpret them):

======================================================================
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SV4002H
Serial Number:    0413J1FR932555
Firmware Version: QP100-07
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 1
Local Time is:    Sun Feb 27 22:52:54 2005 CET

==> WARNING: May need -F samsung or -F samsung2 enabled; see manual for details.

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The SMART RETURN STATUS return value (smartmontools -H option/Directive)
 can not be retrieved with this version of ATAng, please do not rely on this value
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                 (1560) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   8) minutes.

SMART Attributes Data Structure revision number: 9
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       1050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       55
  5 Reallocated_Sector_Ct   0x0033   253   253   009    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       2968364
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
194 Temperature_Celsius     0x0022   175   145   000    Old_age   Always       -       21
197 Current_Pending_Sector  0x0033   253   253   009    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0031   253   253   009    Pre-fail  Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000b   100   100   051    Pre-fail  Always       -       0
201 Soft_Read_Error_Rate    0x000b   100   100   051    Pre-fail  Always       -       1

SMART Error Log Version: 1
Warning: ATA error count 22 inconsistent with error log pointer 4

ATA Error Count: 22 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 22 occurred at disk power-on lifetime: 23324 hours (971 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0  

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00  49d+16:22:20.296  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00  49d+16:22:20.296  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00  49d+16:22:20.296  IDENTIFY PACKET DEVICE
  c4 00 19 7f 01 06 e0 ff  49d+16:22:06.296  READ MULTIPLE
  c4 00 01 40 00 00 e0 00  49d+16:20:45.296  READ MULTIPLE

Error 21 occurred at disk power-on lifetime: 23324 hours (971 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0  

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00  49d+16:20:17.296  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00  49d+16:20:17.296  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00  49d+16:20:17.296  IDENTIFY PACKET DEVICE
  ca 00 0c 5f 61 38 e0 ff  49d+16:20:04.296  WRITE DMA
  e7 00 00 00 00 00 e0 00  49d+16:19:33.296  FLUSH CACHE

Error 20 occurred at disk power-on lifetime: 23283 hours (970 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0  

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00  49d+09:02:47.296  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00  49d+09:02:47.296  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00  49d+09:02:47.296  IDENTIFY PACKET DEVICE
  c4 00 1a ff cd 06 e0 ff  49d+09:02:34.296  READ MULTIPLE
  c4 00 20 df cd 06 e0 ff      07:57:42.000  READ MULTIPLE

Error 19 occurred at disk power-on lifetime: 23281 hours (970 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0  

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 05 01 00 00 a0 00      07:50:43.000  IDENTIFY PACKET DEVICE
  ec 00 05 01 00 00 b0 00      07:50:43.000  IDENTIFY DEVICE
  a1 00 05 01 00 00 b0 00      07:50:43.000  IDENTIFY PACKET DEVICE
  c4 00 07 98 01 06 e0 ff      07:50:43.000  READ MULTIPLE
  e3 00 00 40 00 00 a0 00      07:50:43.000  IDLE

Error 18 occurred at disk power-on lifetime: 23272 hours (969 days + 16 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 88 05 01 00 00 a0  

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d5 01 01 4f c2 e0 00      05:59:56.000  SMART READ LOG
  b0 d1 01 01 4f c2 e0 00      05:59:56.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 d0 00 00 4f c2 e0 00      05:59:56.000  SMART READ DATA
  b0 da 00 00 4f c2 e0 00      05:59:56.000  SMART RETURN STATUS
  b0 da 00 00 4f c2 e0 00      05:59:56.000  SMART RETURN STATUS

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Device does not support Selective Self Tests/Logging
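
As a rough aid to reading the error log above, here is a minimal Python
sketch (not part of smartmontools) of two small decoding steps: turning
the ER/ST register bytes into flag names, and converting a
Powered_Up_Time string to milliseconds to see how close it sits to the
49.710-day wrap.  It assumes the wrap corresponds to a 32-bit millisecond
counter, which matches the figure smartctl prints, and it names only the
ATA status bits BSY, DRDY, DF, DRQ and ERR and the error bit ABRT; other
bits are reported by number.  The function names are invented for
illustration.  The WARNING line in the output says this drive may need
-F samsung or -F samsung2, so the logged bytes may not be in the order
smartctl assumed, and any decoding here is tentative.

```python
import re

# Assumption: Powered_Up_Time is a 32-bit millisecond counter, which is
# consistent with the "wraps after 49.710 days" note in the smartctl output.
WRAP_MS = 2 ** 32

# Only the widely used bit names are listed; everything else is left numeric.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}
ERROR_BITS = {2: "ABRT"}


def decode_bits(value, names):
    """Return the names of the set bits in an 8-bit register, high bit first."""
    return [names.get(bit, f"bit{bit}") for bit in range(7, -1, -1)
            if value & (1 << bit)]


def decode_status(value):
    return decode_bits(value, STATUS_BITS)


def decode_error(value):
    return decode_bits(value, ERROR_BITS)


def parse_powered_up_time(text):
    """Convert '49d+16:22:20.296' or '07:57:42.000' to milliseconds."""
    match = re.fullmatch(r"(?:(\d+)d\+)?(\d+):(\d+):(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised Powered_Up_Time: {text!r}")
    days, hours, minutes, seconds, millis = (int(g or 0) for g in match.groups())
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


def format_powered_up_time(ms):
    """Format milliseconds as DDd+hh:mm:SS.sss after applying the wrap."""
    ms %= WRAP_MS
    days, rem = divmod(ms, 86_400_000)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{days}d+{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


if __name__ == "__main__":
    # Register bytes taken from the "After command completion" lines above.
    print("ER 0x04 ->", decode_error(0x04))
    print("ST 0x88 ->", decode_status(0x88))
    # Last value the counter can hold before wrapping to zero.
    print("max    ->", format_powered_up_time(WRAP_MS - 1))
    # Distance of the Error 22 timestamp from the wrap point, in seconds.
    stamp = parse_powered_up_time("49d+16:22:20.296")
    print("to wrap ->", (WRAP_MS - stamp) / 1000, "s")
```

With the bytes as logged, ER 0x04 would read as an aborted command and
ST 0x88 as BSY plus DRQ, but given the Samsung firmware warning it is
probably worth re-running smartctl with the -F option it suggests before
trusting either reading.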

-- 
Anthony



