ad0 errors on 6.0-RC1
Dan Langille
dan at langille.org
Wed Oct 12 19:55:26 PDT 2005
On 12 Oct 2005 at 19:58, Mike Tancsa wrote:
> At 05:48 PM 12/10/2005, Dan Langille wrote:
> >I'm seeing these errors but I do not know if it's an HDD problem
> >or an OS problem. Clues please?
>
> They look like hard errors, but I have seen similar problems with bad
> drive trays. smartmontools from the ports will help you narrow it
> down (e.g., check the output of smartctl -a /dev/ad0).
We did that yesterday. I don't know enough about the output to
judge, but it seems OK. The output is also posted at http://pastebin.com/391872
[root at mtwenty:/usr/ports/sysutils/smartmontools] # smartctl -a /dev/ad0
smartctl version 5.33 [i386-portbld-freebsd6.0] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: Maxtor 6Y080L0
Serial Number: Y3KLWA7E
Firmware Version: YAR41BW0
User Capacity: 81,964,302,336 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 0
Local Time is: Tue Oct 11 08:45:22 2005 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 182) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  40) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   200   200   063    Pre-fail  Always       -       16714
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       77
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   251   247   187    Pre-fail  Always       -       36405
  9 Power_On_Minutes        0x0032   243   243   000    Old_age   Always       -       317h+56m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       84
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       36
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       3036
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   198   196   000    Old_age   Offline      -       4
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       4
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   198   198   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
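The overall PASSED verdict is the drive's own assessment and generally only turns to FAILED once a Pre-fail attribute's normalized VALUE drops to its THRESH, so the raw counters in the table are more telling. The media-related ones (5, 196, 197, 198) are all 0, while 199 UDMA_CRC_Error_Count is 4. That attribute counts CRC errors on the ATA interface between drive and controller, which usually points at the cable, connector or (as Mike suggests) the drive tray rather than the platters. The sketch below scans attribute rows for nonzero raw values; the watch list and its labels are illustrative assumptions, not something smartctl defines.

```python
# A few rows copied from the attribute table above (abridged).
ATTRIBUTE_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  9 Power_On_Minutes        0x0032   243   243   000    Old_age   Always       -       317h+56m
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   198   196   000    Old_age   Offline      -       4
"""

# Illustrative watch list (an assumption): media counters plus the interface CRC counter.
WATCHED = {
    5: "media",
    196: "media",
    197: "media",
    198: "media",
    199: "interface (cable, connector or tray)",
}


def parse_rows(text):
    """Yield (attribute id, name, raw value) for rows whose raw value is an integer."""
    for line in text.splitlines():
        fields = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        if not raw.isdigit():  # e.g. "317h+56m" for Power_On_Minutes
            continue
        yield int(fields[0]), fields[1], int(raw)


def flag_attributes(text):
    """Return watched attributes whose raw counter is nonzero."""
    return [
        (attr_id, name, raw, WATCHED[attr_id])
        for attr_id, name, raw in parse_rows(text)
        if attr_id in WATCHED and raw > 0
    ]


if __name__ == "__main__":
    for attr_id, name, raw, kind in flag_attributes(ATTRIBUTE_ROWS):
        print(f"{attr_id} {name}: raw={raw} -> {kind}")
```

On these rows only attribute 199 is reported; the Power_On_Minutes row is skipped because its raw value is not a plain integer.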
SMART Error Log Version: 1
ATA Error Count: 4
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 4 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 38 1f a2 e0 Error: ICRC, ABRT at LBA = 0x00a21f38 = 10624824
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 37 38 1f a2 e0 08 00:06:15.120 READ DMA
c8 00 09 2f 1f a2 e0 08 00:06:15.120 READ DMA
c8 00 36 f9 1e a2 e0 08 00:06:15.120 READ DMA
c8 00 0a ef 1e a2 e0 08 00:06:15.120 READ DMA
c8 00 35 ba 1e a2 e0 08 00:06:15.120 READ DMA
Error 3 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 ba 1e a2 e0 Error: ICRC, ABRT at LBA = 0x00a21eba = 10624698
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 35 ba 1e a2 e0 08 00:06:15.056 READ DMA
c8 00 0b af 1e a2 e0 08 00:06:15.056 READ DMA
c8 00 34 7b 1e a2 e0 08 00:06:15.056 READ DMA
c8 00 0c 6f 1e a2 e0 08 00:06:15.056 READ DMA
c8 00 02 6f 1e a2 e0 08 00:06:15.056 READ DMA
Error 2 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 c1 aa a2 e0 Error: ICRC, ABRT at LBA = 0x00a2aac1 = 10660545
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 3e c1 aa a2 e0 08 00:06:14.880 READ DMA
c8 00 02 bf aa a2 e0 08 00:06:14.880 READ DMA
c8 00 34 0b 3b 53 e0 08 00:06:14.880 READ DMA
c8 00 0c ff 3a 53 e0 08 00:06:14.880 READ DMA
c8 00 01 7e 00 00 e0 08 00:06:14.880 READ DMA
Error 1 occurred at disk power-on lifetime: 3332 hours (138 days + 20 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 79 96 0e e0 Error: ICRC, ABRT at LBA = 0x000e9679 = 956025
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 16 79 96 0e e0 08 00:06:14.736 READ DMA
c8 00 2a 4f 96 0e e0 08 00:06:14.736 READ DMA
c8 00 02 33 54 53 e0 08 00:06:14.736 READ DMA
c8 00 08 f7 aa a2 e0 08 00:06:14.736 READ DMA
c8 00 08 f7 aa a2 e0 08 00:06:14.736 READ DMA
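All four logged errors have ER = 0x84, which smartctl prints as "ICRC, ABRT": an interface CRC error during an Ultra DMA transfer, after which the command was aborted. That matches the raw count of 4 for UDMA_CRC_Error_Count. The LBA that smartctl prints can be reproduced from the register row, since in 28-bit LBA mode the low nibble of DH, then CH, CL and SN hold the address bytes from high to low. A minimal sketch follows; the bit names are the labels smartctl uses for READ DMA errors, and the helper name decode_error_row is hypothetical.

```python
# Error-register bits with the labels smartctl prints for READ DMA errors.
ERROR_BITS = [
    (0x80, "ICRC"),  # interface CRC error during an Ultra DMA transfer
    (0x40, "UNC"),   # uncorrectable data error
    (0x20, "MC"),    # media changed
    (0x10, "IDNF"),  # requested sector ID not found
    (0x08, "MCR"),   # media change requested
    (0x04, "ABRT"),  # command aborted
    (0x02, "NM"),    # no media
]


def decode_error_row(row):
    """Decode an 'ER ST SC SN CL CH DH' register row into (flags, lba)."""
    er, _st, _sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    flags = [name for bit, name in ERROR_BITS if er & bit]
    # 28-bit LBA: DH bits 3..0 are LBA bits 27..24, followed by CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return flags, lba


if __name__ == "__main__":
    for row in ("84 51 00 38 1f a2 e0", "84 51 00 79 96 0e e0"):
        flags, lba = decode_error_row(row)
        print(", ".join(flags), f"at LBA = 0x{lba:08x} = {lba}")
```

For the Error 4 row this reproduces smartctl's own line: "ICRC, ABRT at LBA = 0x00a21f38 = 10624824".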
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3370         -
# 2  Short offline       Completed without error       00%         7         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[root at mtwenty:/usr/ports/sysutils/smartmontools] #
>
> ---Mike
>
>
> >The following was also posted at http://pastebin.com/391670
> >
> >Oct 11 03:40:00 mtwenty kernel: ad0: FAILURE - READ_DMA status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=802719
> >Oct 11 03:40:00 mtwenty kernel: g_vfs_done():ad0s1a[READ(offset=410959872, length=16384)]error = 5
> >Oct 11 03:40:06 mtwenty kernel: ad0: FAILURE - READ_DMA status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=802175
> >Oct 11 03:40:06 mtwenty kernel: g_vfs_done():ad0s1a[READ(offset=410681344, length=8192)]error = 5
> >Oct 11 03:40:06 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4857391
> >Oct 11 03:40:01 mtwenty cron[82160]: login_getclass: retrieving class information: Input/output error
> >Oct 11 03:44:49 mtwenty kernel: ad0: FAILURE - READ_DMA status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=151787983
> >Oct 11 03:44:49 mtwenty kernel: g_vfs_done():ad0s1f[READ(offset=74097885184, length=14336)]error = 5
> >Oct 11 03:44:56 mtwenty kernel: ad0: FAILURE - WRITE_DMA status=7f<READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR> error=7f<UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=4857391
> >Oct 11 03:44:56 mtwenty kernel: g_vfs_done():ad0s1d[WRITE(offset=969719808, length=10240)]error = 5
> >Oct 11 03:44:56 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=92997387
> >Oct 11 03:55:07 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4092687
> >Oct 11 13:04:08 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4092687
> >Oct 11 13:52:08 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4092687
> >Oct 11 13:55:07 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=4092687
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 13:55:33 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 13:55:33 mtwenty kernel: g_vfs_done():ad0s1f[WRITE(offset=42777804800, length=16384)]error = 5
> >Oct 11 13:55:33 mtwenty kernel: g_vfs_done():ad0s1f[WRITE(offset=43163189248, length=16384)]error = 5
> >Oct 11 13:55:33 mtwenty kernel: g_vfs_done():ad0s1a[WRITE(offset=131072, length=16384)]error = 5
> >Oct 11 13:55:33 mtwenty kernel: g_vfs_done():ad0s1a[WRITE(offset=147456, length=16384)]error = 5
> >Oct 11 13:55:38 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=786815
> >Oct 11 15:44:31 mtwenty shutdown: reboot by dan:
> >
> >Oct 11 16:13:03 mtwenty su: dan to root on /dev/ttyp1
> >Oct 11 19:51:04 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 19:51:09 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 19:51:09 mtwenty kernel: ad0: timeout waiting to issue command
> >Oct 11 19:51:09 mtwenty kernel: ad0: error issueing WRITE_DMA command
> >Oct 11 19:51:09 mtwenty kernel: g_vfs_done():ad0s1f[WRITE(offset=49576368128, length=2048)]error = 5
> >Oct 11 19:51:09 mtwenty kernel: g_vfs_done():ad0s1f[WRITE(offset=49767104512, length=16384)]error = 5
> >Oct 11 19:51:09 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=104266895
> >Oct 11 20:17:45 mtwenty kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=319
> >Oct 12 17:23:37 mtwenty syslogd: kernel boot file is /boot/kernel/kernel
> >Oct 12 17:23:37 mtwenty kernel: g_vfs_done():ad0s1d[WRITE(offset=969867264, length=8192)]error = 6
> >Oct 12 17:23:37 mtwenty kernel: g_vfs_done():ad0s1d[WRITE(offset=963559424, length=16384)]error = 6
> >Oct 12 17:23:37 mtwenty kernel: unknown: TIMEOUT - READ_DMA retrying (0 retries left) LBA=153118463
> >Oct 12 17:23:37 mtwenty kernel: unknown: FAILURE - READ_DMA timed out LBA=153118463
> >Oct 12 17:23:37 mtwenty kernel: g_vfs_done():ad0s1f[READ(offset=74779090944, length=2048)]error = 5
> >Oct 12 17:23:37 mtwenty kernel: g_vfs_done():ad0s1f[READ(offset=74779097088, length=2048)]error = 6
> >Oct 12 17:23:37 mtwenty kernel: g_vfs_done():ad0s1f[READ(offset=74202345472, length=2048)]error = 6
> >Oct 12 17:23:37 mtwenty kernel: g_vfs_done():ad0s1f[READ(offset=75589498880, length=2048)]error = 6
> >
> >Thanks
> >--
> >Dan Langille : http://www.langille.org/
> >BSDCan - The Technical BSD Conference - http://www.bsdcan.org/
> >
>
>
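The ad0 messages and the g_vfs_done() messages in the kernel log describe the same failed requests in different units: ad0 reports an absolute LBA on the disk, while g_vfs_done() reports a byte offset inside the partition. Dividing the offset by 512 and subtracting it from the matching LBA gives the partition's starting sector, e.g. 410959872 / 512 = 802656 and 802719 - 802656 = 63 for ad0s1a (consistent with slice 1 starting at the conventional sector 63), and 7065551 for ad0s1f. The sketch below assumes 512-byte sectors and that each g_vfs_done() line refers to the ad0 failure logged immediately before it; the function names are hypothetical. With that mapping you can check whether failures recur at the same sectors: in this log the WRITE_DMA timeouts at LBA 4092687 repeat several times, while the other LBAs are spread across the disk.

```python
SECTOR_SIZE = 512  # assumed bytes per sector for ad0


def partition_start(offset_bytes, absolute_lba):
    """Infer a partition's first sector from one matched (offset, LBA) pair."""
    if offset_bytes % SECTOR_SIZE:
        raise ValueError("offset is not sector-aligned")
    return absolute_lba - offset_bytes // SECTOR_SIZE


def to_absolute_lba(offset_bytes, start_sector):
    """Map a g_vfs_done() byte offset to the LBA the ad0 driver reports."""
    return start_sector + offset_bytes // SECTOR_SIZE


# (partition, g_vfs_done() offset, LBA in the preceding ad0 message)
PAIRS = [
    ("ad0s1a", 410959872, 802719),
    ("ad0s1a", 410681344, 802175),
    ("ad0s1f", 74097885184, 151787983),
    ("ad0s1f", 74779090944, 153118463),
]

if __name__ == "__main__":
    for part, offset, lba in PAIRS:
        print(part, partition_start(offset, lba))
```

Both ad0s1a pairs give the same start sector, as do both ad0s1f pairs, which supports the assumed pairing of log lines.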
--
Dan Langille : http://www.langille.org/
BSDCan - The Technical BSD Conference - http://www.bsdcan.org/