TIMEOUT - WRITE_DMA and smart questions
Eduard Martinescu
martines at rochester.rr.com
Mon Oct 11 05:19:15 PDT 2004
Ion-Mihai,
For more information on smartmontools (smartctl, smartd), check out the
SourceForge site, http://smartmontools.sourceforge.net
If you have specific questions, you can email the support list (link on
the page above).
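
For the first question, it can also help to see whether the timeouts keep
hitting the same LBAs. The following is only a rough sketch, not part of
smartmontools: it assumes the kernel log format shown in the quoted message
below, and the names parse_timeouts and repeated_lbas are made up for
illustration.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   ... kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186210020
# Assumption: this is the only message format of interest; other ATA
# messages (e.g. acd0 FAILURE lines) are ignored.
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): TIMEOUT - WRITE_DMA retrying "
    r"\((?P<left>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


def parse_timeouts(lines):
    """Return a list of (device, retries_left, lba) for each timeout line."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            events.append(
                (m.group("dev"), int(m.group("left")), int(m.group("lba")))
            )
    return events


def repeated_lbas(events):
    """Return {lba: count} for LBAs that timed out more than once."""
    counts = Counter(lba for _dev, _left, lba in events)
    return {lba: n for lba, n in counts.items() if n > 1}
```

Pass parse_timeouts the lines of /var/log/messages (an open file works) and
feed the result to repeated_lbas. In the grep output quoted below, LBA
175421156 and LBA 6343103 each appear twice; the rest appear once.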
Ed
On Mon, 2004-10-11 at 07:09, Ion-Mihai Tetcu wrote:
> [ please reply only on questions@ if this is not appropriate for current@ ]
>
> Hi,
>
> While doing nothing special, the system started printing TIMEOUT -
> WRITE_DMA errors and eventually, after an atacontrol mode 0 PIO4 PIO4,
> hung completely at 04:20.
>
> After the restart I got a few more TIMEOUTs but no hang, even though the
> machine is idle.
>
> SMART was enabled as seen below, but smartd wasn't running (stupid, huh
> :-/ ).
>
> Obvious question: is the hdd dying ?
>
> Second question, as I'm not familiar with SMART: how much can one trust
> SMART reports ?
>
> Third question: could you suggest some settings for smartd ? I'm asking
> this because I don't fully understand the man pages for smartctl and
> smartd; a link explaining more about SMART would also be appreciated.
>
>
> System details:
>
> Local system status (last daily mail):
> 3:01AM up 2 days, 11:56, 2 users, load averages: 1.04, 1.07, 0.95
>
> % uname -a
> FreeBSD it.buh.cameradicommercio.ro 5.3-BETA7 FreeBSD 5.3-BETA7 #3: Mon Oct 4 21:57:25 EEST 2004 root at it.buh.tecnik93.com:/usr/obj/usr/src/sys/IT53_d i386
>
> Oct 11 04:06:51 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186210020
> Oct 11 04:07:02 it kernel: ata0: reiniting channel ..
> Oct 11 04:07:02 it kernel: ata0: reset tp1 mask=03 ostat0=d0 ostat1=d0
> Oct 11 04:07:02 it kernel: ad0: stat=0xd0 err=0xd0 lsb=0xd0 msb=0xd0
> Oct 11 04:07:02 it last message repeated 95 times
> Oct 11 04:07:02 it kernel: ad0: stat=0x50 err=0x01 lsb=0x00 msb=0x00
> Oct 11 04:07:02 it kernel: ata0-slave: stat=0x00 err=0x01 lsb=0x00 msb=0x00
> Oct 11 04:07:02 it kernel: ata0: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
> Oct 11 04:07:02 it kernel: ata0: resetting done ..
> Oct 11 04:07:02 it kernel: ad0: pio=0x0c wdma=0x22 udma=0x45 cable=80pin
> Oct 11 04:07:02 it kernel: ad0: setting PIO4 on VIA 8235 chip
> Oct 11 04:07:02 it kernel: ad0: setting UDMA100 on VIA 8235 chip
> Oct 11 04:07:02 it kernel: ata0: device config done ..
> Oct 11 04:07:16 it kernel: (probe0:ata0:0:0:0): error 22
> Oct 11 04:07:16 it kernel: (probe0:ata0:0:0:0): Unretryable Error
> Oct 11 04:07:16 it kernel: (probe1:ata0:0:1:0): error 22
> Oct 11 04:07:16 it kernel: (probe1:ata0:0:1:0): Unretryable Error
> .........
>
> # grep LBA /var/log/messages
> Oct 11 04:06:51 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186210020
> Oct 11 04:07:52 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165839908
> Oct 11 04:08:48 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165849220
> Oct 11 04:09:12 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165851556
> Oct 11 04:09:32 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=165859748
> Oct 11 04:10:44 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=6343103
> Oct 11 04:11:23 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186210916
> Oct 11 04:11:36 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=186211044
> Oct 11 04:11:58 it kernel: acd0: FAILURE - ATA_IDENTIFY status=51<READY,DSC,ERROR> error=4<ABORTED> LBA=0
> Oct 11 04:13:21 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=309294340
> Oct 11 04:14:00 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=175421156
> Oct 11 04:14:24 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=175421156
> Oct 11 04:15:04 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=175421796
> Oct 11 04:15:48 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=130261540
> Oct 11 04:16:10 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=175421892
> Oct 11 04:16:53 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=173918724
> Oct 11 04:18:50 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=309924420
> Oct 11 04:19:14 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4920283
> Oct 11 04:40:00 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=4918975
> Oct 11 04:40:56 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=6067199
> Oct 11 10:46:52 it kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=6343103
>
> # grep sw /var/log/messages
> Oct 11 04:14:24 it kernel: swap_pager: indefinite wait buffer: device: ad0s1e, blkno: 14841, size: 4096
> Oct 11 04:14:24 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 14381, size: 4096
> Oct 11 04:16:53 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 60732, size: 4096
> Oct 11 04:16:53 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 33481, size: 4096
> Oct 11 04:16:53 it kernel: swap_pager: indefinite wait buffer: device: ad0s3d, blkno: 33488, size: 4096
>
>
>
> The disk is:
> # atacontrol cap 0 0
> ATA channel 0, Master, device ad0:
>
> Protocol ATA/ATAPI revision 6
> device model WDC WD1600JB-00EVA0
> serial number WD-WCAEK1298992
> firmware revision 15.05R15
> cylinders 16383
> heads 16
> sectors/track 63
> lba supported 268435455 sectors
> lba48 supported 312579695 sectors
> dma supported
> overlap not supported
>
> Feature                        Support  Enable   Value     Vendor
> write cache                    yes      no
> read ahead                     yes      yes
> dma queued                     no       no       0/0x00
> SMART                          yes      yes
> microcode download             yes      yes
> security                       yes      no
> power management               yes      yes
> advanced power management      no       no       0/0x00
> automatic acoustic management  yes      yes      254/0xFE  128/0x80
>
> # smartctl -a /dev/ad0
> smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Device Model: WDC WD1600JB-00EVA0
> Serial Number: WD-WCAEK1298992
> Firmware Version: 15.05R15
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 6
> ATA Standard is: Exact ATA specification draft version not indicated
> Local Time is: Mon Oct 11 12:37:32 2004 EEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> The SMART RETURN STATUS return value (smartmontools -H option/Directive)
> can not be retrieved with this version of ATAng, please do not rely on this value
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x05) Offline data collection activity
> was aborted by an interrupting command from host.
> Auto Offline Data Collection: Disabled.
> Self-test execution status: ( 40) The self-test routine was interrupted
> by the host with a hard or soft reset.
> Total time to complete Offline
> data collection: (5061) seconds.
> Offline data collection
> capabilities: (0x79) SMART execute Offline immediate.
> No Auto Offline data collection support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> No General Purpose Logging support.
> Short self-test routine
> recommended polling time: ( 2) minutes.
> Extended self-test routine
> recommended polling time: ( 67) minutes.
> Conveyance self-test routine
> recommended polling time: ( 5) minutes.
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always   -           0
>   3 Spin_Up_Time            0x0007   155   147   021    Pre-fail  Always   -           2775
>   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always   -           464
>   5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always   -           8
>   7 Seek_Error_Rate         0x000b   200   199   051    Pre-fail  Always   -           0
>   9 Power_On_Hours          0x0032   096   096   000    Old_age   Always   -           3360
>  10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always   -           0
>  11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always   -           0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           462
> 194 Temperature_Celsius     0x0022   124   253   000    Old_age   Always   -           26
> 196 Reallocated_Event_Count 0x0032   194   194   000    Old_age   Always   -           6
> 197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always   -           0
> 198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always   -           0
> 199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always   -           2
> 200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline  -           0
>
> SMART Error Log Version: 1
> No Errors Logged
>
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended captive    Interrupted (host reset)  80%        77               -
> # 2  Extended offline    Aborted by host           90%        77               -
> # 3  Conveyance offline  Completed without error   00%        76               -
> # 4  Short offline       Completed without error   00%        76               -
> # 5  Conveyance offline  Completed without error   00%        233              -
> # 6  Short captive       Interrupted (host reset)  90%        233              -
>
> SMART Selective self-test log data structure revision number 1
> SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>    1        0        0  Not_testing
>    2        0        0  Not_testing
>    3        0        0  Not_testing
>    4        0        0  Not_testing
>    5        0        0  Not_testing
>
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
>
> Thanks,
--
Eduard Martinescu <martines at rochester.rr.com>