constant zfs data corruption
JoaoBR
joao at matik.com.br
Mon Oct 20 10:09:18 PDT 2008
On Monday 20 October 2008 11:22:08 you wrote:
> On Mon, Oct 20, 2008 at 08:37:40AM -0200, JoaoBR wrote:
> > On Friday 17 October 2008 15:39:59 Chuck Swiger wrote:
> > > On Oct 17, 2008, at 11:30 AM, JoaoBR wrote:
> > > > I constantly find data corruption on ZFS volumes, always from rrdtool.
> > > > The corrupt data appears on SATA disks, never on SCSI.
> > >
> > > Presumably your SATA drives are correctly being reported by ZFS as
> > > corrupting data, and you should do something like replace cables, the
> > > drives themselves, perhaps try downgrading to SATA-150 rather than
> > > > -300 if you are using the latter. Also consider running a drive
> > > diagnostic utility from the mfgr (or smartmontools) and doing an
> > > extended self-test or destructive write surface check.
> >
> > > Well, the hardware seems to be OK and is not older than 6 months; it also
> > > happens on more than one machine ... smartctl does not report any hardware
> > > failures on the disk.
> >
> > > Regarding jumpering the drives to 150: do you suspect a driver problem?
>
> It's not because of a driver problem. There are known SATA chipsets
> which do not properly work with SATA300 (particularly VIA and SiS
> chipsets); they claim to support it, but data is occasionally corrupted.
> Capping the drive to SATA150 fixes this problem.
>
> http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>
> There are also known problems with Silicon Image chipsets (on Linux,
> Windows, and FreeBSD).
>
> Because you didn't provide your smartctl output, I can't really tell if
> the drives are in "good shape" or not. :-)
>
OK, then here it comes:
smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar T7K500
Device Model: Hitachi HDT725025VLA380
Serial Number: VFL101RK0A9SDP
Firmware Version: V5DOA7EA
User Capacity: 250.058.268.160 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 1
Local Time is: Mon Oct 20 15:07:01 2008 BRST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (4949) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  83) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       3
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   117   117   024    Pre-fail  Always       -       316 (Average 322)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       800
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       69
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       46 (Lifetime Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
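
For anyone who wants to check a report like the one above mechanically, here is a minimal Python sketch. It is only an illustration under stated assumptions: the set of attributes in RAW_WATCH is my own choice of counters that commonly matter for media or cabling problems, not something smartctl defines, and the function names are hypothetical. The pattern tolerates attribute rows that a mail client wrapped onto two lines.

```python
import re

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, type, updated,
# WHEN_FAILED, and the leading integer of RAW_VALUE. \s+ also matches a
# newline, so rows wrapped onto two lines by a mail client still parse.
ROW = re.compile(
    r"^\s*(\d{1,3})\s+([A-Za-z][\w-]*)\s+(0x[0-9a-fA-F]{4})"
    r"\s+(\d{3})\s+(\d{3})\s+(\d{3})"
    r"\s+(Pre-fail|Old_age)\s+(Always|Offline)\s+(\S+)\s+(\d+)",
    re.MULTILINE,
)

# Assumption: counters whose raw value should normally stay at zero.
RAW_WATCH = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def parse_attributes(text):
    """Return one dict per SMART attribute row found in the text."""
    rows = []
    for m in ROW.finditer(text):
        rows.append({
            "id": int(m.group(1)),
            "name": m.group(2),
            "value": int(m.group(4)),
            "worst": int(m.group(5)),
            "thresh": int(m.group(6)),
            "type": m.group(7),
            "raw": int(m.group(10)),
        })
    return rows


def suspicious(rows):
    """List attributes at or below threshold, or watched counters above zero."""
    findings = []
    for r in rows:
        if r["thresh"] > 0 and r["value"] <= r["thresh"]:
            findings.append(
                f'{r["name"]}: VALUE {r["value"]} at or below THRESH {r["thresh"]}'
            )
        if r["name"] in RAW_WATCH and r["raw"] > 0:
            findings.append(f'{r["name"]}: raw count {r["raw"]}')
    return findings


# A few rows copied from the report above, two of them left wrapped.
SAMPLE = """\
  1 Raw_Read_Error_Rate 0x000b 099 099 016 Pre-fail
Always - 3
  5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x000a 200 253 000 Old_age
Always - 0
"""

print(suspicious(parse_attributes(SAMPLE)))
```

On these rows the sketch reports nothing, which matches the PASSED health assessment. A clean attribute table does not rule out a controller or link problem, so it only narrows the search.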
> Also, do you not think it's a little odd that the only data corruption
> occurring for you is related to RRDtool?
Yes, this I do think is suspicious.
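
One way to test whether the damage really is confined to RRD files is to tally the files that `zpool status -v` lists under its permanent-errors heading. The Python sketch below is only a rough illustration: the pool name, the file paths in SAMPLE, and the function names are made up, and it assumes the file list follows the usual "Permanent errors have been detected in the following files:" line.

```python
from collections import Counter
import os

MARKER = "Permanent errors have been detected in the following files:"


def damaged_files(status_text):
    """Collect the entries listed after the permanent-errors marker."""
    files = []
    in_list = False
    for line in status_text.splitlines():
        if MARKER in line:
            in_list = True
            continue
        if in_list:
            entry = line.strip()
            if entry.startswith("pool:"):
                # The report for the next pool begins here.
                in_list = False
            elif entry:
                files.append(entry)
    return files


def count_by_extension(paths):
    """Count damaged files per file extension."""
    return Counter(os.path.splitext(p)[1] or "(none)" for p in paths)


# Hypothetical output; the pool name and paths are invented for illustration.
SAMPLE = """\
  pool: tank
 state: ONLINE
errors: Permanent errors have been detected in the following files:

        /tank/rrd/router1.rrd
        /tank/rrd/router2.rrd
"""

print(count_by_extension(damaged_files(SAMPLE)))
```

If every entry ends in .rrd on several machines, that points more toward the write pattern of that workload or the software stack than toward a single bad disk or cable.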
--
João
More information about the freebsd-stable mailing list