ZFS raidz2, errors in file?
James R. Van Artsdalen
james at jrv.org
Thu Oct 18 05:20:21 UTC 2012
On 10/17/2012 12:39 PM, Heikki Suonsivu wrote:
> SMART data indicates problems on two other disks, but no indication of
> those are seen in logs (the disks work, but SMART information
> indicates problems).
The problems may be in areas ZFS has not tried to read.
> One disk indeed has pending sector, not unusual and should be survivable:
>
> ------------------------------------------------------------------------
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age  Always   -           1
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age  Offline  -           1
That error means one sector is unreadable and a replacement is pending;
the replacement will happen the next time the sector is overwritten. The
contents of that sector are lost (unless some future read succeeds).
> In addition, there seems to be ICRC DMA errors on da0. Looks nasty,
> but only show up in SMART log, not in /var/log/messages.
>
> ------------------------------------------------------------------------
> 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age  Always   -           112
I believe that both of these messages refer to errors in transfers
between the disk and host, not to errors within the disk. Test your
cabling and enclosures.
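
As a rough illustration of that distinction, here is a minimal Python
sketch that sorts the counters quoted in this thread into "media"
(197, 198) and "transport" (199) problems. The function name, the regex
and the two-way grouping are my own assumptions for the example, not
part of smartctl; the sample rows are the ones quoted above.

```python
import re

# Attribute IDs taken from the smartctl output quoted in this thread.
MEDIA_IDS = {197, 198}      # Current_Pending_Sector, Offline_Uncorrectable
TRANSPORT_IDS = {199}       # UDMA_CRC_Error_Count

# Matches one attribute row; the raw value is assumed to be the last column.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+.*?(\d+)\s*$")

def classify(smart_text):
    """Return {attribute_name: (kind, raw_value)} for nonzero counters."""
    findings = {}
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header lines and anything else are skipped
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(3))
        if raw == 0:
            continue
        if attr_id in MEDIA_IDS:
            findings[name] = ("media", raw)
        elif attr_id in TRANSPORT_IDS:
            findings[name] = ("transport", raw)
    return findings

sample = """\
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 112
"""

for name, (kind, raw) in classify(sample).items():
    print(f"{name}: {kind} problem, raw count {raw}")
```

A nonzero "transport" count points at cables, backplane or controller
rather than at the platters, which is why the cabling is worth checking
first.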
> SMART Error Log Version: 1
> ATA Error Count: 112 (device log contains only the most recent five errors)
I don't like these at all. Consider replacing that disk.
> If the da0 ICRC errors would have been seen by ZFS, it should have
> made a) note of that in some log? b) retried write? c) Something
> else? If we assume that the disk firmware is broken and does not
> report these to OS, so da0 might be corrupt. But that should still be
> ok with raidz2.
These errors should trigger retries in layers beneath ZFS.
> We do have 3 random SCSI timeouts, which were seen by FreeBSD, and
> thus should have prompted ZFS do handle the errors, and one read error
> on data, which is not reported as read error in any log, other than
> disk's SMART info says so.
The retries may have happened at a layer below ZFS.
ZFS does not call the disk driver directly. Other layers play a role in
error handling.
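
To show how an error can be counted by the drive yet never reach the
filesystem, here is a toy Python model. All three classes are
hypothetical stand-ins that I made up for the illustration; this is not
how the FreeBSD storage stack or ZFS is actually implemented, only a
sketch of a lower layer retrying silently.

```python
class FlakyDisk:
    """Hypothetical disk whose first `failures` reads hit a transfer error."""
    def __init__(self, failures):
        self.failures = failures
        self.crc_errors = 0   # stands in for the drive's own SMART counter

    def read(self, block):
        if self.failures > 0:
            self.failures -= 1
            self.crc_errors += 1
            raise IOError("transfer CRC error")
        return f"data-{block}"


class DriverLayer:
    """Stand-in for the layers beneath the filesystem; retries silently."""
    def __init__(self, disk, max_retries=3):
        self.disk = disk
        self.max_retries = max_retries

    def read(self, block):
        for attempt in range(self.max_retries + 1):
            try:
                return self.disk.read(block)
            except IOError:
                if attempt == self.max_retries:
                    raise   # only now does the caller see a failure


class Filesystem:
    """Stand-in for ZFS: it only counts errors that reach it."""
    def __init__(self, lower):
        self.lower = lower
        self.errors_seen = 0

    def read(self, block):
        try:
            return self.lower.read(block)
        except IOError:
            self.errors_seen += 1
            return None


disk = FlakyDisk(failures=2)
fs = Filesystem(DriverLayer(disk))
print(fs.read(7), disk.crc_errors, fs.errors_seen)
```

In this model the disk records two errors while the filesystem records
none, because the retry succeeded before the error could propagate
upward.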