ZFS i/o errors - which disk is the problem?

Eric Anderson anderson at freebsd.org
Wed Jan 2 04:32:07 PST 2008


Bernd Walter wrote:
> On Tue, Jan 01, 2008 at 10:44:43PM -0600, Eric Anderson wrote:
>> I created a zpool with two new identical (500GB) SATA disks.  I rsync'ed 
>> a bunch of data over to the new ZFS file systems, and started seeing i/o 
>> errors.
>>
>> Here's how I created the file systems:
>>
>> zpool create tank mirror ad6 ad8
>> zfs create tank/media
>> zfs create tank/documents
>> zfs set sharenfs=on tank/media
>> zfs set sharenfs=on tank/documents
>> zfs set atime=off tank
>> zfs set mountpoint=/media tank/media
>> zfs set mountpoint=/documents tank/documents
>>
>>
>> Here's what zpool status says:
>>
>> # zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: scrub completed with 731 errors on Tue Jan  1 15:17:08 2008
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         tank        ONLINE       0     0 1.47K
>>           mirror    ONLINE       0     0 1.47K
>>             ad6     ONLINE       0     0 5.12K
>>             ad8     ONLINE       0     0 4.66K
>>
>> How can I tell which drive caused the problems, or where they came 
>> from? I see several errors in /var/log/messages, like:
>>
>> ZFS: zpool I/O failure, zpool=tank error=86
> 
> zpool status -v should tell you more details.
> But it is not required, since the message below is enough.

Yes, I did that, but it just listed the >700 affected files. That's about 
the only difference in the output, so I omitted it here.
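The `ZFS: checksum mismatch` kernel messages carry a `path=` field, so counting them per device gives a rough picture of where the mismatches are reported. A minimal sketch, assuming each message sits on a single line in /var/log/messages in the `... zpool=tank path=/dev/ad6 offset=... size=...` form quoted in this thread (the wrap in the quote is from email). The function name `count_cksum_by_dev` is hypothetical:

```shell
# count_cksum_by_dev (hypothetical helper): reads syslog lines on stdin and
# prints "<device> <count>" for every path= seen on a ZFS checksum mismatch line.
count_cksum_by_dev() {
    awk '
    /ZFS: checksum mismatch/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^path=/) {
                dev = substr($i, 6)   # strip the 5-character "path=" prefix
                n[dev]++
            }
        }
    }
    END { for (d in n) print d, n[d] }
    ' | sort
}

# Usage (assumes the default syslog file location):
# count_cksum_by_dev < /var/log/messages
```

A count that is high on only one device would point at that disk; comparable counts on both sides of the mirror point at something they share.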


>> and many many of these:
>>
>> ZFS: checksum mismatch, zpool=tank path=/dev/ad6 offset=31970426880 
>> size=131072
>>
>> for both the ad6 and ad8 devices.
> 
> So you have checksum errors on both drives.
> 
>> I'm happy to swap the drive out, but I don't know which is the problem. 
>>   I was also wondering if it was a saturated I/O issue on the system 
>> (it's a fairly slow and poky old box).
> 
> The errors mean that data written to disk silently differed from what
> was read back.
> I doubt it is the drives, but if they are identical it is of course
> possible, since firmware bugs do happen.
> More likely you have a problematic ATA controller or maybe defective
> RAM.


I can believe a problematic SATA controller (it's an add-on PCI board), 
but does anyone know of a way to ask ZFS which devices in a pool it 
thinks have issues?
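ZFS's own per-device view is the READ/WRITE/CKSUM columns of the `zpool status` config section: in the output quoted above both ad6 and ad8 carry nonzero CKSUM counts. The sketch below filters that table down to rows with any nonzero counter. It assumes the `NAME STATE READ WRITE CKSUM` layout shown above, with the table ending at the next blank line; `flag_dev_errors` is a hypothetical name. Pool and vdev rows (tank, mirror) are printed too, since they aggregate their children:

```shell
# flag_dev_errors (hypothetical helper): reads `zpool status` output on stdin
# and prints every config row whose READ, WRITE or CKSUM counter is not "0".
flag_dev_errors() {
    awk '
    $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # config table header
    in_cfg && NF == 0 { in_cfg = 0 }                     # blank line ends the table
    in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
        print $1, "read=" $3, "write=" $4, "cksum=" $5
    }
    '
}

# Usage:
# zpool status tank | flag_dev_errors
```

After changing hardware, `zpool clear tank` resets these counters, and a fresh `zpool scrub tank` then shows whether, and on which device, they grow again.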


Eric




More information about the freebsd-fs mailing list