ZFS i/o errors - which disk is the problem?
Eric Anderson
anderson at freebsd.org
Wed Jan 2 04:32:07 PST 2008
Bernd Walter wrote:
> On Tue, Jan 01, 2008 at 10:44:43PM -0600, Eric Anderson wrote:
>> I created a zpool with two new identical (500GB) SATA disks. I rsync'ed
>> a bunch of data over to the new ZFS file systems, and started seeing i/o
>> errors.
>>
>> Here's how I created the file systems:
>>
>> zpool create tank mirror ad6 ad8
>> zfs create tank/media
>> zfs create tank/documents
>> zfs set sharenfs=on tank/media
>> zfs set sharenfs=on tank/documents
>> zfs set atime=off tank
>> zfs set mountpoint=/media tank/media
>> zfs set mountpoint=/documents tank/documents
>>
>>
>> Here's what zpool status says:
>>
>> # zpool status
>> pool: tank
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>> corruption. Applications may be affected.
>> action: Restore the file in question if possible. Otherwise restore the
>> entire pool from backup.
>> see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: scrub completed with 731 errors on Tue Jan 1 15:17:08 2008
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         tank        ONLINE       0     0 1.47K
>>           mirror    ONLINE       0     0 1.47K
>>             ad6     ONLINE       0     0 5.12K
>>             ad8     ONLINE       0     0 4.66K
>>
>> How can I tell which drive caused the problems, or where the problem came
>> from? I see several errors in /var/log/messages, like:
>>
>> ZFS: zpool I/O failure, zpool=tank error=86
>
> zpool status -v should tell you more.
> But you don't need it, since the message below is enough.
Yes, I did that, but it just listed the >700 affected files; that's about
the only difference in the output, so I omitted it here.
>> and many many of these:
>>
>> ZFS: checksum mismatch, zpool=tank path=/dev/ad6 offset=31970426880 size=131072
>>
>> for both the ad6 and ad8 devices.
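Since each of these messages names the device path, one rough way to see how the mismatches split between ad6 and ad8 is to count them per path. The sketch below is a hypothetical helper, not a FreeBSD tool: its regex assumes exactly the message format quoted above, and the second function assumes that the two sides of a mirror store a given block at the same offset, so an offset logged for both devices likely marks a block with no good copy left.

```python
import re
import sys
from collections import Counter, defaultdict

# Assumed to match the log lines quoted above, e.g.
# "ZFS: checksum mismatch, zpool=tank path=/dev/ad6 offset=31970426880 size=131072"
# search() also tolerates a syslog timestamp/host prefix before "ZFS:".
MISMATCH_RE = re.compile(
    r"ZFS: checksum mismatch, zpool=(\S+) path=(\S+) offset=(\d+) size=(\d+)"
)


def tally_mismatches(lines):
    """Count checksum-mismatch messages per (pool, device path)."""
    counts = Counter()
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


def offsets_on_several_devices(lines):
    """Return (pool, offset) pairs that were logged for more than one device.

    Assumption: mirror children store a block at the same offset, so such a
    pair likely marks a block whose every copy failed its checksum.
    """
    paths_by_offset = defaultdict(set)
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            paths_by_offset[(m.group(1), int(m.group(3)))].add(m.group(2))
    return {key for key, paths in paths_by_offset.items() if len(paths) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical script name): python3 tally.py /var/log/messages
    with open(sys.argv[1], errors="replace") as f:
        log = f.readlines()
    for (pool, path), n in tally_mismatches(log).most_common():
        print(f"{pool} {path}: {n} checksum mismatches")
    print(len(offsets_on_several_devices(log)), "offsets logged for several devices")
```

Roughly equal counts on both paths, as reported here, say little about either disk on its own; many offsets shared by both devices would match the "data corruption" status above.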
>
> So you have checksum errors on both drives.
>
>> I'm happy to swap the drive out, but I don't know which is the problem.
>> I was also wondering if it was a saturated I/O issue on the system
>> (it's a fairly slow and poky old box).
>
> The errors mean that data written to disk was silently different
> when it was read back.
> I doubt it is the drives, but since they are identical it is possible,
> as firmware bugs do happen.
> Errors on both sides of the mirror point at something both drives
> share: more likely you have a problematic ATA controller or defective
> RAM.
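Bernd's reading can be illustrated with a deliberately simplified model of a checksummed mirror read. It is not ZFS code: `checksum`, `mirror_read` and the in-memory byte arrays standing in for disks are hypothetical, and SHA-256 is used only for illustration (ZFS offers several checksum algorithms, sha256 among them). The two ideas borrowed from ZFS are that a block's expected checksum is kept apart from the block itself (ZFS stores it in the parent block pointer), and that a mirror which reads a bad copy tries the other side and rewrites the bad copy from a good one.

```python
import hashlib
from typing import List, Optional


def checksum(data: bytes) -> bytes:
    # Illustration only: SHA-256 stands in for the pool's checksum algorithm.
    return hashlib.sha256(data).digest()


def mirror_read(children: List[bytearray], offset: int, size: int,
                expected: bytes, errors: List[int]) -> Optional[bytes]:
    """Hypothetical model of a checksum-verified mirror read.

    Tries each copy in turn; errors[i] counts checksum failures on child i,
    much like the per-device CKSUM column. The first copy that verifies is
    returned and used to rewrite the copies that failed (self-healing).
    Returns None when no copy verifies, i.e. the data is unrecoverable.
    """
    failed = []
    for i, disk in enumerate(children):
        block = bytes(disk[offset:offset + size])
        if checksum(block) == expected:
            for j in failed:                 # repair the copies that failed
                children[j][offset:offset + size] = block
            return block
        errors[i] += 1
        failed.append(i)
    return None


if __name__ == "__main__":
    data = b"some file data.."
    expected = checksum(data)                # recorded at write time
    ad6, ad8 = bytearray(data), bytearray(data)
    errors = [0, 0]

    ad6[0] ^= 0xFF                           # silent corruption on one side
    print(mirror_read([ad6, ad8], 0, len(data), expected, errors))
    print(bytes(ad6) == data, errors)        # True [1, 0]: ad6 was repaired

    ad6[3] ^= 0x01                           # same block corrupt on both sides
    ad8[3] ^= 0x01
    print(mirror_read([ad6, ad8], 0, len(data), expected, errors), errors)
    # None [2, 1]: no good copy left to heal from
```

If every copy fails verification, `mirror_read` has nothing to repair from and returns None; checksum errors logged against both halves of the same block are that case.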
I can believe a problematic SATA controller (it's an add-on PCI board),
but does anyone know of a way to ask ZFS which devices in a pool it
thinks have issues?
Eric
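On the question of which device ZFS itself blames: the READ, WRITE and CKSUM columns in `zpool status` are ZFS's per-device error counters, so ranking the leaf devices by CKSUM is one way to read them. A minimal sketch; `leaf_errors` and `parse_count` are hypothetical helpers, and they assume the usual `zpool status` layout (nesting shown by indentation, table ended by a blank line) and that abbreviated counts such as `5.12K` use 1024-based suffixes, so converted values are approximate.

```python
import re
from typing import List, Tuple

# Assumption: zpool abbreviates large counters with 1024-based suffixes.
SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_count(text: str) -> int:
    """Turn an error count such as '0', '731' or '5.12K' into an approximate int."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)", text)
    if m is None:
        raise ValueError(f"unexpected count: {text!r}")
    return round(float(m.group(1)) * SUFFIX[m.group(2)])


def leaf_errors(status_text: str) -> List[Tuple[str, int, int, int]]:
    """Return (device, read, write, cksum) for leaf vdevs, most CKSUM errors first."""
    rows = []  # (indent, name, read, write, cksum)
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:          # a blank line ends the config table
            break
        if len(fields) < 5:     # e.g. section headers without counters
            continue
        indent = len(line) - len(line.lstrip())
        read, write, cksum = (parse_count(f) for f in fields[2:5])
        rows.append((indent, fields[0], read, write, cksum))

    leaves = []
    for i, (indent, name, read, write, cksum) in enumerate(rows):
        next_indent = rows[i + 1][0] if i + 1 < len(rows) else -1
        if next_indent <= indent:   # nothing nested below this row
            leaves.append((name, read, write, cksum))
    return sorted(leaves, key=lambda row: row[3], reverse=True)
```

With the counts quoted earlier, ad6 (5.12K) ranks ahead of ad8 (4.66K), but both are in the thousands, so the counters do not single out one disk; that fits Bernd's suspicion of a shared component.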
More information about the freebsd-fs mailing list