ZFS i/o errors - which disk is the problem?

Brooks Davis brooks at freebsd.org
Tue Jan 8 10:59:10 PST 2008


On Tue, Jan 08, 2008 at 08:15:18AM -0700, Scott Long wrote:
> Bernd Walter wrote:
>> On Mon, Jan 07, 2008 at 10:36:00PM -0700, Scott Long wrote:
>>> Bernd Walter wrote:
>>>> On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
>>>>> 2008/1/4, Brooks Davis <brooks at freebsd.org>:
>>>> The data was corrupted by the controller and/or the disk subsystem.
>>>> You have no other source for the broken data, so it is lost.
>>>> The only guaranteed way to get it back is from backup.
>>>> Maybe older snapshots/clones are still readable - I don't know.
>>>> Nevertheless, the data is corrupted, and that is the purpose of
>>>> alternative data sources such as raidz/mirror and, as a last resort,
>>>> backup.
>>>> You shouldn't have ignored those errors in the first place, because
>>>> you are running with faulty hardware.
>>>> Without ZFS checksumming the system would just process the broken
>>>> data with unpredictable results.
>>>> If all those errors are fresh then you likely used a broken RAID
>>>> controller below ZFS, which silently let the disks fall out of sync
>>>> and then blew up when the disk state changed.
>>>> Unfortunately many RAID controllers are broken and therefore useless.
>>>> 
>>> Huh?  Could you be any more vague?  Which controllers are broken?  Have 
>>> you contacted anyone about the breakage?  Can you describe the breakage?
>>> I call bullshit, pure and simple.
>> Just go back a few mails in the same thread, where someone fixed CRC
>> errors by updating the RAID controller firmware.
>> I'm amazed how often I read something like this lately.
>> And if you read the whole thread then you will notice that we are
>> currently talking about another person who has corrupted data on
>> a RAID disk - not sure if this is the controller, a drive or the
>> drivers, but something is faulty here and I wouldn't be surprised
>> if it is the controller.
>> And then there are so many RAID controllers without battery-backed
>> memory or any other mechanism to guarantee that the disks stay in
>> sync, which I call broken by design.
>> You know yourself how important that consistency is for RAID,
>> especially when it comes to parity-based RAID, and you know how
>> fragile it is when it comes to power failure.
> 
> Your argument is complete hearsay and poorly formed opinion.  That's
> fine, just be honest about it and don't mislead others into thinking
> that you know what you're talking about when it comes to RAID.

We saw ZFS CRC errors on one system running Solaris x86 with a 16-port
Areca controller (I don't have the model number handy) until we did a
firmware upgrade after contacting Areca.  The controller was running in
JBOD mode.
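
For anyone following along, here is a rough sketch of the point made
earlier in the thread: a checksum stored apart from the data lets the
filesystem notice a bad block no matter which layer damaged it, and a
mirror copy gives it something to fall back on. This is only an
illustration, not how ZFS is implemented. The names checksum() and
read_block() are made up for the example, and SHA-256 stands in for
whatever checksum the pool is actually configured to use.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum for the sketch; not necessarily what a pool uses.
    return hashlib.sha256(data).digest()


def read_block(copies, expected):
    """Return (data, bad) where data is the first copy that matches the
    expected checksum and bad lists the indices of copies that did not.
    Raise IOError if no copy verifies (the single-disk, no-backup case)."""
    bad = []
    for index, data in enumerate(copies):
        if checksum(data) == expected:
            return data, bad
        bad.append(index)
    raise IOError("all %d copies failed checksum verification" % len(copies))


good = b"important data"
expected = checksum(good)

# Two-way mirror where the first device silently returns damaged data.
mirror = [b"important dat\x00", good]
data, bad = read_block(mirror, expected)
print(data, bad)
```

With only one copy, the same read can do nothing but report the error,
which is the situation described above for a pool without raidz or a
mirror. The list of bad indices is also what answers the question in the
subject line: it tells you which device returned the damaged block.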

-- Brooks

More information about the freebsd-fs mailing list