A little story of failed raid5 (3ware 8000 series)

Tom Samplonius tom at samplonius.org
Sat Aug 25 22:12:58 PDT 2007


----- "David Schwartz" <davids at webmaster.com> wrote:

> > It is supposed to be 
> > for detecting data corruption, so if the card isn't using the 
> > checksum, it's kind of useless.
> 
> You are confused. Checking for data corruption is done by checking if
> the *DATA* is corrupt. This does not require looking at the RAID5
> checksum since the data has its own data checksum.

  No, not really.  You are just referring to parity as checksums.  They are different.

  Many RAID systems have checksums in addition to parity.  For example, NetApp's zoned checksum (ZCS) disks.

...

> > However, in this particular case, validating checksums would 
> > have been unhelpful, since the disk was unreadable.  diskcheckd 
> > would have detected this issue.  It would probably have prevented 
> > the problem, if it had been running previously.
> 
> No, it would have saved him. The problem was he lost a drive, and
> checksums *ON* *OTHER* *DRIVES* were unreadable. Quite possibly they
> had been unreadable for months, but were never checked, since they are
> only *needed* to reconstruct the data.

  Which is what I said?  The data on the other disks is unreadable.  It doesn't really matter what parity or data was on those sectors.  Yes, diskcheckd would only read data sectors.

> > ZFS is also a good option.  It has file level checksumming.  
> > ZFS never trusts the disks, and is super paranoid.  And ZFS can 
> > do background scrubbing too.  I can't wait for ZFS in FreeBSD 7, 
> > because ZFS in software is going to be 10x better than anything 3ware
> > has.
> 
> That would not have helped him one bit. When the drive failed, the
> RAID 5 checksums on the other drives still would not have been
> scrubbed. The RAID 5 checksum (technically an XOR) is only needed to
> recover the RAID 5 array if a drive (or sector) fails.

  Ok, you should probably not refer to RAID5 parity as "checksums".  They are different.  Some RAID systems have both.  And some do not.
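
  To make the distinction concrete, here is a minimal sketch in Python.  It is not how any particular controller works: the block contents, the `xor_blocks` helper, and the use of CRC32 as the checksum are all assumptions chosen for illustration.  It shows that XOR parity can rebuild a block that is known to be missing, while a per-block checksum is what identifies a block that is silently wrong.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5-style parity). Hypothetical helper."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks in one stripe (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Parity rebuilds a block when we already know which one is missing.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# A per-block checksum (CRC32 here, as an assumption) detects silent
# corruption in one specific block.
checksums = [zlib.crc32(block) for block in data]
corrupted = b"BxBB"
assert zlib.crc32(corrupted) != checksums[1]

# Parity alone only shows that the stripe is inconsistent.
# It cannot say which block is the wrong one.
stripe_ok = xor_blocks([data[0], corrupted, data[2]]) == parity
assert not stripe_ok
```

  Note that neither check does anything unless the blocks are actually read, which is the point of a periodic scan.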

  ZFS checksums the file-level data, which is independent of any RAID5 parity.  And yes, a media scan would have helped.
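
  A media scan in this sense just means reading every block so that unreadable sectors show up before a rebuild needs them.  The following is a rough sketch only, not what diskcheckd or a 3ware card does internally.  The `media_scan` function, its block size, and the assumption that seeking to the end reports the size are all hypothetical, and reads served from the OS cache would not touch the disk at all.

```python
import os

def media_scan(path, block_size=64 * 1024):
    """Read every block of `path`; return (blocks_read, bad_offsets).

    Hypothetical helper. Assumes a Unix-like system (os.pread) and that
    seeking to the end of `path` reports its size.
    """
    bad_offsets = []
    blocks_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                # The data is discarded; we only care whether the read succeeds.
                os.pread(fd, min(block_size, size - offset), offset)
                blocks_read += 1
            except OSError:
                # Record the unreadable block and keep scanning.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return blocks_read, bad_offsets
```

  Run on each member disk (or on the array's own scrub facility, where one exists), a non-empty `bad_offsets` list is the early warning that was missing in this case.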


Tom


More information about the freebsd-stable mailing list