ICRC's

Jeremy Chadwick koitsu at FreeBSD.org
Mon Aug 11 13:05:55 UTC 2008


On Mon, Aug 11, 2008 at 07:58:22AM +0100, Thomas Hurst wrote:
> * Larry Rosenman (ler at lerctr.org) wrote:
> 
> > I'm getting the following on a zpool scrub:
> > 
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=54817587
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=187521229
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=187522189
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=109095258
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=101327859
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=172911744
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=65393370
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=64741875
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=262496999
> > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=154593293
> > 
> >  	NAME        STATE     READ WRITE CKSUM
> >  	    ad8     ONLINE       0     0    17
> 
> Having just experienced NTFS corruption in Windows thanks to a slightly
> kinked SATA cable (hint: *never* chkdsk/fsck/etc until you're sure the
> cables are fine), I would *love* to know why this causes a checksum
> error at ZFS level rather than a read error that any filesystem (or
> indeed RAID layer) will notice.

The ad8 errors you're quoting come from the ATA subsystem in FreeBSD.
That layer is lower-level (i.e. closer to the hardware) than ZFS's
checksumming.

If Larry were using UFS, he'd also see the above errors from the kernel.
FreeBSD reports the CRC errors reported by the ATA device, ZFS reports
the affected data as corrupted during scrubbing or standard usage (hence
the CKSUM field in 'zpool status'), and ZFS also *repairs* the corrupted
data when it has a good redundant copy to repair from (a mirror, raidz,
or copies=2 or higher).  I can't explain the repair in detail, but it's
one of the many features of the filesystem.  Journalling (e.g. ext3fs
and gjournal) is a different mechanism: it protects consistency after a
crash and does not checksum file data, so those filesystems, like
standard UFS, UFS2, NTFS, FAT, and many others, can neither detect nor
repair this kind of corruption.
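
As a rough illustration of the repair idea, here is a toy model in
Python.  It is not ZFS's code: it assumes a two-way mirror held in a
plain list, a SHA-256 checksum stored separately from the data, and a
hypothetical helper named self_healing_read.

```python
import hashlib


def checksum(block):
    """Checksum kept apart from the data it describes."""
    return hashlib.sha256(block).digest()


def self_healing_read(copies, expected):
    """Return a verified block and rewrite any copy that fails the checksum.

    copies   -- list of bytes objects, one per mirror side (toy model)
    expected -- checksum recorded when the block was written
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        # Detection without redundancy: report, but nothing to repair from.
        raise IOError("checksum mismatch on every copy; cannot repair")

    repaired = 0
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # overwrite the bad copy with the good one
            repaired += 1
    return good, repaired


original = b"zfs block contents " * 16
expected = checksum(original)

# Side 0 of the mirror was corrupted in transit; side 1 is intact.
corrupted = bytearray(original)
corrupted[7] ^= 0xFF
mirror = [bytes(corrupted), original]

data, repaired = self_healing_read(mirror, expected)
```

With only a single copy and no redundancy, the same function can
detect the mismatch but has to give up, which is the situation a pool
without a mirror, raidz, or extra copies is in.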

> What's the point in having the connection protected by a CRC if it's
> just going to let bogus data through anyway?

A CRC (or checksum) is a method of error detection, i.e. it detects
corruption of data between X and Y.  CRCs are not the same thing as
error correction or retransmission; on their own they can only report
data corruption, not repair it.
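
A minimal sketch of that distinction in Python, using zlib.crc32 as a
stand-in (the CRC used on the ATA bus is a different polynomial, and
the flaky_link and read_with_retry names are made up for the example).
The CRC only says "this transfer is bad"; getting good data back
requires asking for the transfer again.

```python
import zlib


def crc_ok(payload, crc):
    """Detection only: True if the payload still matches its CRC."""
    return zlib.crc32(payload) == crc


def make_flaky_link(sector, bad_transfers):
    """Hypothetical link that flips one bit on the first N transfers."""
    state = {"sent": 0}
    crc = zlib.crc32(sector)

    def transfer():
        state["sent"] += 1
        payload = bytearray(sector)
        if state["sent"] <= bad_transfers:
            payload[100] ^= 0x01  # single-bit error on the cable
        return bytes(payload), crc

    return transfer


def read_with_retry(transfer, retries=3):
    """Retry the request until the CRC matches or retries run out."""
    for attempt in range(1, retries + 1):
        payload, crc = transfer()
        if crc_ok(payload, crc):
            return payload, attempt
    raise IOError("CRC error persisted after %d attempts" % retries)


sector = bytes(range(256)) * 2  # 512 bytes of sample data
link = make_flaky_link(sector, bad_transfers=2)
data, attempts = read_with_retry(link)
```

Here the first two transfers fail the check and the third succeeds, so
the caller gets intact data only because the request was repeated, not
because the CRC fixed anything.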

http://en.wikipedia.org/wiki/Cyclic_redundancy_check

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



More information about the freebsd-stable mailing list