tom.hurst at clara.net
Tue Sep 2 19:07:17 UTC 2008
* Jeremy Chadwick (koitsu at FreeBSD.org) wrote:
> On Mon, Aug 11, 2008 at 07:58:22AM +0100, Thomas Hurst wrote:
> > * Larry Rosenman (ler at lerctr.org) wrote:
> > > ad8: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=154593293
> > >
> > > NAME STATE READ WRITE CKSUM
> > > ad8 ONLINE 0 0 17
> > Having just experienced NTFS corruption in Windows thanks to a
> > slightly kinked SATA cable (hint: *never* chkdsk/fsck/etc until
> > you're sure the cables are fine), I would *love* to know why this
> > causes a checksum error at ZFS level rather than a read error that
> > any filesystem (or indeed RAID layer) will notice.
> The ad8 errors you're quoting come from the ATA subsystem in FreeBSD.
> That is lower-level (e.g. closer to the hardware) than ZFS's checksum
> method is.
Yes, but ZFS is clearly still seeing corrupt data from its reads, because
the CKSUM counter is going up, not READ, which would indicate its reads
were actually failing at ATA level.
> If Larry was using UFS, he'd also see the above errors from the
> kernel. FreeBSD reports the CRC errors reported by the ATA device,
> ZFS reports the said data as corrupted during scrubbing or standard
> usage (hence the CKSUM field in 'zpool status'),
ZFS should only see corruption that's undetected by ATA's CRCs though
(or the disk's own error correction); if it's actually causing a CRC
error at protocol level, ZFS should not see it, because that IO
> and ZFS also *repairs* the corrupted data. I can't explain how the
> repair works,
It repairs by having duplicate copies of data and metadata; in the case
of vital metadata it stores "ditto-blocks" so there are always multiple
copies of it about, similar to UFS's superblock being spread all over
the disk. For most data you generally want some level of ZFS RAID, but
I'm pretty sure you can make it store multiple copies on the same disk
(zfs set copies=2 on a 1-disk ZFS, for example).
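To illustrate the idea of repairing from a redundant copy, here is a minimal sketch in Python. It is not how ZFS is implemented: the function names are hypothetical, SHA-256 stands in for whichever checksum the pool actually uses, and each "copy" is just a bytearray standing in for a block on disk. The expected checksum is assumed to be stored separately from the block itself, as ZFS does in the parent block pointer.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum; assumption for illustration only.
    return hashlib.sha256(data).digest()


def self_healing_read(copies, expected):
    """Return (data, repaired_count) for one logical block.

    copies   -- list of bytearray, the redundant copies of the block
    expected -- checksum recorded elsewhere when the block was written
    """
    good = None
    for copy in copies:
        if checksum(bytes(copy)) == expected:
            good = bytes(copy)
            break
    if good is None:
        # No copy verifies: this is the unrecoverable case.
        raise IOError("all copies fail checksum")

    repaired = 0
    for copy in copies:
        if bytes(copy) != good:
            copy[:] = good  # overwrite the bad copy with known-good data
            repaired += 1
    return good, repaired
```

With one damaged copy and one intact copy, the read succeeds and the damaged copy is rewritten; with a single copy and no redundancy, the same corruption can only be reported, not fixed.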
In the event of IO errors, I believe some Linux software RAID levels can
perform similar recovery: they rewrite the failing blocks from another
device, forcing the disk to rewrite the broken block.
> I believe journalling filesystems (e.g. ext3fs and gjournal) have
> this ability, while Standard UFS, UFS2, NTFS, FAT, and many others do
No, journalling has nothing to do with this kind of self-healing; a
journal allows a filesystem to be made consistent when interrupted (e.g.
by a crash or power failure) with a very small number of operations,
because it has a log of what has happened or was about to happen.
Journalling filesystems are just as vulnerable to corruption as
non-journalling ones.
NTFS is journalling, BTW.
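A toy sketch of what journal replay does may make the distinction clearer. This is an assumed, simplified record format (tuples of "write" and "commit"), not the on-disk layout of any real filesystem. Note that nothing here checks whether the contents of a block are still what was written; replay only restores consistency after an interruption.

```python
def replay(disk, journal):
    """Apply every committed transaction in the journal to the disk image.

    disk    -- dict mapping block number to bytes
    journal -- list of ("write", txid, block, data) and ("commit", txid)
    """
    pending = {}
    for record in journal:
        if record[0] == "write":
            _, txid, block, data = record
            pending.setdefault(txid, []).append((block, data))
        elif record[0] == "commit":
            # Only transactions that reached their commit record are applied.
            for block, data in pending.pop(record[1], []):
                disk[block] = data
    # Anything left in pending was interrupted mid-transaction and is dropped.
    return disk
```

A transaction that was cut off before its commit record is simply discarded, which is what keeps the filesystem consistent; a block that silently rots on disk afterwards is outside the journal's view entirely.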
> > What's the point in having the connection protected by a CRC if it's
> > just going to let bogus data through anyway?
> A CRC (or checksum) acts as a method of differential detection, e.g.
> detect corruption between X and Y. CRCs are not the same thing as error
> correction or retransmittal; they only result in reporting data
> corruption, and cannot repair it.
Yes, I know what CRCs are; my point is, a CRC error should mean the
corrupt data doesn't make it to the higher layers. ZFS, UFS, gmirror,
whatever, should get an IO error if the data can't be retrieved after
retries fail; they shouldn't get an apparently successful read with
corrupt data in it.
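The behaviour being argued for can be sketched as follows. This is an illustration only, not the FreeBSD ATA driver: `transfer` is a hypothetical callable that returns the payload together with the CRC that was sent alongside it, and `zlib.crc32` stands in for the link-level CRC.

```python
import zlib


def read_with_retries(transfer, retries=3):
    """Return the payload only if its CRC verifies; otherwise raise IOError.

    transfer -- callable returning (payload, crc_sent) for one attempt
    retries  -- number of additional attempts after the first
    """
    for _ in range(retries + 1):
        payload, crc_sent = transfer()
        if zlib.crc32(payload) == crc_sent:
            return payload
        # CRC mismatch: discard this payload and try the transfer again.
    # Every attempt failed, so the caller gets an error, never bad data.
    raise IOError("CRC mismatch persisted after %d retries" % retries)
```

Under this model the only way corrupt data reaches the layer above is a payload that is damaged yet still matches its CRC, which a higher-level checksum such as ZFS's would then catch.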
Perhaps this is the case, and (S)ATA's CRCs are just so poor a couple of
retries is enough to get corrupt data which happens to have a correct
Thomas 'Freaky' Hurst