Analysis of disk file block with ZFS checksum error

Jeremy Chadwick koitsu at freebsd.org
Tue Mar 4 13:47:57 UTC 2008


On Tue, Mar 04, 2008 at 07:25:35AM -0600, Eric Anderson wrote:
> I'm starting to think there is a timing issue or some such problem with 
> ZFS, since I can use the same drives in a gmirror with UFS, and never have 
> any data problems (md5 checksums confirm it over-and-over).  I highly doubt 
> that everyone is seeing similar issues and it just is because ZFS is so 
> intense.  I've had plenty of systems under severe disk load that have never 
> exhibited corrupt files because of something like this.

One thing that hasn't been mentioned (or maybe it has been but I missed
it): FreeBSD's ZFS port is version 6, while Solaris is up to version 10.

Is it possible that the problem folks are experiencing, including the
infamous deadlock or crash on heavy I/O between UFS/UFS2 and ZFS
filesystems, could've been fixed between versions 6 and 10?

I myself use gstripe(8) and UFS2 (no softupdates) on two identical SATA
disks.  I do nightly backups so if I lose a disk, I'm OK.  My transfer
rates are quite good (~143MB/sec read, ~130MB/sec write -- really!) on
the stripe, and in the past 2 weeks I have spent a LOT of time copying
over 150GB of data back and forth between the stripe and the backup disk
without any issues.  All disks are on an ICH7 controller.

> I wish we could get our hands on this issue..  Seems like some common 
> threads are ATA/SATA disks.  Is your setup running 32bit or 64bit FreeBSD?  
> (if you already mentioned it, I'm sorry, I missed it)

So far the reports have shown that it's not specific to either i386 or
amd64, and that it's not specific to any type of hardware (motherboard,
controller, etc.).  Joe's setup is very different from mine, for
example.

If the same disks are fine when used with UFS/UFS2, then I'd say it's
less of an ATA subsystem bug, and more of an oddity with ZFS on FreeBSD.
If it's reproducible, that would be helpful to developers.
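
For anyone trying to reproduce this, here is a minimal sketch of a
copy-and-verify pass, along the lines of the md5 comparisons Eric
describes.  It is only an illustration and not taken from any report in
this thread: the function names and the directory arguments are
hypothetical, and a real test would need far more data than fits in
memory so that the read-back is not simply served from cache.

```python
import hashlib
import os
import shutil


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of_file(full)
    return sums


def copy_and_verify(source_dir, dest_dir):
    """Copy source_dir to dest_dir (which must not exist yet) and
    return the sorted relative paths that are missing from the copy
    or whose checksum differs from the source."""
    shutil.copytree(source_dir, dest_dir)
    before = checksum_tree(source_dir)
    after = checksum_tree(dest_dir)
    return sorted(p for p in before if after.get(p) != before[p])
```

Running copy_and_verify() in a loop from a UFS2 directory to a fresh
directory on the ZFS pool, and recording any paths it returns, would
give developers a concrete list of damaged files to look at.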

Regarding ATA/SATA though, there are reports of DMA timeouts and other
oddities happening on ATA/SATA disks on FreeBSD.  When I was using ZFS
not too long ago, I experienced that problem when doing heavy I/O
(copying data from a standard UFS2 disk to a ZFS RAIDZ pool).  It's been
the only time I've seen this problem.

http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/040013.html

The drive showed no signs of errors (SMART stats look fine, no
mechanical noises or other oddities).  I've since replaced it out of
pure paranoia with a disk identical to the ones on the gstripe(8).

Regarding those issues (DMA errors, etc.), Scott Long has offered to
help, but needs systems that can reproduce the problem reliably and
that offer remote access (serial console highly recommended).

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |


