Reproducible ZFS checksum error, svn192136 (05/14/2009)

James R. Van Artsdalen james-freebsd-fs2 at jrv.org
Sun Jun 7 06:39:49 UTC 2009


I am able to reproduce a ZFS checksum error.  I believe I have ruled out
the hard disks, controllers, cables, etc - the usual suspects.  I have
not ruled out the computer itself as I don't have anything else similar
to test with.  No other errors are seen on that computer.

A Dell 435MT (Core i7) at 2.66 GHz with 12GB of RAM
2 Silicon Image 3132 PCI-e cards with 2 eSATA ports each
1 Addonics PCI-e card with 4 eSATA ports that identifies itself as
a Silicon Image 3124
Samsung 1TB disk

The error happens with any of the eSATA cards or the onboard Intel chipset
eSATA controller.
The error happens with any hard disk, enclosure or cabling.
There are no I/O errors in the logs, and when I use an external hardware
RAID it reports no errors, either from the disks or to the host.

svn 192136 (Thu, 14 May 2009) amd64, GENERIC config

The disk is partitioned like this, with a UFS work area at the end and
the area up front being Mac OS X compatible.  It boots from UFS, not ZFS.

# gpart show
=>        34  1953525101  ad12  GPT  (932G)
          34           6        - free -  (3.0K)
          40      409600     1  efi  (200M)
      409640  1869229256     2  !6a898cc3-1dd2-11b2-99a6-080020736631  (891G)
  1869638896         128     3  freebsd-boot  (64K)
  1869639024     4194304     4  freebsd-ufs  (2.0G)
  1873833328    33554432     5  freebsd-swap  (16G)
  1907387760     4194304     6  freebsd-ufs  (2.0G)
  1911582064    33554432     7  freebsd-ufs  (16G)
  1945136496     8388608     8  freebsd-ufs  (4.0G)
  1953525104          31        - free -  (16K)

For ease of moving the disk between SATA ports, each UFS and swap
partition is labeled with gmirror:

# gmirror status
        Name    Status  Components
mirror/sroot  COMPLETE  ad12p4
mirror/sswap  COMPLETE  ad12p5
 mirror/stmp  COMPLETE  ad12p6
 mirror/susr  COMPLETE  ad12p7
 mirror/svar  COMPLETE  ad12p8

/boot/loader.conf contains

zfs_load="YES"
vm.kmem_size="1536M"
vm.kmem_size_min="1536M"
vfs.root.mountfrom="ufs:mirror/sroot"
kern.maxfiles="32K"
kern.ktrace.request_pool="512"
geom_mirror_load="YES"        # RAID1 disk driver (see gmirror(8))
vfs.zfs.debug=1
#vfs.zfs.prefetch_disable=1
loader_logo="beastie"        # Desired logo: fbsdbw, beastiebw, beastie, none
boot_verbose="YES"        # -v: Causes extra debugging information to be printed


1. Start one buildworld loop thusly on UFS.

cd /usr/src
while true
do
  make clean
  make buildworld
  touch "done-`date`"
done

2. Start writes to ZFS with rsync

Make a clean pool: zpool create pool ad12p2
Start an rsync copying data to ZFS.  I'm copying from a Mac mini over
the network, which gets about 20 MB/s when the systems are not loaded.

3. Run "zpool scrub pool".  As each scrub completes start a new one.

At some point a scrub will report one or more checksum errors, usually
within the first 500GB of the rsync, though sometimes it takes a few TB.
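
For anyone trying to reproduce this, the back-to-back scrubs in step 3 could
be automated roughly as below.  This is only a sketch, not the script used
for the report above.  The function names (scrub_in_progress,
has_data_errors, scrub_loop) and the 60 second poll interval are made up for
illustration.  It assumes "zpool status" prints the text "scrub in progress"
while a scrub runs and an "errors:" summary line afterwards.  It does not
look at the per-device CKSUM counters, so an error that ZFS repaired from a
redundant copy may go unnoticed.

```shell
#!/bin/sh
# Sketch: run scrubs back to back until the pool reports data errors.
# The helpers read "zpool status" text on stdin so they can be checked
# against sample output without a real pool.

# Succeeds while the status text says a scrub is still running.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Succeeds if the "errors:" summary line reports anything other than
# "No known data errors".  Fails if there is no "errors:" line at all.
has_data_errors() {
    grep '^errors:' | grep -qv 'No known data errors'
}

# Scrub the named pool repeatedly; stop and print verbose status as
# soon as a completed scrub leaves data errors behind.
scrub_loop() {
    pool=$1
    while :; do
        zpool scrub "$pool" || return 1
        # Poll until the current scrub finishes (interval is arbitrary).
        while zpool status "$pool" | scrub_in_progress; do
            sleep 60
        done
        if zpool status "$pool" | has_data_errors; then
            zpool status -v "$pool"
            return 0
        fi
    done
}
```

With the functions loaded into a shell, "scrub_loop pool" would be started
after the buildworld loop and the rsync are already running.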

I'm wondering if anyone else is able to try something similar, with I/O
to UFS and ZFS, and scrubs, to one disk, on a system with >> 4GB RAM.

PS. We need a debug sysctl to make ZFS return data from a block with a
checksum error so we can easily see what data is on disk.

