Reproducible ZFS checksum error, svn192136 (05/14/2009)
James R. Van Artsdalen
james-freebsd-fs2 at jrv.org
Sun Jun 7 06:39:49 UTC 2009
I am able to reproduce a ZFS checksum error. I believe I have ruled out
the hard disks, controllers, cables, etc. (the usual suspects). I have
not ruled out the computer itself as I don't have anything else similar
to test with. No other errors are seen on that computer.
A Dell 435MT (Core i7) at 2.66 GHz with 12GB of RAM
2 Silicon Image 3132 PCI-e cards with 2 eSATA ports each
1 Addonics PCI-e card with 4 eSATA ports that identifies itself as
a Silicon Image 3124
Samsung 1TB disk
Error happens with any of the eSATA cards or the onboard Intel chipset
eSATA controller.
Error happens with any hard disk, enclosure or cabling
There are no I/O errors in the logs, and when I use an external hardware
RAID it reports no errors from its disks and returns none to the host.
svn 192136 (Thu, 14 May 2009) amd64, GENERIC config
The disk is partitioned like this, with a UFS work area at the end and
the area up front being Mac OS X compatible. It boots into UFS land, not ZFS:
# gpart show
=> 34 1953525101 ad12 GPT (932G)
34 6 - free - (3.0K)
40 409600 1 efi (200M)
409640 1869229256 2 !6a898cc3-1dd2-11b2-99a6-080020736631 (891G)
1869638896 128 3 freebsd-boot (64K)
1869639024 4194304 4 freebsd-ufs (2.0G)
1873833328 33554432 5 freebsd-swap (16G)
1907387760 4194304 6 freebsd-ufs (2.0G)
1911582064 33554432 7 freebsd-ufs (16G)
1945136496 8388608 8 freebsd-ufs (4.0G)
1953525104 31 - free - (16K)
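As a sanity check on the table above, the sector counts can be converted to sizes. This is only a sketch: sectors_to_mib is a made-up helper name, and the 512-byte sector size is an assumption (it is what the 932G total for 1953525101 sectors implies).

```sh
#!/bin/sh
# Assumption: 512-byte sectors. sectors_to_mib is a hypothetical helper,
# not part of gpart or any other FreeBSD tool.
sectors_to_mib() {
    # Integer MiB, rounded down.
    echo $(( $1 * 512 / 1048576 ))
}

sectors_to_mib 409600        # EFI partition (p1), listed as 200M
sectors_to_mib 1869229256    # ZFS partition (p2), listed as 891G
```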
For ease of moving the disk between SATA ports each UFS and swap is
labeled with gmirror:
# gmirror status
Name Status Components
mirror/sroot COMPLETE ad12p4
mirror/sswap COMPLETE ad12p5
mirror/stmp COMPLETE ad12p6
mirror/susr COMPLETE ad12p7
mirror/svar COMPLETE ad12p8
/boot/loader.conf contains
zfs_load="YES"
vm.kmem_size="1536M"
vm.kmem_size_min="1536M"
vfs.root.mountfrom="ufs:mirror/sroot"
kern.maxfiles="32K"
kern.ktrace.request_pool="512"
geom_mirror_load="YES" # RAID1 disk driver (see gmirror(8))
vfs.zfs.debug=1
#vfs.zfs.prefetch_disable=1
loader_logo="beastie" # Desired logo: fbsdbw, beastiebw, beastie, none
boot_verbose="YES" # -v: Causes extra debugging information to be printed
1. Start one buildworld loop thusly on UFS.
cd /usr/src
while true
do
make clean
make buildworld
touch "done-`date`"
done
2. Start writes to ZFS with rsync
Make a clean pool: zpool create pool ad12p2
Start an rsync copying data to ZFS. I'm copying from a Mac-mini over
the network, which gets about 20 MB/s when the systems are not loaded.
3. Run "zpool scrub pool". As each scrub completes start a new one.
At some point a scrub will report one or more checksum errors, usually
within the first 500GB of the rsync, though sometimes it takes a few TB.
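For anyone who wants to automate step 3, the back-to-back scrubs could be scripted roughly as below. This is a sketch, not what I actually ran. It assumes "zpool status" prints the text "scrub in progress" while a scrub is running and lists devices in NAME STATE READ WRITE CKSUM columns with plain integer counts. The function names and the 60-second poll interval are made up for illustration.

```sh
#!/bin/sh
# Sketch: run scrubs back to back and stop at the first nonzero CKSUM count.
# Assumptions (not verified against every zpool version): status output
# contains "scrub in progress" during a scrub, and device rows look like
#   NAME STATE READ WRITE CKSUM

POOL="${POOL:-pool}"

# Succeeds if the status text on stdin shows a running scrub.
scrub_running() {
    grep -q "scrub in progress"
}

# Sums the CKSUM column (5th field) over the pool and device rows of the
# status text on stdin. The pool row is included, so an error may be counted
# more than once; only zero versus nonzero matters here.
cksum_total() {
    awk '$2 ~ /^(ONLINE|DEGRADED|FAULTED)$/ && $5 ~ /^[0-9]+$/ { n += $5 }
         END { print n + 0 }'
}

# Start a scrub, wait for it to finish, check the counters, repeat.
scrub_loop() {
    while :; do
        zpool scrub "$POOL" || return 1
        while zpool status "$POOL" | scrub_running; do
            sleep 60
        done
        errs=$(zpool status "$POOL" | cksum_total)
        echo "$(date): scrub finished, CKSUM total = $errs"
        if [ "$errs" -ne 0 ]; then
            zpool status -v "$POOL"
            return 2
        fi
    done
}
```

The script only defines functions. Add a call to scrub_loop at the end (or source the file and run scrub_loop by hand) while the buildworld loop and the rsync are running.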
I'm wondering if anyone else is able to try something similar, with I/O
to UFS and ZFS, and scrubs, to one disk, on a system with >> 4GB RAM.
PS. We need a debug sysctl to make ZFS return data from a block with a
checksum error so we can easily see what data is on disk.