Constant minor ZFS corruption

Stephen McKay mckay at freebsd.org
Wed Mar 9 12:57:11 UTC 2011


On Tuesday, 8th March 2011, Chris Forgeron wrote:

>Have you made sure it's not always the same drives with the checksum
>errors? It may take a few days to know for sure..

Of the 12 disks, only 1 has been error-free.  I've been doing this for
about 10 days now and there is no pattern that I can see in the errors.

>It shouldn't matter which NFS client you use, if you're seeing ZFS
>checksum errors in zpool status, that won't be from whatever program
>is writing it.

I'm mounting another server using NFS to get my 1TB of test data, so
the problem box is the NFS client, not the server.  Sorry if there was
any confusion.

I had a theory that there was a race in the NFS client, or perhaps in the
code that steals memory from ZFS's ARC when NFS needs it.  However, that
seems less likely now as I have done the same 1TB test copy again today
but this time using ssh as the transport.  I saw the same ZFS checksum
errors as before. :-(

>..oh, and don't forget about fun with Expanders. I assume you're using one?

No.  This board has 14 ports made up of 6 native and 8 from the LSI2008
chip on the PIKE card.  Each is cabled directly to a drive.

>I've got 2 LSI2008 based controllers in my 9-Current machine without
>any fuss. That's running a 24 disk Mirror right now.

That's encouraging news.  Maybe I can win eventually.


On Tuesday, 8th March 2011, Mark Felder wrote:
>Highly interested in what FreeBSD version, ZFS version, and zpool
>version you're running.

I was using 8.2-release plus the mps driver from 8.2-stable.  Hence
the filesystem version is 4 and pool version is 15.  But I installed
-current a few days ago while keeping the same pool and found that the
errors still occurred.  The v28 code has extra locking in interesting
places but it made no difference to the checksum errors.

As of today, I've destroyed the pool and built a version 28 pool (fs
version 5) on a subset of disks (those attached to the onboard
controller).  I'll know by tomorrow how that went.

BTW, with my code in place to decipher "type 19" entries and a kernel
printf that bypasses the need to get devd.conf right, I see something
like this for each checksum error:

log_sysevent: ESC_dev_dle:  class=ereport.fs.zfs.checksum ena=4822220020083854337 detector={ version=0 scheme=zfs pool=44947180927799912 vdev=6194846651369573567} pool=dread pool_guid=44947180927799912 pool_context=0 pool_failmode=wait vdev_guid=6194846651369573567 vdev_type=disk vdev_path=/dev/gpt/bay7 parent_guid=18008078209829074821 parent_type=raidz zio_err=0 zio_offset=194516353024 zio_size=4096 zio_objset=276 zio_object=0 zio_level=0 zio_blkid=132419 bad_ranges=0000000000001000 bad_ranges_min_gap=8 bad_range_sets=00000445 bad_range_clears=00002924 bad_set_histogram=001b001a001e002b0021001600120018001a001600210018001500150016001c001c0019001200190022001b0019001b0017000f0014000e0013001a001c001f000c000c000c0007000b000d0010001f00060009000800080007000c0010000f00070007000500070008000600080008000a0002000100060004000300070004 bad_cleared_histogram=00820089009700ac00a700b900b2009000730084009500af00a300ad00a900ac0082009300ad00c200ac00d200a8008f0078008b008e00b700bf00b9009f00a60083009500a400c100c200b700cd009900780090009b00be00af00c100a700980083008a00a200c900bc00d400b200a3007e0089009400c400c700d400b8009b

That's a hideous blob of awful, and I don't really know what to do with it.
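
For what it's worth, if I'm reading zfs_fm.c right, bad_ranges is a flat
array of uint32 (start, end) byte pairs, so bad_ranges=0000000000001000
above means the entire 4096-byte block differed, and the two histograms
count bad bits set and cleared per bit position.  Here's a minimal
userland sketch of how one might pretty-print the useful fields with
libnvpair (the helper name is mine, and it assumes you've already fished
the ereport out of the sysevent stream as an nvlist):

#include <stdio.h>
#include <stdint.h>
#include <libnvpair.h>

/*
 * Sketch only: nvl is a checksum ereport already extracted from the
 * sysevent plumbing (how you obtain it is left out here).
 */
static void
print_cksum_ereport(nvlist_t *nvl)
{
        char *path;
        uint64_t off, size;
        uint32_t *ranges;
        uint_t n, i;

        if (nvlist_lookup_string(nvl, "vdev_path", &path) == 0)
                printf("vdev: %s\n", path);

        if (nvlist_lookup_uint64(nvl, "zio_offset", &off) == 0 &&
            nvlist_lookup_uint64(nvl, "zio_size", &size) == 0)
                printf("i/o:  %ju bytes at offset %ju\n",
                    (uintmax_t)size, (uintmax_t)off);

        /* bad_ranges is a flat uint32 array of (start, end) pairs. */
        if (nvlist_lookup_uint32_array(nvl, "bad_ranges", &ranges, &n) == 0)
                for (i = 0; i + 1 < n; i += 2)
                        printf("bad:  bytes [0x%x, 0x%x)\n",
                            ranges[i], ranges[i + 1]);
}

Fed the event above, that should print /dev/gpt/bay7, 4096 bytes at
offset 194516353024, and a single bad range covering the whole block.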

Cheers,

Stephen.

