ZFS corrupting data, even just sitting idle

Brooks Talley brooks at illuminati.org
Tue Oct 2 12:07:41 PDT 2007


I do apologize for the subject-verb construction that implied that ZFS itself, or the ZFS code, or anyone responsible for ZFS, or the letter "Z", was corrupting the data rather than merely being subject to the corruption, or at most a potential suspect. I should have said "A storage system comprising the ZFS filesystem, the underlying geom system, the kernel, the ATA driver, the firmware and hardware on the SATA card, the PCI bridge, the SATA cables, the drives themselves, the power supply, system case, and surrounding environment including temperature, humidity, and RF fields, is corrupting its data". I just figured that was implicit and that we were all results-oriented rather than blame-oriented. Sorry!

I will look into Soren's ATA driver and see what I can dig up.

Thanks!
-b

----- Original Message -----
From: "Sverre Svenningsen" <ss.alert at online.no>
To: "Pawel Jakub Dawidek" <pjd at freebsd.org>
Cc: "Brooks Talley" <brooks at illuminati.org>, "freebsd-current" <freebsd-current at freebsd.org>
Sent: Tuesday, October 2, 2007 11:27:55 AM (GMT-0800) America/Los_Angeles
Subject: Re: ZFS corrupting data, even just sitting idle





On Oct 2, 2007, at 20:14, Pawel Jakub Dawidek wrote:



On Tue, Oct 02, 2007 at 10:04:12AM -0700, Brooks Talley wrote: 


Hi, everyone. I'm running 7.0-current amd64, built from CVS on September 12. I've got a 4.5TB ZFS array across 8 750GB drives in a RAIDZ1 + hotspare configuration.


It's corrupting data even just sitting at idle with no access at all. I had loaded it up with about 4TB of data several weeks ago, then noticed that a zpool status showed checksum errors about a week ago. I ran a scrub and it turned up 122 errors affecting about 20 files. The errors were spread across the physical disks pretty evenly, so it didn't seem like one bad drive.


I left for vacation and unplugged the network from the machine to ensure that there would be no access to the disk. There are no cron jobs or anything else running locally that so much as touch the zpool. 


Upon returning, I ran a zpool scrub and it found an additional 116 checksum errors in another 17 files, also evenly spread across the physical drives. 


The system is running a Supermicro motherboard, Supermicro AOC-SAT-MV8 SATA card, and WD 750GB drives. 2GB memory, no real apps running, just storage. 


Anyone seen anything like this? It's a bit of a concern. 


Ok, and why do you blame ZFS for corrupting data instead of being
thankful for detecting corruption? I'm quite sure it's not ZFS that is
corrupting your data.


-- 
Pawel Jakub Dawidek http://www.wheel.pl 
pjd at FreeBSD.org http://www.FreeBSD.org 
FreeBSD committer Am I Evil? Yes, I Am! 

Supposedly this card uses a Marvell 88SX6081 chipset, which, as far as I could tell, is handled by Soren's ATA driver. Looks like work done elsewhere in the kernel is making that driver misbehave in all sorts of weird ways now.
It's nice that ZFS makes it easy to discover, at least :) 


-Sverre 

