constant zfs data corruption

JoaoBR joao at matik.com.br
Mon Oct 20 10:14:01 PDT 2008


On Monday 20 October 2008 14:44:50 Chuck Swiger wrote:
> Hi, all--
>
> On Oct 20, 2008, at 6:22 AM, Jeremy Chadwick wrote:
> [ ...JoaoBR wrote... ]
>
> >> well, hardware seems to be ok and not older than 6 months, also it
> >> happens on more than one machine ... smartctl does not report any hw
> >> failures on disk
> >>
> >> regarding jumpering the drives to 150, do you suspect a driver problem?
> >
> > It's not because of a driver problem.  There are known SATA chipsets
> > which do not properly work with SATA300 (particularly VIA and SiS
> > chipsets); they claim to support it, but data is occasionally
> > corrupted.
> > Capping the drive to SATA150 fixes this problem.
> >
> > http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>
> Exactly so.  Just as a general principle, if you've got sporadic data
> corruption, turning I/O and system busses down a notch and retesting
> is a useful starting point towards identifying whether the issue is
> repeatable and whether it leans towards a hardware issue or software.
> However, ZFS file checksumming supposedly is code that has been
> carefully reviewed and tested so when it logs problems that is
> supposed to be a fairly sure sign that the hardware isn't behaving
> right.
>

OK, I will jumper the drives down on some machines and see whether the error
comes back, even though mine are Nvidia SATA.
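
For anyone following the checksum argument above, here is a minimal sketch of
the idea, not ZFS's actual code: a checksum is recorded when a block is
written and recomputed when the block is read back, so a bit flipped anywhere
in between (controller, cable, disk) shows up as a mismatch. The function
names and the use of SHA-256 from Python's hashlib are assumptions made for
illustration only.

```python
import hashlib
from typing import Tuple


def write_block(data: bytes) -> Tuple[bytes, str]:
    """Return the block together with the checksum recorded at write time."""
    return data, hashlib.sha256(data).hexdigest()


def verify_block(data: bytes, stored_checksum: str) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    return hashlib.sha256(data).hexdigest() == stored_checksum


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate corruption in transit by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Hypothetical payload standing in for one small RRD update.
block, checksum = write_block(b"rrd sample: 12345 bytes in, 6789 bytes out")
print(verify_block(block, checksum))               # intact block: True
print(verify_block(flip_bit(block, 3), checksum))  # one flipped bit: False
```

The check only says that the data read differs from the data written; it does
not say which component changed it, which is why capping the link speed and
retesting is still needed to narrow things down.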

>
> > Because you didn't provide your smartctl output, I can't really tell
> > if the drives are in "good shape" or not.  :-)
> >
> > Also, do you not think it's a little odd that the only data corruption
> > occurring for you is related to RRDtool?
>
> RRD tends to involve lots of small writes so its files are going to
> be changed often compared to other things that might be running; a
> busy webserver or mailserver would involve more I/O to logfiles and
> queue/mailspool, or so I would expect, but who knows what the machine
> in question is being used for?
>

These servers are transparent proxies (squid) at the top of small ISP
networks, with IPFW bandwidth control for the clients. RRDtool collects the
client traffic and some other data every 5 minutes.

Very occasionally I also get data corruption on a squid_cache file, normally
2 days after the rrdtool error first appears.

-- 


João


More information about the freebsd-stable mailing list