Constant minor ZFS corruption
Stephen McKay
mckay at freebsd.org
Thu Mar 10 23:03:35 UTC 2011
On Wednesday, 9th March 2011, Mike Tancsa wrote:
>On 3/9/2011 7:41 AM, Stephen McKay wrote:
>> Of the 12 disks, only 1 has been error-free. I've been doing this for
>> about 10 days now and there is no pattern that I can see in the errors.
>After adding a larger case for future expansion, we found the next day
>we were seeing all sorts of random errors
>
>Like
>
>Mar 3 05:34:47 offsite kernel: ad1: FAILURE - WRITE_DMA48
>status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=2281852580
>
>and
>
>Mar 4 08:56:15 offsite kernel: siisch1: siis_timeout is 00040000 ss
>04000000 rs 04000000 es 00000000 sts 801e2000 serr 00000000
Our system does not report any driver errors or disk errors. We see
checksum errors from ZFS (mostly in scrubs). It's like there's an
invisible pixie sprinkling bad data on our disks while we sleep.
>We narrowed it down to 2 problems. Failing / Marginal power supply and
>bad SATA cables. After changing the power supply, we still had a few
>disk errors.
If either of these were the cause of our problem, we'd see errors
logged, right? Not just invisible corruption?

We will probably swap the power supply and cables soon anyway, just to
see what happens, but on other machines where cables or power was the
problem, I saw errors (just like yours) in the logs.
>After almost 5 days of uptime, no problems at all now. Not one error.
Well, we've got something to aim for, eh? :-)
Cheers,
Stephen.