Checksum errors across ZFS array

James Snow snow at teardrop.org
Fri Jul 20 22:22:55 UTC 2012


On Fri, Jul 20, 2012 at 04:09:28PM +0100, Dr Josef Karthauser wrote:

> Take care though: my system had been working fine for about
> a year when I noticed the ZFS rot (all of which appears to be
> recent). I ran memcheck+ on it for 8 hours or so, and it showed no
> errors at all. However, when I replaced the memory with modules from
> a different vendor the problems went away. (Reboots and power off/on
> restarts hadn't fixed the problem before!)
>
> So take care: even if the memory test doesn't report any failures,
> the memory might still be faulty.

I've run memtest for about 20 hours now (13 hours in one pass, 7 and
counting on the second) and seen no errors. Hrm.

> p.s. It was my fault that I wasn't running ECC memory on the system!

I am running ECC memory though. If you'd had ECC memory to start with,
do you think you would have seen a different result?

In my case, replacing all the RAM and buying a second controller cost
almost the same. Since a second controller will give me the best
visibility - or long-term expandability if it turns out not to be the
controller - I've gone ahead and ordered one.

If I move half the disks to the new controller and continue to see the
problems only on the old controller, I know it's the controller or the
slot on the motherboard. If the problem continues without any change, I
can replace RAM, and then the motherboard.
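
The elimination plan above can be sketched as a small decision
function. This is only an illustration in Python, not anything run on
the actual system: the disk names, the controller labels "old" and
"new", and the error counts are all made up, and in practice the counts
would be read off the CKSUM column of `zpool status` after the disks
have been split. The last two outcomes are not covered by the plan
above and are included only so the function always returns something.

```python
from collections import defaultdict


def diagnose(errors_by_disk, controller_of, old="old", new="new"):
    """Suggest the next suspect from per-disk checksum error counts.

    errors_by_disk: disk name -> checksum errors seen since the split
    controller_of:  disk name -> controller label (hypothetical labels)
    """
    # Sum the checksum errors seen behind each controller.
    totals = defaultdict(int)
    for disk, count in errors_by_disk.items():
        totals[controller_of[disk]] += count

    if totals[old] > 0 and totals[new] == 0:
        # Errors follow the old controller only.
        return "old controller or its motherboard slot"
    if totals[old] > 0 and totals[new] > 0:
        # Errors continue on both sides, so look elsewhere.
        return "not controller-specific: replace RAM, then motherboard"
    if totals[old] == 0 and totals[new] == 0:
        # Assumption: not part of the original plan.
        return "no errors observed yet: keep watching"
    # Assumption: not part of the original plan.
    return "errors only on new controller: inconclusive"


# Hypothetical layout: four disks, half moved to the new controller.
controller_of = {"da0": "old", "da1": "old", "da2": "new", "da3": "new"}

only_old = diagnose({"da0": 3, "da1": 1, "da2": 0, "da3": 0}, controller_of)
both = diagnose({"da0": 2, "da1": 0, "da2": 5, "da3": 0}, controller_of)
print(only_old)
print(both)
```

Counting errors per controller rather than per disk is deliberate: a
single bad disk would show up on one device only, while the question
here is whether the errors follow the controller.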


-Snow



More information about the freebsd-stable mailing list