Checksum errors across ZFS array

Dr Josef Karthauser joe at tao.org.uk
Fri Jul 20 21:39:59 UTC 2012


On 19 Jul 2012, at 18:15, James Snow wrote:

> On Thu, Jul 19, 2012 at 06:05:32PM +0100, Dr Joe Karthauser wrote:
> 
>> Hi James,
>> 
>> It's almost definitely a memory problem. I'd change it ASAP if I were
>> you.
>> 
>> I lost about 70mb from my zfs pool for this very reason just a few
>> weeks ago. Luckily I had enough snapshots from before the rot set in
>> to recover most of what I lost.
> 
> Thanks for the input. I will run a memory test against it.
> 
> If I may, why "almost definitely" a memory problem and not an issue with
> the controller? (Or did you mean the controller memory?)

Hey Snow,

OK, it's not definitely a memory problem; it could be anything. But memory is where I'd look first.

Be careful, though. My system had been working fine for about a year when I noticed the ZFS rot, all of which appears to be recent. I ran memcheck+ on it for 8 hours or so, and it showed no errors at all. However, when I replaced the memory with modules from a different vendor, the problems went away. (Reboots and power cycling hadn't fixed the problem beforehand!)

So be careful: even if a memory test doesn't report any failures, the memory might still be faulty.

Joe

P.S. It was my own fault for not running ECC memory on that system! :/


More information about the freebsd-stable mailing list