Re: Sudden zpool checksums errors

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Sat, 05 Apr 2025 18:47:21 UTC

On 4/5/25 02:01, Andrea Venturoli wrote:
> On 4/4/25 20:59, Dave Cottlehuber wrote:
>> I have had marginal power supplies, backplane issues or break out 
>> cables from the controller manifest errors like that.  I would check 
>> the power supply first, backplane next, controller 3rd.
> 
> How would I go about this? How do I check these components?
> Does IPMI provide something useful?


Buy and use a hardware power supply tester.  ATX testers are inexpensive 
and readily available.  If your PSUs are not ATX and you cannot find a 
tester, please post the relevant server and PSU details.


Run the memory test and/or system test suite in the motherboard firmware 
Setup utility.  Alternatively, download Memtest86+, write it to a 
bootable USB stick, and run it:

https://memtest.org/


Reseat the HBA in its motherboard slot, then disconnect and reconnect 
all power cables and all data cables related to the HBA, backplanes, 
disks, etc.  Then clear the zpool errors and test.
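
For the "clear and test" step, I would run "zpool clear", then "zpool 
scrub", then look at the CKSUM column of "zpool status" once the scrub 
finishes.  If you want to compare that column before and after reseating 
things, a rough Python sketch like the one below might help.  The pool 
name "tank", the device names, the counts in the sample text, and the 
function names are all made up for illustration; the parser assumes the 
usual NAME/STATE/READ/WRITE/CKSUM table layout and may need adjusting 
for your zpool version.

```python
import subprocess

# Illustrative text only (hypothetical pool and devices, not real output).
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    tank        ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0    ONLINE       0     0    12
        ada1    ONLINE       0     0     0

errors: No known data errors
"""


def read_zpool_status(pool):
    """Return the text printed by `zpool status <pool>`."""
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_cksum_counts(status_text):
    """Map each row name in the config table to its CKSUM field.

    Values are kept as strings because zpool may abbreviate large
    counts (for example with a K suffix).
    """
    counts = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True      # table rows start after this header
            continue
        if fields and fields[0] == "errors:":
            in_table = False     # the errors: line ends the table
            continue
        # Rows with fewer than five fields (section labels, spares)
        # carry no CKSUM value and are skipped.
        if in_table and len(fields) >= 5:
            counts[fields[0]] = fields[4]
    return counts


def devices_with_cksum_errors(status_text):
    """List row names whose CKSUM field is anything other than 0."""
    counts = parse_cksum_counts(status_text)
    return [name for name, count in counts.items() if count != "0"]


if __name__ == "__main__":
    print(devices_with_cksum_errors(SAMPLE_STATUS))
```

On a live system you would pass read_zpool_status("yourpool") instead of 
the sample text.  If the same device shows up again after reseating and 
scrubbing, that points at the disk or its slot; if the errors move 
around, I would suspect something shared (PSU, HBA, backplane, cables).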


If none of the above fixes the CKSUM errors, move the OS disk and data 
disks to a known-good server and try again.


David