Re: Sudden zpool checksums errors
Date: Thu, 10 Apr 2025 13:25:42 UTC
On 4/7/2025 11:15 AM, Andrea Venturoli wrote:
> On 4/7/25 15:07, mike tancsa wrote:
>
>> What does the smartctl -a /dev/da# show for the temperatures of the
>> hard drives ?
>
> Temperatures vary between drives (probably due to their slot position
> in the chassis): over the last month, the coldest one averaged 30C
> with a max of 35C; the hottest averaged 39C, with a peak of 48C.
> There does not seem to be a correlation between temperatures and
> errors (some drives that gave errors are colder than others that didn't).
>
>> Does smartctl -x show any interesting log entries for the drives that
>> threw errors vs the ones that did not ?
>
> All "non-error" drives report:
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled
>
> All "error" drives report:
> SCT Error Recovery Control:
>            Read:    655 (65.5 seconds)
>           Write:    670 (67.0 seconds)
>
> I wonder if this could be the culprit...
> I guess I should enable or disable it on all drives; however I've been
> reading mixed opinions on whether this is good or bad for ZFS.
>
> Any suggestion?

Not familiar with those settings, so I won't hallucinate like an LLM and
offer advice here.

> "Errored" drives show a few "Resets Between Cmd Acceptance and
> Completion", "Number of Hardware Resets", "Number of ASR Events",
> "Transition from drive PhyRdy to drive PhyNRdy" and "Device-to-host
> register FISes sent due to a COMRESET".
>
> Due to my ignorance I cannot tell what might be the cause and what the
> effect :(

Hardware resets certainly can correlate with a bad or underperforming
power supply. What does "ipmitool sensor" show for your power supplies?
If you run "ipmitool lan print" and the BMC has an IP, try logging in to
its web interface and see whether your board reports power utilization.
Don't post the IP info here :)

    ---Mike

> bye & Thanks
> av.
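
To compare the SCT Error Recovery Control state quoted above across every drive at a glance, something along these lines might help. It is only a sketch: the helper name erc_summary is made up for illustration, it assumes the "Read:" / "Write:" layout shown in the quoted smartctl output, and the device names in the usage comment are placeholders. It only reads the setting and does not change anything on the drives.

```shell
#!/bin/sh
# erc_summary (hypothetical helper): reads smartctl output on stdin and
# prints the SCT ERC read/write values on one line. A value is either
# "Disabled" or the timeout in tenths of a second, as smartctl prints it.
# Prints "unknown" when the section or a value is missing.
erc_summary() {
    awk '
        /SCT Error Recovery Control/ { in_erc = 1; next }
        in_erc && $1 == "Read:"      { r = $2 }
        in_erc && $1 == "Write:"     { w = $2; in_erc = 0 }
        END {
            if (r == "") r = "unknown"
            if (w == "") w = "unknown"
            printf "read=%s write=%s\n", r, w
        }
    '
}

# Usage (device names are assumptions; adjust to your system):
#   for d in /dev/da0 /dev/da1 /dev/da2; do
#       printf "%s: " "$d"
#       smartctl -l scterc "$d" | erc_summary
#   done
```

Feeding it the full "smartctl -x" output should work as well, since it only looks at the lines following the "SCT Error Recovery Control" header.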