Re: Sudden zpool checksums errors
Date: Mon, 07 Apr 2025 15:15:02 UTC
On 4/7/25 15:07, mike tancsa wrote:
> What does the smartctl -a /dev/da# show for the temperatures of 
> the hard drives ? 
Temperatures vary between drives (probably due to their slot position in 
the chassis): over the last month, the coldest one averaged 30C with a 
max of 35C; the hottest averaged 39C, with a peak of 48C.
There does not seem to be a correlation between temperatures and errors 
(some drives that gave errors are colder than others that didn't).
> Does smartctl -x show any interesting log entries for 
> the drives that threw errors vs the ones that did not ?
All "non-error" drives report:
SCT Error Recovery Control:
            Read: Disabled 
           Write: Disabled
All "error" drives report:
SCT Error Recovery Control:
            Read:    655 (65.5 seconds)
           Write:    670 (67.0 seconds)
I wonder if this could be the culprit...
I guess I should enable or disable it on all drives; however I've been 
reading mixed opinions on whether this is good or bad for ZFS.
Any suggestion?
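
To compare this setting across all drives without reading each report by 
hand, something like the following could work. It is only a rough sketch: 
it assumes the "SCT Error Recovery Control" section always looks like the 
two excerpts above (a header line followed by a Read and a Write line), 
and the function name parse_scterc is made up for illustration. The raw 
values are in tenths of a second, so 655 becomes 65.5 seconds.

```python
import re

# Matches the section as it appears in the excerpts above: a header line,
# then a "Read:" line and a "Write:" line holding either "Disabled" or a
# raw value in tenths of a second. Other layouts are not handled.
SCTERC_SECTION = re.compile(
    r"SCT Error Recovery Control:\s*\n"
    r"\s*Read:\s*(Disabled|\d+).*\n"
    r"\s*Write:\s*(Disabled|\d+)"
)


def parse_scterc(report):
    """Return the read/write timeouts in seconds (None means Disabled).

    Returns None if the section is not found in the report text.
    """
    match = SCTERC_SECTION.search(report)
    if match is None:
        return None

    def to_seconds(raw):
        # The raw value counts tenths of a second (deciseconds).
        return None if raw == "Disabled" else int(raw) / 10

    return {
        "read": to_seconds(match.group(1)),
        "write": to_seconds(match.group(2)),
    }


if __name__ == "__main__":
    sample = (
        "SCT Error Recovery Control:\n"
        "            Read:    655 (65.5 seconds)\n"
        "           Write:    670 (67.0 seconds)\n"
    )
    print(parse_scterc(sample))
```

The text passed in would be the saved output of smartctl -x for one drive. 
The sketch only reads the setting; changing it is done with smartctl's 
"-l scterc,READTIME,WRITETIME" option, which also takes tenths of a second.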
"Errored" drives show a few "Resets Between Cmd Acceptance and 
Completion", "Number of Hardware Resets", "Number of ASR Events", 
"Transition from drive PhyRdy to drive PhyNRdy" and "Device-to-host 
register FISes sent due to a COMRESET".
Due to my ignorance I cannot tell what might be the cause and what the 
effect :(
  bye & Thanks
	av.