Re: Sudden zpool checksums errors
Date: Mon, 07 Apr 2025 15:15:02 UTC
On 4/7/25 15:07, mike tancsa wrote:
> What does the smartctl -a /dev/da# show for the temperatures of
> the hard drives ?
Temperatures vary between drives (probably due to their slot position in
the chassis): over the last month, the coldest one averaged 30C with a
max of 35C; the hottest averaged 39C, with a peak of 48C.
There does not seem to be a correlation between temperatures and errors
(some drives that gave errors are colder than others that didn't).
> Does smartctl -x show any interesting log entries for
> the drives that threw errors vs the ones that did not ?
All "non-error" drives report:
  SCT Error Recovery Control:
    Read: Disabled
    Write: Disabled
All "error" drives report:
  SCT Error Recovery Control:
    Read: 655 (65.5 seconds)
    Write: 670 (67.0 seconds)
I wonder if this could be the culprit...
I guess I should either enable or disable it on all drives; however, I've
been reading mixed opinions on whether this is good or bad for ZFS.
Any suggestions?
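For reference, smartctl reports these timers in deciseconds (655 means 65.5
seconds), and the same unit is used when setting them with
"smartctl -l scterc,READ,WRITE". Below is a minimal sketch of a helper for
comparing the current setting across drives. The function name erc_values and
the device name da0 are placeholders, and the parsing assumes the output
format quoted above.

```sh
#!/bin/sh
# erc_values: read `smartctl -l scterc` output on stdin and print
# "<read> <write>" in deciseconds, or "disabled" for a timer that is off.
# (Hypothetical helper; assumes the "Read:" / "Write:" layout shown above.)
erc_values() {
    awk '
        $1 == "Read:"  { r = ($2 == "Disabled") ? "disabled" : $2 }
        $1 == "Write:" { w = ($2 == "Disabled") ? "disabled" : $2 }
        END { print r, w }
    '
}

# Example use (needs root and a real device; da0 is a placeholder):
#   smartctl -l scterc /dev/da0 | erc_values
#
# Setting both timers to 7.0 seconds, or disabling both (0 = disabled):
#   smartctl -l scterc,70,70 /dev/da0
#   smartctl -l scterc,0,0 /dev/da0
```

Whichever value is chosen, running the helper over every drive in the pool
should make any remaining mismatch easy to spot.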
"Errored" drives show a few "Resets Between Cmd Acceptance and
Completion", "Number of Hardware Resets", "Number of ASR Events",
"Transition from drive PhyRdy to drive PhyNRdy" and "Device-to-host
register FISes sent due to a COMRESET".
Due to my ignorance I cannot tell what might be the cause and what the
effect :(
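To put these counters side by side for all drives, something like the
following sketch might help. The function name link_counters and the device
names are placeholders; the patterns are just the counter descriptions listed
above, as they appear in "smartctl -x" output.

```sh
#!/bin/sh
# link_counters: filter `smartctl -x` output on stdin, keeping only the
# reset/link counters mentioned above. (Hypothetical helper name.)
link_counters() {
    grep -E 'Resets Between Cmd Acceptance and Completion|Number of Hardware Resets|Number of ASR Events|Transition from drive PhyRdy to drive PhyNRdy|Device-to-host register FISes sent due to a COMRESET'
}

# Example use (needs root; da0 and da1 are placeholder device names):
#   for d in /dev/da0 /dev/da1; do
#       echo "== $d"
#       smartctl -x "$d" | link_counters
#   done
```

Comparing the output for "error" and "non-error" drives would at least show
whether these counters are non-zero only on the drives that threw checksum
errors.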
bye & Thanks
av.