Date: Mon, 7 Apr 2025 12:32:39 -0700
Subject: Re: Sudden zpool checksums errors
To: questions@freebsd.org
From: David Christensen
On 4/7/25 08:15, Andrea Venturoli wrote:
> On 4/7/25 15:07, mike tancsa wrote:
>
>> What does the smartctl -a /dev/da# show for the temperatures of the
>> hard drives ?
>
> Temperatures vary between drives (probably due to their slot position in
> the chassis): over the last month, the coldest one averaged 30C with a
> max of 35C; the hottest averaged 39C, with a peak of 48C.
> There does not seem to be a correlation between temperatures and errors
> (some drives gave errors are colder than others that didn't).

I have been running Seagate 3 TB Barracuda and Constellation drives for several years. When drive temperatures rise above approximately 40 C, `zpool status` and/or `smartctl -x` start showing internal drive errors. Fixes include adding fans, increasing fan RPM, and removing or rearranging drives to improve airflow. There is no substitute for good cooling.

>> Does smartctl -x show any interesting log entries for the drives that
>> threw errors vs the ones that did not ?
>
> All "non-error" drives report:
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled
>
> All "error" drives report:
> SCT Error Recovery Control:
>            Read:    655 (65.5 seconds)
>           Write:    670 (67.0 seconds)
>
> I wonder if this could be the culprit...
> I guess I should enable or disable it on all drives; however I've been
> reading mixed opinions on whether this is good or bad for ZFS.
>
> Any suggestion?
>
> "Errored" drives show a few "Resets Between Cmd Acceptance and
> Completion", "Number of Hardware Resets", "Number of ASR Events",
> "Transition from drive PhyRdy to drive PhyNRdy" and "Device-to-host
> register FISes sent due to a COMRESET".
>
> Due to my ignorance I cannot tell what might be the cause and what the
> effect :(

I have played with SCT settings in the past, but the smartctl statistics you cite (resets, PhyRdy/PhyNRdy transitions, COMRESETs) make me think the problem is in the connections outside the drives.

I think your best bet at this point is to re-seat all the disk-drive-related expansion cards, power cables, data cables, backplanes, drives, etc. Even gold-plated connections degrade over the years.

Vacuum everything, especially heat sinks. While vacuuming, do not allow fans to spin; they can become generators.

If you are using non-locking, 1.5 Gbps, and/or 3 Gbps SATA cables, buy and install good-quality locking 6 Gbps SATA cables. Beware that the dye in some red insulation can corrode the copper conductors inside (I cut open a bad red SATA cable and found corroded powder instead of copper metal). I now buy black SATA cables.

Bundle and dress all power cables to facilitate airflow. Bundle and dress signal cables separately.

David
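The connection diagnosis above leans on the link-level counters that `smartctl -x` reports. Below is a minimal Python sketch (not part of the original thread) that pulls the nonzero, link-related rows out of the "SATA Phy Event Counters" table so several drives can be compared side by side. It assumes smartmontools is installed, that the script runs with enough privilege to query the drives, and that the table rows use the column layout `ID  Size  Value  Description`. The helper name `link_error_counters` and the keyword list are hypothetical choices for illustration, not an established tool.

```python
"""Flag SATA link-error counters in `smartctl -x` output (hypothetical helper)."""
import re
import subprocess
import sys

# Descriptions of SATA Phy Event Counters that point at the link (cable,
# connector, backplane, controller). This keyword list is an illustrative
# assumption, not an exhaustive or authoritative one.
LINK_KEYWORDS = ("COMRESET", "PhyRdy to drive PhyNRdy", "CRC")

# Assumed row layout: "0x000a  2            3  Device-to-host ...",
# i.e. counter ID, size in bytes, value, description.
ROW = re.compile(r"^0x[0-9a-fA-F]+\s+\d+\s+(\d+)\s+(.*\S)\s*$")


def link_error_counters(smartctl_text):
    """Return {description: value} for nonzero link-related Phy counters."""
    found = {}
    in_table = False
    for line in smartctl_text.splitlines():
        if line.startswith("SATA Phy Event Counters"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # a blank line ends the table
        match = ROW.match(line.strip())
        if match is None:
            continue  # skips the "ID Size Value Description" header row
        value, description = int(match.group(1)), match.group(2)
        if value > 0 and any(k in description for k in LINK_KEYWORDS):
            found[description] = value
    return found


def main(devices):
    for dev in devices:
        # smartctl's exit status is a bit mask, so a nonzero status is not
        # treated as fatal here; only stdout is parsed.
        out = subprocess.run(["smartctl", "-x", dev],
                             capture_output=True, text=True).stdout
        counters = link_error_counters(out)
        status = "check cabling" if counters else "no link errors counted"
        print(f"{dev}: {status}")
        for description, value in counters.items():
            print(f"    {value:>8}  {description}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python3 link_check.py /dev/da0 /dev/da1` (the script name is hypothetical; use the device names `camcontrol devlist` shows on your system). Re-running it after re-seating cables and noting whether the counts keep climbing is one way to judge whether the change helped.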