From: "Dave Cottlehuber"
To: "Andrea Venturoli"
Cc: freebsd-questions
Date: Fri, 04 Apr 2025 18:59:35 +0000
Subject: Re: Sudden zpool checksums errors

On Fri, 4 Apr 2025, at 15:42, Andrea Venturoli wrote:
> Hello.

> I'm finding it hard to believe that 7 disks out of 12 are failing or
> just happened to misbehave all on the same day.
> BTW, SMART says they are OK.

Not saying it's not zfs, but it's probably not zfs... fingers crossed!

> I'm reluctant to blame RAM (since it's ECC) and power supply (as it's
> redundant 2x800W).

If it's memory, and your mainboard supports it, you'll see failures
in dmesg, MCA ... some good examples:

https://lists.freebsd.org/pipermail/freebsd-hackers/2015-January/046878.html
https://forums.freebsd.org/threads/mca-errors.88909/
https://forums.freebsd.org/threads/solved-weird-mca-errors.94800/

> Disks are 16TB TOSHIBA MG09ACA1 connected to a MegaRAID SAS-3 3108 (of
> course not operating as RAID and with mrsas driver).

Look for SCSI or CAM errors and disconnects in your logs too.

I have seen storms of checksum errors in at least these situations:

- faulty or failing storage / SCSI controller
- insufficient power (or failing power supplies) under load
- overclocking
- overheating on mainboard, or controller, or drives
- actually really bad ECC memory
- drive cables that have worked loose over time
- over 50 disks failing within 2 days in a 200+ disk array
- all disks failing within 20 days of deployment in 24 disk chassis

Sometimes, vendors produce batches of Bad Disks - firmware bugs, physical
defects, unexpected dust inside the sealed platters. Failures are far
more correlated than you'd want to believe.
External vibrations can cause problems too.

A slow process of upgrading firmware, checking each component, and
reseating all cables is the best way to deal with this.

A+
Dave
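
PS: the log checks mentioned above (MCA reports in dmesg, CAM/SCSI errors,
disconnects) could be roughly automated along these lines. This is only a
minimal sketch: the function name scan_log and the three regular expressions
are assumptions for illustration, not anything official, so adjust the
patterns to whatever your kernel actually prints.

```python
import re
import sys
from collections import Counter

# Assumed patterns (hypothetical, for illustration); tune them to your own logs.
PATTERNS = {
    "mca": re.compile(r"\bMCA:"),                        # machine check reports
    "cam": re.compile(r"CAM status:"),                   # CAM/SCSI error reports
    "detach": re.compile(r"\bdetached\b|\blost device\b"),  # device disconnects
}


def scan_log(lines):
    """Return (counts per category, list of (category, line)) for matching lines."""
    counts = Counter()
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append((name, line.rstrip("\n")))
                break  # first matching category wins
    return counts, hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only runs when a log file path is given on the command line.
    with open(sys.argv[1], errors="replace") as handle:
        counts, hits = scan_log(handle)
    for name, line in hits:
        print(f"[{name}] {line}")
    print(dict(counts))
```

Run it against a saved copy of the dmesg output or against /var/log/messages.
A burst of hits that lines up with the time the checksum errors appeared
points at the controller, cabling, power or memory rather than at seven
disks independently going bad.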