From: Andrea Venturoli
To: freebsd-questions@freebsd.org
Date: Fri, 4 Apr 2025 17:42:26 +0200
Subject: Sudden zpool checksums errors

Hello.

I've got a box with two zpools:
_ 1 mirror on 2 SSDs;
_ 1 raidz1 on 12 HDDs.

Suddenly one daily run showed the following:

>   pool: backup
>  state: ONLINE
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
>   scan: scrub repaired 3.18M in 16:53:16 with 0 errors on Tue Apr  1 20:16:55 2025
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         backup      ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da10    ONLINE       0     0     0
>             da5     ONLINE       0     0    57
>             da2     ONLINE       0     0     0
>             da8     ONLINE       0     0    25
>             da0     ONLINE       0     0     0
>             da1     ONLINE       0     0    49
>             da12    ONLINE       0     0     8
>             da6     ONLINE       0     0     6
>             da11    ONLINE       0     0     0
>             da9     ONLINE       0     0    56
>             da13    ONLINE       0     0    73
>
> errors: No known data errors

I'm finding it hard to believe that 7 disks out of 12 are failing or just happened to misbehave all on the same day.
BTW, SMART says they are OK.
I'm reluctant to blame RAM (since it's ECC) and power supply (as it's redundant 2x800W).
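
As a cross-check on that count, here is a minimal Python sketch that tallies the devices with a non-zero CKSUM column. It assumes the five-column config layout (NAME STATE READ WRITE CKSUM) shown in the status output above; the function name cksum_errors is made up for illustration, and any count that zpool prints in abbreviated form (with a suffix rather than plain digits) would be skipped by this simple parser.

```python
# Config rows copied from the 'zpool status' output quoted above.
STATUS = """\
NAME        STATE     READ WRITE CKSUM
backup      ONLINE       0     0     0
  raidz1-0  ONLINE       0     0     0
    da4     ONLINE       0     0     0
    da10    ONLINE       0     0     0
    da5     ONLINE       0     0    57
    da2     ONLINE       0     0     0
    da8     ONLINE       0     0    25
    da0     ONLINE       0     0     0
    da1     ONLINE       0     0    49
    da12    ONLINE       0     0     8
    da6     ONLINE       0     0     6
    da11    ONLINE       0     0     0
    da9     ONLINE       0     0    56
    da13    ONLINE       0     0    73
"""


def cksum_errors(status_text):
    """Return {name: cksum_count} for config rows with a non-zero CKSUM.

    Hypothetical helper: only rows with exactly five whitespace-separated
    fields and a plain-digit CKSUM value are considered.
    """
    found = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 5-column data row.
        if len(fields) != 5 or not fields[4].isdigit():
            continue
        name, cksum = fields[0], int(fields[4])
        if cksum > 0:
            found[name] = cksum
    return found


affected = cksum_errors(STATUS)
print(len(affected), "devices with checksum errors:", affected)
print("total checksum errors:", sum(affected.values()))
```

Run against the rows above, this reports the same 7 devices; the pool and raidz1-0 rows drop out because their own CKSUM counters are 0.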

Disks are 16TB TOSHIBA MG09ACA1 connected to a MegaRAID SAS-3 3108 (of course not operating as RAID and with the mrsas driver).

% freebsd-version
14.2-RELEASE-p2
% zfs --version
zfs-2.2.6-FreeBSD_g33174af15
zfs-kmod-2.2.6-FreeBSD_g33174af15

Is there a known ZFS bug that could explain this?

I've "zpool clear"ed the errors and am waiting to see if they come up again.

 bye & Thanks
	av.

P.S. Also, I'm quite sure no "administrator accidentally wrote over a portion of the disk using another program" :)